I have a question regarding parallelism in the TVM runtime. Is the number of threads defined by TVM_NUM_THREADS used for intra-op parallelism (i.e., parallelizing a single convolution) or for inter-op parallelism (i.e., running multiple independent convolutions in parallel)?
If the answer is that TVM only supports intra-op parallelism, which operators are actually parallelized? All of them, or just convolutions?
One final question: what is the role of TVM_BIND_THREADS?
TVM does thread-level parallelization inside each kernel. All kernels are executed in parallel. For CPU, this is done by calling s[x].parallel on an axis when defining the schedule. TVM_BIND_THREADS indicates whether we set CPU affinity.
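To illustrate what intra-op parallelism means here, a rough sketch in plain Python (this is not TVM code; it only emulates how a parallelized outer axis splits one kernel's work across TVM_NUM_THREADS workers, similar in spirit to what s[x].parallel arranges natively):

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Number of worker threads, read from TVM_NUM_THREADS for illustration.
num_threads = int(os.environ.get("TVM_NUM_THREADS", os.cpu_count() or 1))

def vector_add(a, b):
    """One 'kernel' whose outer loop is split across threads."""
    n = len(a)
    out = [0.0] * n
    chunk = (n + num_threads - 1) // num_threads  # rows per thread

    def worker(start):
        # Each thread computes one contiguous slice of the output.
        for i in range(start, min(start + chunk, n)):
            out[i] = a[i] + b[i]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Map each chunk start index to a worker thread.
        list(pool.map(worker, range(0, n, chunk)))
    return out
```

In real TVM the workers are native OS threads in the runtime thread pool (no GIL), but the work-splitting semantics are the same: one operator at a time, with its loop iterations distributed over the threads.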
@kevinthesun if I understood your answer correctly, TVM only supports intra-op parallelism in the TensorFlow terminology, i.e., a convolution is parallelized internally, but two independent convolutions are NOT executed alongside each other in parallel (inter-op parallelism). Is this right?
What do you mean by "all kernels are executed in parallel"? To be crystal clear, could you please clarify which kernels we are talking about in terms of operators?