TVM & NUMA Nodes

Hello,

I’ve got a question regarding TVM and its behaviour on multi-socket systems or on high-core-count processors with multiple NUMA nodes (e.g. the 64-core AMD EPYC, or dual- to quad-socket servers):

Does TVM scale on such systems? I mean, can a single execution of a Conv2D or Dense layer be mapped across all cores, even when using 2 or 4 CPUs?

Hi, I’d suggest taking a look at these two environment variables:

import os
os.environ["TVM_NUM_THREADS"] = '4'
os.environ["TVM_BIND_THREADS"] = '0'

To the best of my knowledge, “TVM_NUM_THREADS” controls how many threads the runtime uses. With “TVM_BIND_THREADS” set to 1 (which is also the behaviour when it is unset), TVM binds the threads itself, so 4 threads end up on cores 0-3, normally on the first socket. If you want a specific CPU affinity, such as cores 3-6, set “TVM_BIND_THREADS” to 0 and use numactl or taskset to bind the process.

But if you set “TVM_BIND_THREADS” to ‘0’ and forget to bind the process, the OS scheduler picks the cores, and the 4 threads may float between the 2 sockets.
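For anyone who prefers to do the pinning from inside Python rather than with taskset, here is a rough sketch. The helper name `pin_process` is made up for illustration and is not part of TVM. It assumes Linux (`os.sched_setaffinity` is not available elsewhere) and assumes that, with “TVM_BIND_THREADS” set to 0, the TVM worker threads simply inherit the affinity mask of the process. Call it before importing tvm.

```python
import os


def pin_process(cores, num_threads=None):
    """Hypothetical helper (not a TVM API): restrict this process to `cores`
    and set the TVM thread-pool environment variables to match."""
    wanted = set(cores)
    if hasattr(os, "sched_getaffinity"):  # Linux only
        # Keep only the cores this process is actually allowed to run on.
        usable = wanted & os.sched_getaffinity(0)
        if not usable:
            raise ValueError("none of the requested cores are available")
        # Roughly what `taskset -c <cores>` does for a new process.
        os.sched_setaffinity(0, usable)
    else:
        # No affinity API on this platform; only the env vars are set.
        usable = wanted
    # One worker thread per usable core unless the caller says otherwise.
    os.environ["TVM_NUM_THREADS"] = str(num_threads or len(usable))
    # Assumption: 0 tells TVM not to apply its own core binding.
    os.environ["TVM_BIND_THREADS"] = "0"
    return sorted(usable)
```

For example, `pin_process(range(3, 7))` followed by `import tvm` should keep the work on cores 3-6, provided those cores exist on the machine. To stay on one socket, pick cores that belong to the same NUMA node (`numactl --hardware` lists them).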

More details can be found in the source code:

tvm/src/runtime/threading_backend.cc

BTW, I’m afraid we cannot yet take such fine-grained control of CPU affinity for a single kernel.
