autotvm.RPCRunner and TVM_NUM_THREADS

cbalint13 · July 30, 2019, 11:42pm

On remote:

I think setting TVM_NUM_THREADS on the remote edge (RPC) side makes no sense; by my logic it doesn't help. I also can't see it used anywhere in the RPC code. The RPC side receives one sample kernel, tests it (using the multicore CPU or GPU), then sends the measurement results back. I can't see what could run in parallel on the RPC side, nor anything for it in the code. The kernel under test may itself run parallelized, but only one kernel (test case) runs at a time on the edge.
If you want parallel searching over remote RPC, you have to use multiple physical edges. Each edge registered to the tracker receives test kernels at the same time, so N edges cut the measurement time to roughly Time / N.
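
To put rough numbers on the Time / N point, here is a back-of-the-envelope sketch. It is not TVM code; the function name `estimated_measure_time` and the trial counts are made up for illustration, and it assumes every edge measures exactly one kernel at a time with the same per-kernel cost.

```python
import math


def estimated_measure_time(n_trials, seconds_per_trial, n_edges):
    """Rough wall-clock estimate for measuring n_trials kernels.

    Assumes each edge device runs one kernel at a time, so the trials
    are processed in rounds of n_edges kernels each.
    """
    if n_edges < 1:
        raise ValueError("need at least one edge device")
    rounds = math.ceil(n_trials / n_edges)
    return rounds * seconds_per_trial


# Hypothetical workload: 1000 trials at about 2 seconds each.
for n_edges in (1, 2, 4):
    print(n_edges, "edge(s):", estimated_measure_time(1000, 2.0, n_edges), "s")
```

Real runs will not scale this cleanly (tracker overhead, upload time, uneven devices), so treat it as an upper bound on the gain.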

On host:

Yes, it matters a lot. You can see it during the xgboost steps (internal xgb feature re-processing):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                        
19199 cbalint   20   0   24.2g 700932 112244 R 181.8   4.3   0:30.97 tune-mali.py                                                                   
19198 cbalint   20   0   24.2g 700808 112120 R 154.5   4.3   0:28.81 tune-mali.py                                                                   
19197 cbalint   20   0   24.2g 700932 112244 R 136.4   4.3   0:31.74 tune-mali.py                                                                   
19193 cbalint   20   0   24.2g 700796 112108 R 127.3   4.3   0:29.89 tune-mali.py                                                                   
19195 cbalint   20   0   24.2g 700916 112228 S 127.3   4.3   0:30.40 tune-mali.py                                                                   
19194 cbalint   20   0   24.2g 700872 112184 S  18.2   4.3   0:29.98 tune-mali.py                                                                   
19196 cbalint   20   0   24.2g 700924 112236 S   9.1   4.3   0:33.25 tune-mali.py

I think the tutorial sets it to 1 (a safe demo value for any target).
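
For the host side, a minimal sketch of raising the value from inside the tuning script. The helper name `set_tvm_num_threads` is hypothetical, and falling back to all available cores is my assumption, not something the tutorial does. I set the variable at the top of the script, before any TVM work starts, on the assumption that this is early enough for it to be picked up.

```python
import os


def set_tvm_num_threads(n=None):
    """Export TVM_NUM_THREADS for this process and return the value used.

    With n=None, fall back to the number of CPUs the OS reports
    (an assumption for this sketch, not a TVM default).
    """
    if n is None:
        n = os.cpu_count() or 1
    if n < 1:
        raise ValueError("thread count must be >= 1")
    os.environ["TVM_NUM_THREADS"] = str(n)
    return n


# Call this first, then import tvm / autotvm and start tuning.
used = set_tvm_num_threads(4)
print("TVM_NUM_THREADS =", os.environ["TVM_NUM_THREADS"])
```

Setting it in the shell (export TVM_NUM_THREADS=4 before launching the script) should have the same effect.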