Should I worry about measurement error when auto-tuning multiple tasks simultaneously?

Hi, I am new to TVM and I'm working with Ansor. I've followed some tutorials and they worked well. Now I wonder: will the measurements remain correct if I launch multiple auto-tuning tasks simultaneously? I have referred to How does AutoTVM distribute resources for tuning and runtime measurements on one single CPU?. It seems like it's OK when CPU/GPU resources are relatively abundant. But what if I run 1,000 tasks on a CPU with only 32 threads? Or what if the total memory requirement of these tasks is too large, leading to OS swapping? Any help would be appreciated! :slight_smile:
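
To make the scenario concrete, here is a minimal sketch of what I mean by launching tasks simultaneously (the script name `tune_op.py` and the count are just placeholders):

```python
import subprocess

# Launch many independent Ansor tuning scripts at once on one machine.
# "tune_op.py" is a hypothetical script that tunes a single operator.
procs = [
    subprocess.Popen(["python", "tune_op.py", "--task-id", str(i)])
    for i in range(1000)
]
for p in procs:
    p.wait()
```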

If you tune GPU tasks this is likely fine, since the runner measures GPU time only. If you tune CPU tasks, running the tuner and the worker on the same machine can make the measurements inaccurate.
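
For CPU tasks, one option is to move the measurement off the tuning machine via an RPC tracker. A rough sketch, assuming a worker registered under the key `my-cpu-device` and a tracker at the default local address (both placeholders):

```python
from tvm import auto_scheduler

# Measure on a remote device registered with an RPC tracker, so the
# tuner's own CPU load does not perturb the CPU time measurements.
runner = auto_scheduler.RPCRunner(
    key="my-cpu-device",         # placeholder tracker key
    host="127.0.0.1",            # placeholder tracker address
    port=9190,
    repeat=3,
    min_repeat_ms=300,           # run long enough to amortize timer noise
    enable_cpu_cache_flush=True, # helps CPU measurement accuracy
)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    runner=runner,
    measure_callbacks=[auto_scheduler.RecordToFile("tune.json")],
)
```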

Hi, I'm not sure what the 'multiple tasks' you refer to means specifically. If the 'tasks' are the ones Ansor extracts from an IRModule, then they have nothing to do with hardware resources. In Ansor there are intermittent measuring periods: schedules for a single task are built and measured on the target hardware, and this measuring is done task by task, sequentially, not in parallel.
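
For illustration, a sketch of that flow with the standard auto_scheduler API (the ResNet workload is just an example to have something to extract tasks from):

```python
import tvm
from tvm import auto_scheduler
from tvm.relay import testing

# A small example model, just to have tasks to extract.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1)
target = tvm.target.Target("llvm")

# One TaskScheduler drives all tasks extracted from the IRModule: it picks
# a task, builds and measures a batch of its candidate schedules on the
# target, then moves on, so measurement is sequential within one process.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tuner.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile("log.json")],
))
```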

Thanks for replying! Sorry for the confusion. :frowning_face:

The 'task' I am referring to does not mean the subgraphs within a model.

It simply means different Ansor programs: for example, executing 10 scripts, each using Ansor to tune its own operators, with a single GPU shared for their measurements. :grinning:
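
For concreteness, each of those scripts would look roughly like this (the matmul workload and sizes are just an example):

```python
import tvm
from tvm import te, auto_scheduler

# Each of the 10 scripts defines and tunes its own operator like this,
# all measuring on the same shared GPU.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

task = auto_scheduler.SearchTask(
    func=matmul, args=(1024, 1024, 1024), target=tvm.target.Target("cuda")
)
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=100,
    measure_callbacks=[auto_scheduler.RecordToFile("matmul.json")],
))
```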

Launching CUDA graphs or CUDA kernels concurrently from parallel processes can definitely inflate latency, which results in inaccurate measurements. From my observation, though, the Ansor tuning process is almost CPU-bound (please correct me if I'm wrong): the GPU measuring span is only a small portion, and most of the time is spent sampling valid schedules from sketches. That sampling is done with Python multiprocessing, and with the default settings it will utilize all CPU cores, so I doubt there is any benefit to running multiple Ansor processes concurrently.
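
If you do want to run several Ansor processes side by side anyway, you could at least cap how many worker processes each instance spawns for building candidate schedules. A rough sketch (the `n_parallel` value of 8 is just illustrative):

```python
from tvm import auto_scheduler

# Limit the builder's worker pool so several concurrent Ansor instances
# don't all try to occupy every CPU core at once.
builder = auto_scheduler.LocalBuilder(timeout=15, n_parallel=8)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    builder=builder,
    measure_callbacks=[auto_scheduler.RecordToFile("log.json")],
)
```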
