Limit CPU cores for auto-tuned model

The current auto-tuning result is very good on our Android device, with only one problem: TVM uses all CPU cores on the device for model inference. Since our application needs to run continuously, the CPU becomes too hot over time and performance decreases.

So I want to ask whether I can limit TVM to use only 1-2 CPU cores on the Android device. Should I do this during auto-tuning and re-tune (so that the tuned model only uses 1-2 threads), or can I keep the original tuned model and just change some setting in the Android application before launching it?

One solution I’ve seen is to call `runtime.config_threadpool`, which seems to suggest re-tuning.
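For reference, this is roughly how I imagine it would be called from a Python host over RPC; the argument order (affinity mode first, then thread count) and the meaning of mode `1` = big cores are my assumptions from skimming the runtime source, so please correct me if that’s wrong:

```python
import tvm
from tvm import rpc

# Connect to the device through the RPC tracker (address/key are placeholders).
tracker = rpc.connect_tracker("0.0.0.0", 9190)
remote = tracker.request("android", priority=1, session_timeout=60)

# Fetch the packed function registered by the TVM runtime on the device.
config_threadpool = remote.get_function("runtime.config_threadpool")

# First argument: affinity mode (1 = big cores, as I read it).
# Second argument: number of worker threads to use.
config_threadpool(1, 2)
```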

The other is to set TVM_NUM_THREADS, for which re-tuning does not seem to be required?
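If I understand correctly, TVM_NUM_THREADS just needs to be visible before the runtime creates its worker thread pool. On a Python host that would look like the sketch below; on Android I assume the equivalent is a `setenv` call in native code before the first inference (again, an assumption on my part):

```python
import os

# Set before importing/initializing the TVM runtime so the worker thread
# pool is created with the limited thread count.
os.environ["TVM_NUM_THREADS"] = "2"

import tvm  # imported after setting the env var on purpose
```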

In general you should be able to change the number of CPU threads without retuning the schedule; schedules should be relatively robust to those kinds of changes. The main difficulty arises when affinity changes from big cores to LITTLE cores, since there may be significant microarchitectural differences. Try it and let us know if the scaling is not close to linear.

However, I think you will find OS CPU frequency governors to be relatively aggressive by default. If you choose to use two CPUs continuously (instead of, say, four), it is likely that those two cores will get just as hot and throttle similarly, as the OS will try to ramp up frequency on the active CPUs until it hits thermal limits. This is more of an issue with how smartphones are designed to be used (e.g., bursty applications without much sustained load).

Thanks!

I’ve tested the performance by limiting the thread number to 2: latency increased from 13 ms to 16 ms, which is totally acceptable to us, and CPU usage decreased from 300% to about 120%.

Another question: we want to try different options to resolve the “CPU too hot” problem, and running TVM on the LITTLE cores at lower energy cost is one option we want to try. From reading the code in threading_backend.cc, it seems we can also configure the TVM runtime to prefer LITTLE cores instead of big cores. How could we configure that (e.g., does calling the method before loading the TVM module work)? Do we need to re-tune to achieve that?

You can use LITTLE cores instead by using config_threadpool (e.g., set the affinity mode to kLittle).
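A minimal sketch of what that could look like in Python, assuming `-1` maps to kLittle (my reading of the AffinityMode enum in the threading backend) and that the call is made before the first inference so the thread pool is reconfigured in time:

```python
import tvm

# Fetch the packed function registered by the runtime (locally in this sketch;
# over RPC you would use remote.get_function instead).
config_threadpool = tvm.get_global_func("runtime.config_threadpool")

# Affinity mode -1 = kLittle (assumption from the enum), with 2 worker threads
# pinned to the LITTLE cluster.
config_threadpool(-1, 2)
```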

You will likely have to retune to achieve maximum performance, and things may be a little complicated because you may need to remove any pretuned TVM configs for your hardware backend if they exist (they may be prioritized for big cores). Using the existing schedules, you should be able to at least run the code. However, I am not certain that even using exclusively LITTLE cores will avoid thermal bottlenecks.