How many threads are used in ARM CPU evaluation after autotuning?

I'm trying to follow this tutorial, and I'm curious how many threads are used during tuning and evaluation.

According to the benchmark page,

Note: If a board has big.LITTLE architecture, we will use all big cores. Otherwise, we will use all cores.

Can someone confirm this is generally the case? And how do I change the thread count?

Hi there,

By default, TVM will use as many threads as it can (one per big core on a big.LITTLE board), but you can cap this by setting the environment variable TVM_NUM_THREADS=[num_threads] on the board being tuned.

E.g., you might want a max of two threads, so you would run your RPC server with:

export TVM_NUM_THREADS=2 && python3 -m tvm.exec.rpc_server --host 0.0.0.0 --port=2083 --key=my_key
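If you start the server from a script rather than a shell, the same idea can be sketched in Python. This is only a rough sketch: the helper name rpc_server_launch is made up for illustration, and the port and key are the placeholder values from the command above.

```python
import os
import sys


def rpc_server_launch(num_threads, port=2083, key="my_key"):
    """Return (command, environment) for an RPC server with a capped thread count.

    Hypothetical helper: the name and the defaults are illustrative only.
    """
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Copy the current environment so the parent process is left untouched.
    env = dict(os.environ)
    env["TVM_NUM_THREADS"] = str(num_threads)
    # Same invocation as the shell one-liner, using the current interpreter.
    cmd = [
        sys.executable, "-m", "tvm.exec.rpc_server",
        "--host", "0.0.0.0",
        f"--port={port}",
        f"--key={key}",
    ]
    return cmd, env
```

Passing the pair to subprocess.run(cmd, env=env) on the board should then behave like the export version, since the variable has to be in the environment of the server process itself.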

Personally, I’m still trying to figure out how to use all big cores and all LITTLE cores together.

@Wheest much appreciated!

What’s the equivalent of setting TVM_NUM_THREADS env variable on Android? Do you by any chance know?

I’m afraid I haven’t played with the Android interface, so I don’t really know the workflows. But as far as I know, you can still set environment variables on Android.

A cursory look suggests there is support for this in Android.

I think you can also use the config_threadpool system, which you would access from Python with something like:

config_func = remote.get_function('runtime.config_threadpool')

config_func(1, 1) # use 1 big core
config_func(1, 2) # use 2 big cores
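If you end up calling this in several places, a small wrapper might be handy. This is a rough sketch only: set_remote_threads and the two constants are names I made up, remote is assumed to be the RPC session object used above, and the -1 value for LITTLE cores is my understanding of the runtime's affinity modes rather than something I have tested.

```python
BIG_CORES = 1      # mode value used in the snippet above
LITTLE_CORES = -1  # assumption: mode value for LITTLE cores, untested here


def set_remote_threads(remote, nthreads, mode=BIG_CORES):
    """Ask the remote runtime to use `nthreads` threads on the chosen cores.

    Hypothetical helper; `remote` is the RPC session from the snippet above.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    # Look up the packed function on the device, then call it as (mode, nthreads).
    config_func = remote.get_function("runtime.config_threadpool")
    config_func(mode, nthreads)
```

For example, set_remote_threads(remote, 2) would be the same as config_func(1, 2) above. It needs to be called on the session before the module is run or timed.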

Thanks! Will give it a try.

Hi there,

Did you figure out how to control the thread count during the tuning process?

Thanks, Fan