Can TVM capture the communication cost between the big and little cores of an ARM big.LITTLE CPU?

Hello TVM developer and community,

I have been working on running inference with TVM on CPU only. In particular, I am targeting ARM big.LITTLE CPUs.

I am wondering: is it possible for TVM to capture the communication cost between the big and little cores of an ARM big.LITTLE CPU?

I know that in the GPU-CPU cooperation case, we can use device_copy to model the data transfer between the CPU and the GPU, and thereby measure the communication cost between the CPU (llvm) and the GPU (OpenCL). A sketch of what I mean is shown below.
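For example (a rough sketch with made-up shapes and devices, assuming a TVM version where `relay.op.device_copy` takes the data plus a source and a destination device):

```python
# Hypothetical example: an explicit device_copy between a CPU-resident and a
# GPU-resident computation, so the copy appears as its own node that can be
# timed when the module is profiled.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 64), dtype="float32")
y = relay.nn.relu(x)                                        # intended for the CPU (llvm)
y_gpu = relay.op.device_copy(y, tvm.cpu(0), tvm.opencl(0))  # explicit CPU -> GPU copy
z = relay.nn.softmax(y_gpu)                                 # intended for the GPU (OpenCL)
mod = tvm.IRModule.from_expr(relay.Function([x], z))
print(mod)
```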

Since both the big and little CPU clusters are treated as a single "llvm" device, I cannot capture the communication cost the way it is done in the GPU-CPU case. Is there any way to obtain such information?

Any thoughts are welcome.

@popojames We do not have a way to measure the cost of moving data between the big and little cores. It is unclear how we could measure this, as the big and little cores share the same memory. From TVM's point of view, we only consider memory copies when copying between distinctly accessible memories (i.e. CPU to GPU).

You could try using the PAPIMetricCollector to measure performance counters related to L3 (or L2) memory transfers between the big and little cores, assuming such performance counters exist.
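Something along these lines might work (an untested sketch; it requires TVM built with `USE_PAPI=ON`, and `PAPI_L2_TCM` below is only a placeholder for whatever cache/bus counters your ARM PMU actually exposes, see `papi_native_avail`):

```python
# Sketch: profile a small Relay module with PAPI counters collected per operator.
import numpy as np
import tvm
from tvm import relay
from tvm.runtime import profiler_vm, profiling

# Tiny module just to have something to profile.
x = relay.var("x", shape=(1, 64), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

dev = tvm.cpu(0)
exe = relay.vm.compile(mod, target="llvm")
vm = profiler_vm.VirtualMachineProfiler(exe, dev)

data = tvm.nd.array(np.random.rand(1, 64).astype("float32"), dev)
report = vm.profile(
    data,
    func_name="main",
    # Placeholder counter; substitute the L2/L3 or bus counters your board exposes.
    collectors=[profiling.PAPIMetricCollector({dev: ["PAPI_L2_TCM"]})],
)
print(report)
```

The report breaks the counters down per operator, so unusually high cache-miss or bus-access counts might serve as a rough proxy for cross-cluster traffic.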

@popojames Right now we only optimize at the "operator" level (post operator fusion). It's possible that as we begin expanding optimization towards the subgraph level, we'll need to incorporate some way of accounting for memory-copy time. However, as @tkonolige mentioned, this is somewhat difficult to capture.
