What am I looking for?
I am searching for a tool that lets me profile my TVM-generated code at the assembly level. This tool should be able to count how often a specific opcode (let's say a popcount or a vector multiply-accumulate operation) is executed during a run.
Ideally, it would also be possible to use this profiling tool over the RPC interface, so that I can run my executables either in normal mode or in profiling mode and get a list of all opcodes and how often each one occurs.
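To make the desired output concrete, here is a minimal sketch of the kind of report I have in mind. It assumes that some tool can already give me a trace of executed instructions, one per line; the trace below is made up for illustration, and the function name `count_opcodes` is hypothetical, not part of TVM or any existing profiler.

```python
from collections import Counter


def count_opcodes(trace_lines):
    """Tally mnemonics from an instruction trace with one instruction per line.

    The first whitespace-separated field of each line is taken as the opcode.
    """
    counts = Counter()
    for line in trace_lines:
        fields = line.split()
        if fields:  # skip empty lines
            counts[fields[0].lower()] += 1
    return counts


# Hypothetical trace (assumption): ARM NEON mnemonics written by hand,
# not captured from a real run.
SAMPLE_TRACE = [
    "vmla.f32 q0, q1, q2",  # vector multiply-accumulate
    "vcnt.8 d0, d1",        # popcount
    "vmla.f32 q3, q4, q5",
    "add r0, r0, #1",
]

if __name__ == "__main__":
    # Print one line per opcode, most frequent first.
    for opcode, n in count_opcodes(SAMPLE_TRACE).most_common():
        print(f"{opcode:10s} {n}")
```

The open question is where such a trace (or the counts directly) would come from on the device.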
What I’ve tried so far
I used a simple matrix-multiply example to build a test library with TVM and deployed it through the C++ interface, as described in how_to_deploy, in both a packed and a non-packed version on a Raspberry Pi. Both executables seem to work properly. Then I tried to use DynamoRIO as an instrumentation tool, with its opcode_count function, to count explicitly how often one specific opcode is executed. However, I ran into errors because some parts of DynamoRIO are not fully ported to the ARM architecture (github issue). At the moment it doesn’t look like anyone is working on this issue.
My questions
- Is anyone aware of a similar profiling/instrumentation tool for ARM?
- Or do you know of another way to get the desired information? Maybe within TVM, or somehow with LLVM?
- If I find such a tool and it works, how should I go about getting it to work with TVM’s RPC? Can I call external programs through the RPC interface?
In any case, thank you!