TVM auto-tuning questions and hardware-specific deployment problem

Hi everybody, I have two questions, please help me out.

  1. I read the tutorials about TVM auto-tuning. We need to set the output shape, but when the network has multiple outputs, what output shape should I set? For example, a detection net contains a classification branch and a bbox regression branch with different output shapes, so how should I set the output shape to get a better auto-tuning result?
  2. When I compile and tune the model on a 1080 Ti, can the model be deployed on a 2080 Ti? Do I have to repeat the auto-tuning process on the 2080 Ti?

Thanks.
Best, Edward

Hi Edward,

  1. You don’t need to set a shape for the output; only input shapes are required (see the sketch after this list).
  2. For the best performance, you should tune it again on the 2080 Ti. But given that the architectures of the 1080 Ti and 2080 Ti are quite similar, I guess you can also get good performance using the model tuned on the 1080 Ti.
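As a minimal sketch (the model path and the input name "data" are assumptions, not from your setup), only the input shape goes into the shape dict; Relay infers the output shapes on its own:

```python
# Minimal sketch: only input shapes are supplied; output shapes are inferred.
import onnx
import tvm
from tvm import relay

model = onnx.load("detector.onnx")            # hypothetical detection model
shape_dict = {"data": (1, 3, 512, 512)}       # input shape only, no output shapes
mod, params = relay.frontend.from_onnx(model, shape_dict)

target = tvm.target.cuda()
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
```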

Hi haichen,
Thank you very much,
Another question: I want to implement a TVM-based operation like np.dot(A, B), and my target is a 1080 Ti. How can I auto-tune the implementation to get better speed? Any suggestions or demos? Should I use TOPI or something else? Thank you very much.

If the operator you want is already implemented in TOPI, you can directly tune it for your device like in this tutorial. Simply replace the “define network” part with the target workload, as in the sketch below.
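For instance, a sketch under the assumption that nn.dense (which computes A·Bᵀ) covers your matmul; shapes are illustrative:

```python
# Sketch: define a single dense (matmul-like) workload in place of a whole
# network, then extract AutoTVM tasks from it. Shapes are illustrative.
import tvm
from tvm import relay, autotvm

def get_workload(m=1024, n=1024, k=1024, dtype="float32"):
    A = relay.var("A", shape=(m, k), dtype=dtype)
    B = relay.var("B", shape=(n, k), dtype=dtype)   # nn.dense computes A * B^T
    out = relay.nn.dense(A, B)
    return tvm.IRModule.from_expr(relay.Function([A, B], out))

mod = get_workload()
target = tvm.target.cuda()

# The extracted tasks can then be tuned exactly as in the relay auto-tuning
# tutorial (XGBTuner, measure_option, log file, ...).
tasks = autotvm.task.extract_from_program(mod, target=target, params={})
```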

For customized operators, you can write a tunable schedule as an AutoTVM template, like in this tutorial. A rough sketch of such a template for np.dot(A, B) follows.
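This is only a sketch loosely following the tunable-template tutorial; the template name, split factors, thread bindings, and trial count are illustrative assumptions, and some sampled configs may simply fail measurement on the GPU:

```python
# Sketch of a tunable AutoTVM template for C = np.dot(A, B) on CUDA.
import tvm
from tvm import te, autotvm

@autotvm.template("tutorial/matmul_gpu")
def matmul(M, N, K, dtype="float32"):
    A = te.placeholder((M, K), name="A", dtype=dtype)
    B = te.placeholder((K, N), name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

    s = te.create_schedule(C.op)
    y, x = s[C].op.axis

    # Define a tunable search space over the tiling factors.
    cfg = autotvm.get_config()
    cfg.define_split("tile_y", y, num_outputs=2)
    cfg.define_split("tile_x", x, num_outputs=2)

    yo, yi = cfg["tile_y"].apply(s, C, y)
    xo, xi = cfg["tile_x"].apply(s, C, x)

    # Bind outer axes to blocks and inner axes to threads.
    s[C].bind(yo, te.thread_axis("blockIdx.y"))
    s[C].bind(xo, te.thread_axis("blockIdx.x"))
    s[C].bind(yi, te.thread_axis("threadIdx.y"))
    s[C].bind(xi, te.thread_axis("threadIdx.x"))

    return s, [A, B, C]

# Create a task for a concrete shape and tune it on the local GPU.
task = autotvm.task.create("tutorial/matmul_gpu", args=(1024, 1024, 1024), target="cuda")
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(repeat=3, min_repeat_ms=100),
)
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    n_trial=200,
    measure_option=measure_option,
    callbacks=[autotvm.callback.log_to_file("matmul_cuda.log")],
)
```

After tuning, the best config in `matmul_cuda.log` can be applied with `autotvm.apply_history_best` when building the function.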