Thanks a lot!
I am very glad that you can share a script to help me get started. You can send me a message through the website.
I used the NHWC schedule for tuning mobilenet. Following are the results:
Network | TVM NCHWc (ms) | TFLite NHWC (ms) |
---|---|---|
mobilenet-v1 | 72.46 | 210.00 |
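For context, the compile path behind a TVM number like this looks roughly as follows. This is only a sketch: the file name, input name/shape, log file, and target string are placeholders, and it assumes the `relay.build` API of this TVM version (returning `graph, lib, params`).

```python
from tvm import autotvm, relay


def compile_mobilenet(tflite_path="mobilenet_v1.tflite",      # placeholder model file
                      log_file="mobilenet_tuning.log",        # placeholder AutoTVM log
                      target="llvm -device=arm_cpu -mtriple=aarch64-linux-gnu"):
    # Parse the flatbuffer. Depending on the tflite package version this is
    # tflite.Model.GetRootAsModel(...) or tflite.Model.Model.GetRootAsModel(...).
    import tflite.Model
    with open(tflite_path, "rb") as f:
        tflite_model = tflite.Model.Model.GetRootAsModel(f.read(), 0)

    # TFLite models are NHWC; the frontend keeps that layout.
    mod, params = relay.frontend.from_tflite(
        tflite_model,
        shape_dict={"input": (1, 224, 224, 3)},
        dtype_dict={"input": "float32"},
    )

    # Apply the best schedules found by AutoTVM (if any), then build.
    with autotvm.apply_history_best(log_file):
        with relay.build_config(opt_level=3):
            graph, lib, params = relay.build(mod, target=target, params=params)
    return graph, lib, params
```

The latency itself would then be measured on the device with the graph runtime's `time_evaluator`, as in the linked tutorials.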
It seems the current NHWC schedule requires deeper investigation. We lack an NHWC depthwise schedule. I also saw some warnings/errors while compiling/autotuning. I am listing them here, as they point to the next steps. @FrozenGene can you take a look at improving the NHWC schedule?
Compilation
- AlterOpLayout issue - https://github.com/apache/incubator-tvm/pull/5350. It might be possible to hide the kernel layout change.
Auto-tuning
- Vectorize warning -
  `Detect vectorize inside vectorized loop, ignoring…`
- Large unroll factor -
  `result: MeasureResult(costs=(InstantiationError(['Too large factor for unrolling', 'Too large factor for unrolling'],),), error_no=1`
- Timeout error -
  `result: MeasureResult(costs=(TimeoutError(),), error_no=6`
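To see how often these failures actually occur during a tuning run, the AutoTVM log can be scanned. A small hedged helper (the log path is a placeholder; `error_no` 1 and 6 are the codes shown in the records above, and 0 means success):

```python
from collections import Counter

from tvm import autotvm


def summarize_errors(log_file="mobilenet_tuning.log"):
    """Count MeasureResult error codes in an AutoTVM tuning log."""
    counts = Counter()
    for _inp, result in autotvm.record.load_from_file(log_file):
        counts[result.error_no] += 1
    return counts


if __name__ == "__main__":
    # Example output shape only: Counter({0: ..., 1: ..., 6: ...})
    print(summarize_errors())
```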
I also created a quick tutorial here - https://github.com/apache/incubator-tvm/pull/5354
It is a tutorial on tuning a TFLite model for ARM CPUs, largely based on the previous two tutorials:
- Compile TFLite Models - https://docs.tvm.ai/tutorials/frontend/from_tflite.html#sphx-glr-tutorials-frontend-from-tflite-py
- Auto-tuning a convolutional network for ARM CPUs - https://docs.tvm.ai/tutorials/autotvm/tune_relay_arm.html#sphx-glr-tutorials-autotvm-tune-relay-arm-py
Actually, it is mostly a copy-paste of those two tutorials. The only interesting change is this - https://github.com/apache/incubator-tvm/pull/5354/files#r409919577
I am not sure whether we need a new tutorial that is 90% the same as the previous ones. @tqchen do you have any comments?
@kindlehe you can use the script in the tutorial to get started.
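For reference, the tuning part of that script looks roughly like the sketch below. The device key, tracker host/port, trial count, and log file name are placeholders you would replace with your own setup; the model itself is loaded with `relay.frontend.from_tflite` as in the compile sketch earlier in this thread.

```python
from tvm import autotvm, relay
from tvm.autotvm.tuner import XGBTuner


def tune_tasks(mod, params, target,
               log_file="mobilenet_tuning.log",   # placeholder log file
               device_key="rasp4b",               # placeholder RPC device key
               n_trial=1500):
    # Extract one AutoTVM task per conv2d workload in the network.
    tasks = autotvm.task.extract_from_program(
        mod["main"], target=target, params=params,
        ops=(relay.op.get("nn.conv2d"),))

    # Build locally, run on the remote board through the RPC tracker.
    measure_option = autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default", timeout=10),
        runner=autotvm.RPCRunner(device_key, host="0.0.0.0", port=9190,
                                 number=5, timeout=10))

    for task in tasks:
        tuner = XGBTuner(task, loss_type="rank")
        trials = min(n_trial, len(task.config_space))
        tuner.tune(n_trial=trials,
                   measure_option=measure_option,
                   callbacks=[autotvm.callback.progress_bar(trials),
                              autotvm.callback.log_to_file(log_file)])
```

After tuning, `autotvm.apply_history_best(log_file)` is used around `relay.build` to pick up the tuned schedules.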
Sure.
This should be an NHWC schedule problem.
It is normal, because we have `max_unroll` to restrict it.
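A simplified illustration of that guard: schedules annotate loops with a `max_unroll` limit, and a sampled config that would unroll a loop beyond it is rejected with `InstantiationError` before anything is built or run, which shows up as the `error_no=1` records above. This is not TVM's actual code; the function name and limit are examples.

```python
from tvm.autotvm.task.space import InstantiationError

MAX_UNROLL = 16  # example limit; real schedules choose their own


def reject_oversized_unroll(unroll_factor):
    # Called while a sampled config is instantiated; rejecting here is cheap
    # compared to discovering the problem at build or run time.
    if unroll_factor > MAX_UNROLL:
        raise InstantiationError(["Too large factor for unrolling"])
```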
It is normal. When we have unrolling, we will sometimes hit a build timeout. If the schedule is not good, we will hit a run timeout error. I think it is acceptable.
Maybe we could just add one TFLite network to "Auto-tuning a convolutional network for ARM CPUs" and add one note section in "Compile TFLite Models" to show users how to get better performance by leveraging AutoTVM.