The resnet18 example on VTA uses a custom JSON input file so that the NNVM graph can be converted for VTA. I saw a post stating that it was created in a discontinued internal branch. I also understand that a new graph transformation is being developed for Relay and that NNVM will be discontinued. What is the timeline for a Relay + VTA based solution that can be used to port other models to VTA? Also, could you please help me understand the transformation done to the resnet18 JSON file, and whether a similar change can be applied to other models until the Relay-based solution is available?
We are working to release model translation in Relay that will massage off-the-shelf models so that they can be compiled and run on VTA. This involves applying quantization from fp32 to int8, and subsequently performing bit-packing so that we can take advantage of tensorization.
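To make the two steps concrete, here is a minimal pure-Python sketch of fp32-to-int8 quantization followed by packing into fixed-size blocks. It is not the Relay implementation: the function names `quantize_int8` and `pack_channels`, the symmetric per-tensor scaling scheme, and the block size are all assumptions chosen for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (an assumed scheme, for illustration).

    Maps floats to integer codes in [-127, 127] and returns (codes, scale),
    where value is approximately code * scale.
    """
    if not values:
        return [], 1.0
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def pack_channels(codes, block):
    """Group a flat channel vector into blocks of `block` elements.

    A tensorized hardware intrinsic consumes one whole block at a time,
    so the channel count must be a multiple of the block size.
    """
    if block <= 0:
        raise ValueError("block must be positive")
    if len(codes) % block != 0:
        raise ValueError("channel count must be a multiple of the block size")
    return [codes[i:i + block] for i in range(0, len(codes), block)]


if __name__ == "__main__":
    codes, scale = quantize_int8([127.0, -64.0, 0.0, 1.0])
    print(codes, scale)
    print(pack_channels(codes, 2))
```

The packing step is why layer shapes matter: a layer whose channel count does not divide evenly into blocks cannot be mapped onto the tensorized instruction without padding or falling back to the CPU.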
I’m working with @jroesch and @MarisaKirisame on getting these features released in Relay ASAP (i.e. within the next couple of weeks), and NNVM support for VTA will be deprecated.
The resnet18 model was produced by a series of custom quantization passes applied in an ad-hoc fashion; @ziheng can comment on how this was achieved. However, this is not seen as a sustainable approach to quantization. We want to bank on a push-button compilation flow in Relay moving forward.
I’ve found that the current master branch has this feature, with AutoTVM support on VTA.
$TVM_ROOT/vta/tutorial/frontend/deploy_resnet_on_vta.py describes how TVM massages MXNet’s Gluon model.
However, I also noticed that this only works for some models.
Is there a specific reason for this?
I’ve tested the tutorial code for the resnet18 VTA example with different models from the model zoo (e.g. Resnet50_v1, DenseNet, etc.).
Nothing else was changed. However, apart from the original model, only resnet34_v1 worked.
... File "/home/arc-yhlinux/workspace/tvm-uptodate/src/relay/pass/quantize.cc", line 344 TVMError: Check failed: lhs->dtype == dtype (int8 vs. int32) :
I know that some layers are not offloaded to VTA, and that this is specified manually in the file.
Still, I expected that at least the other ResNet variants (e.g. Resnet-50), which have a similar structure and the same start/end points as Resnet-18, would work as Resnet-18 did.
I would be very thankful if you could clarify this issue.
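For readers unfamiliar with the "start/end points" mentioned above, the idea of manually marking which layers are offloaded can be sketched in plain Python as follows. This is a toy model over a flat list of operator names, not the tutorial's code: the helper `offload_range`, the example operator sequence, and the choice of whether the boundary operators fall inside the offloaded region are all assumptions.

```python
def offload_range(op_sequence, start_name, stop_name):
    """Split a linear operator sequence into (before, offloaded, after).

    Assumed convention: the offloaded region starts at the first occurrence
    of `start_name` and ends just before the first `stop_name` after it.
    Raises ValueError if either marker is missing.
    """
    start = op_sequence.index(start_name)
    stop = op_sequence.index(stop_name, start + 1)
    return op_sequence[:start], op_sequence[start:stop], op_sequence[stop:]


if __name__ == "__main__":
    # Hypothetical, heavily shortened ResNet-like operator sequence.
    ops = ["nn.conv2d", "nn.max_pool2d", "nn.conv2d", "add",
           "nn.global_avg_pool2d", "nn.dense"]
    before, offloaded, after = offload_range(
        ops, "nn.max_pool2d", "nn.global_avg_pool2d")
    print("CPU (head):", before)
    print("offloaded:", offloaded)
    print("CPU (tail):", after)
```

Two models can share the same start and stop markers and still differ in what lies between them, so matching start/end points alone does not guarantee that every operator in the offloaded region is handled.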
These failures occur because the quantization pass we implemented in Relay has limited operator support. Extending the quantization pass to cover more operators will let more models run on VTA. This is WIP.
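Until coverage improves, one way to triage a model before trying it is to list its operators and compare them against whatever the quantization pass handles. The sketch below shows only the comparison. The set `ASSUMED_QUANTIZABLE_OPS` is a made-up placeholder, not the actual list of operators supported by Relay quantization, and `unsupported_ops` is a hypothetical helper.

```python
# Placeholder set for illustration only. This is NOT the real list of
# operators supported by the Relay quantization pass.
ASSUMED_QUANTIZABLE_OPS = frozenset({
    "nn.conv2d", "nn.dense", "add", "nn.relu", "nn.max_pool2d",
})


def unsupported_ops(model_ops, supported=ASSUMED_QUANTIZABLE_OPS):
    """Return operators in `model_ops` that are not in `supported`.

    The result keeps first-seen order and contains no duplicates.
    """
    missing = []
    for op in model_ops:
        if op not in supported and op not in missing:
            missing.append(op)
    return missing


if __name__ == "__main__":
    # Hypothetical operator list extracted from some model.
    ops = ["nn.conv2d", "concatenate", "nn.relu",
           "concatenate", "nn.avg_pool2d"]
    print(unsupported_ops(ops))
```

An empty result would suggest the model is worth trying; a non-empty one points at the operators that would need quantization support first.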