Currently, when loading a model, we use:

```python
import os
import tvm

# Read back the serialized graph, the compiled library, and the parameters.
loaded_json = open(os.path.join(model_dir, prefix + "graph.json")).read()
path_lib = os.path.join(model_dir, prefix + "lib.tar")
loaded_lib = tvm.module.load(path_lib)
loaded_params = bytearray(
    open(os.path.join(model_dir, prefix + "param.params"), "rb").read()
)
```
The model is resnet18-v1 from MXNet GluonCV. Before saving, I successfully ran it quantized on the CUDA target.
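For reference, the saving side looked roughly like this (a sketch; the qconfig options shown are assumptions, not my exact settings):

```python
from tvm import relay

# Quantize the Relay module (skip_conv_layers=[0] is an assumed setting).
with relay.quantize.qconfig(skip_conv_layers=[0]):
    mod = relay.quantize.quantize(mod, params=params)

# Compile for CUDA and write out the three artifacts loaded above.
graph, lib, params = relay.build(mod, target="cuda", params=params)
with open(os.path.join(model_dir, prefix + "graph.json"), "w") as f:
    f.write(graph)
lib.export_library(os.path.join(model_dir, prefix + "lib.tar"))
with open(os.path.join(model_dir, prefix + "param.params"), "wb") as f:
    f.write(relay.save_param_dict(params))
```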
However, for the quantized model, inference after loading dumps a flood of PTX like the following:

```
BB36_9:
    cvt.s64.s32 %rd13, %r249;
    add.s64 %rd4, %rd1, %rd13;
    @%p5 bra BB36_11;
    ld.global.nc.u8 %rs7, [%rd4];
    st.shared.u8 [%r5], %rs7;
BB36_11:
    @%p6 bra BB36_13;
    ld.global.nc.u8 %rs8, [%rd4+1];
    st.shared.u8 [%r5+1], %rs8;
BB36_13:
    @%p7 bra BB36_15;
    ld.global.nc.u8 %rs9, [%rd4+2];
    st.shared.u8 [%r5+2], %rs9;
BB36_15:
    @%p8 bra BB36_17;
    ld.global.nc.u8 %rs10, [%rd4+3];
    st.shared.u8 [%r5+3], %rs10;
BB36_17:
    @%p9 bra BB36_19;
    ld.global.nc.u8 %rs11, [%rd4+4];
    st.shared.u8 [%r5+4], %rs11;
BB36_19:
    @%p10 bra BB36_21;
    ld.global.nc.u8 %rs12, [%rd4+5];
    st.shared.u8 [%r5+5], %rs12;
BB36_21:
    bar.sync 0;
    ld.shared.u8 %r101, [%r7];
    ld.shared.u8 %r102, [%r7+1];
    prmt.b32 %r103, %r102, %r101, 30212;
    ld.shared.u8 %r104, [%r7+2];
    ld.shared.u8 %r105, [%r7+3];
```

…and it goes on with no end.
Is there a correct way to load quantized models? Thanks a lot!