I used the relay quantization tool in the following way:
import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# mod, params, target, ctx and the input batch x are defined earlier (not shown).
with relay.quantize.qconfig(nbit_input=16,
                            nbit_weight=16,
                            nbit_activation=16,
                            dtype_input="int16",
                            dtype_weight="int16",
                            dtype_activation="int16",
                            global_scale=8.0,
                            skip_conv_layers=None,
                            round_for_shift=True,
                            store_lowbit_output=False,
                            debug_enabled_ops=None):
    print(" ", relay.quantize.current_qconfig())
    mod["main"] = relay.quantize.quantize(mod["main"], params)

with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, target=target, params=params)

m = graph_runtime.create(graph, lib, ctx)
# The parameters are the same for every image, so set them once.
m.set_input(**params)
for i in range(20):
    # set the i-th image as input
    m.set_input('x', x[i:i+1])
    # execute
    m.run()
    # get outputs
    tvm_output = m.get_output(0, tvm.nd.empty((1, 3), 'float32'))
    print(tvm_output)
From the print(tvm_output) call above, I got the same output for every image:
[[ 3.4990034 18.61744 6.8425226]]
[[ 3.4990034 18.61744 6.8425226]]
[[ 3.4990034 18.61744 6.8425226]]
[[ 3.4990034 18.61744 6.8425226]]....
But when I removed the quantization step, I got the correct outputs, with different values for different images:
[[2.6558046e+01 2.5423976e-02 1.7294771e+01]]
[[2.6697420e+01 1.6384061e-02 1.9536079e+01]]
[[2.6694643e+01 1.6997138e-02 1.8962639e+01]]
[[2.6695280e+01 1.6843578e-02 1.8819904e+01]]...
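To tell "constant output" apart from "slightly inaccurate output" without eyeballing the printouts, one can measure how much each output class varies across images. The helpers below are hypothetical names (not part of TVM), written in plain Python and applied to the first four rows printed above; the same check could be run on the rows collected from `tvm_output` inside the loop.

```python
def column_spread(rows):
    """Return max - min of each output column across all rows (images)."""
    return [max(col) - min(col) for col in zip(*rows)]


def outputs_vary(rows, tol=1e-6):
    """True if any output column changes by more than tol across rows."""
    return any(spread > tol for spread in column_spread(rows))


# First four rows copied from the outputs printed in this post.
quantized = [
    [3.4990034, 18.61744, 6.8425226],
    [3.4990034, 18.61744, 6.8425226],
    [3.4990034, 18.61744, 6.8425226],
    [3.4990034, 18.61744, 6.8425226],
]
float_ref = [
    [2.6558046e+01, 2.5423976e-02, 1.7294771e+01],
    [2.6697420e+01, 1.6384061e-02, 1.9536079e+01],
    [2.6694643e+01, 1.6997138e-02, 1.8962639e+01],
    [2.6695280e+01, 1.6843578e-02, 1.8819904e+01],
]

print(outputs_vary(quantized))  # False: every image gives the same row
print(outputs_vary(float_ref))  # True
```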
Can anybody please comment on what the issue could be? Even with the relay.quantize.qconfig settings above, the quantized model should at least produce different values for different inputs.
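One hypothesis worth ruling out (not a confirmed diagnosis): with `global_scale=8.0`, non-weight tensors such as the input appear to get a fixed representable range of roughly ±8. As far as I can tell from the TVM 0.6-era calibration code (an assumption; check your version), the non-weight scale is `global_scale / 2**(nbit - sign)` and the integer value is clipped to ±(2**(nbit - sign) - 1). The plain-Python sketch below simulates that fake quantization; `simulated_quantize` is a hypothetical name, not a TVM API. It shows that any input larger than about 8, for example raw 0-255 pixel values, saturates to the same number.

```python
def simulated_quantize(x, global_scale=8.0, nbit=16, signed=True):
    """Fake-quantize a scalar with a symmetric, global-scale scheme.

    Assumption: scale = global_scale / 2**(nbit - sign) and the integer value
    is clipped to +/-(2**(nbit - sign) - 1), mirroring TVM 0.6-era global_scale
    calibration; other versions or calibration modes may differ.
    """
    valid_range = 2 ** (nbit - (1 if signed else 0))
    scale = global_scale / valid_range
    # Clip in the integer domain, then round to the nearest step
    # (Python's round breaks exact ties to even).
    q = min(max(x / scale, -(valid_range - 1)), valid_range - 1)
    q = round(q)
    return q * scale


for value in [0.5, 1.5, 7.9, 8.0, 100.0, 255.0]:
    print(value, "->", simulated_quantize(value))
```

Under these assumptions, 8.0, 100.0 and 255.0 all map to 7.999755859375, so unnormalized images would collapse to nearly the same quantized input. If that turns out to be the cause, scaling the inputs into the representable range (consistent with how the model was trained) or choosing a larger `global_scale` would be the first things to try.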