Hi, I am trying to import a scripted quantized MobileBERT torch model via the relay.frontend.from_pytorch
API, which works fine for quantized vision models like resnet50. However, I ran into the following problem (maybe due to the complexity of the BERT model):
When execution reaches
4171: input_scales_for_bias = qnn_torch.add_input_quant_params_to_op_inputs(graph)
and, inside the qnn_torch.add_input_quant_params_to_op_inputs() function, hits
490: if "quantized::conv" in operator or "quantized::linear" in operator:
# This is required for quantizing the bias
assert len(input_scales) == 1, "One quantized parameter expected for qconv or qlinear."
input_scales_for_bias[node.inputsAt(1).debugName()] = input_scales[0].node().f("value")
it expects input_scales[0].node() to be something like:
%493 : float = prim::Constant[value=0.01865844801068306]()
(the case in quantized resnet50), where it can directly pull out the value of the scale since it is a constant.
However, in the MobileBERT case, input_scales[0].node() is actually something like:
%cat_output_scale_0.1 : Tensor = prim::GetAttr[name="cat_output_scale_0.1"](%self.1)
This is not a constant out of the box: the scale has to be fetched as an attribute from %self.1 (i.e. the script module). As a result, an error is thrown, since there is no "value" attribute attached to input_scales[0].node() for .f("value") to read.
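To make the difference easier to see, something like the rough sketch below (dump_scale_sources is just a made-up debug helper; it only uses the TorchScript graph API that qnn_torch.py already relies on) prints which kind of node produces the inputs of each quantized op:

# Rough debug sketch (hypothetical helper): for every quantized op in the
# TorchScript graph, print which kind of node produces each of its inputs,
# so scales coming from prim::Constant can be told apart from prim::GetAttr.
def dump_scale_sources(graph):
    for node in graph.nodes():
        if not node.kind().startswith("quantized::"):
            continue
        for inp in node.inputs():
            producer = inp.node()
            if producer.kind() in ("prim::Constant", "prim::GetAttr"):
                print(node.kind(), inp.debugName(), "<-", producer.kind())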
Actually, I tried to hack around this by passing in the script_module and reading the scale constant by hand via something like float(getattr(script_module, func_name)), and it does eliminate the error here temporarily. However, this is not a complete solution, and I still get more errors in the following steps, since some of the nodes cannot be recognized/parsed correctly.
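Concretely, the hack looks roughly like the following around line 490 of qnn_torch.py (just a sketch; it assumes script_module, the loaded torch.jit ScriptModule, is threaded through to this function):

# Temporary hack sketch: if the scale is produced by prim::GetAttr instead of
# prim::Constant, read the attribute value directly from the ScriptModule.
# `script_module` is assumed to be passed down from relay.frontend.from_pytorch.
scale_node = input_scales[0].node()
if scale_node.kind() == "prim::Constant":
    scale_value = scale_node.f("value")
else:  # e.g. %cat_output_scale_0.1 : Tensor = prim::GetAttr[name="..."](%self.1)
    attr_name = scale_node.s("name")
    scale_value = float(getattr(script_module, attr_name))
input_scales_for_bias[node.inputsAt(1).debugName()] = scale_value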
Any suggestions for this problem? Thanks : )