A question about offloading graph to VTA

Hi, I’m following the tutorial Deploy Pretrained Vision Detection Model from Darknet on VTA (https://tvm.apache.org/docs/topic/vta/tutorials/frontend/deploy_detection.html#deploy-pretrained-vision-detection-model-from-darknet-on-vta). I tried to print the IR after quantization and got the following result (only part of the output is listed here):

```
cast(%9, dtype="int8") /* ty=Tensor[(1, 1, 208, 208, 1, 16), int8] */;
%12 = transpose(%10, axes=[0, 2, 4, 5, 1, 3]) /* ty=Tensor[(2, 1, 3, 3, 16, 16), int8] */;
%13 = reshape(meta[relay.Constant][3] /* ty=Tensor[(32, 1, 1), int32] */, newshape=[2, 16, 1, 1, 1]) /* ty=Tensor[(2, 16, 1, 1, 1), int32] */;
%14 = transpose(%13, axes=[0, 2, 3, 4, 1]) /* ty=Tensor[(2, 1, 1, 1, 16), int32] */;
%15 = nn.conv2d(%11, %12, padding=[1, 1, 1, 1], channels=32, kernel_size=[3, 3], data_layout="NCHW1n16c", kernel_layout="OIHW16o16i", out_dtype="int32") /* ty=Tensor[(1, 2, 208, 208, 1, 16), int32] */;
%16 = broadcast_to(%14, shape=[2, 1, 1, 1, 16]) /* ty=Tensor[(2, 1, 1, 1, 16), int32] */;
%17 = add(%15, %16) /* ty=Tensor[(1, 2, 208, 208, 1, 16), int32] */;
%18 = add(%17, 64 /* ty=int32 */) /* ty=Tensor[(1, 2, 208, 208, 1, 16), int32] */;
%19 = right_shift(%18, 7 /* ty=int32 */) /* ty=Tensor[(1, 2, 208, 208, 1, 16), int32] */;
%20 = clip(%19, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 2, 208, 208, 1, 16), int32] */;
%21 = cast(%20, dtype="int8") /* ty=Tensor[(1, 2, 208, 208, 1, 16), int8] */;
%22 = annotation.stop_fusion(%21) /* ty=Tensor[(1, 2, 208, 208, 1, 16), int8] */;
%23 = cast(%22, dtype="float32") /* ty=Tensor[(1, 2, 208, 208, 1, 16), float32] */;
%24 = multiply(%23, 0.179688f /* ty=float32 */) /* ty=Tensor[(1, 2, 208, 208, 1, 16), float32] */;
%25 = nn.leaky_relu(%24, alpha=0.1f) /* ty=Tensor[(1, 2, 208, 208, 1, 16), float32] */;
%26 = nn.max_pool2d(%25, pool_size=[2, 2], strides=[2, 2], padding=[0, 0, 0, 0]) /* ty=Tensor[(1, 2, 104, 104, 1, 16), float32] */;
%27 = multiply(%26, 5.56522f /* ty=float32 */) /* ty=Tensor[(1, 2, 104, 104, 1, 16), float32] */;
%28 = round(%27) /* ty=Tensor[(1, 2, 104, 104, 1, 16), float32] */;
%29 = clip(%28, a_min=-127f, a_max=127f) /* ty=Tensor[(1, 2, 104, 104, 1, 16), float32] */;
%30 = reshape(meta[relay.Constant][4] /* ty=Tensor[(64, 32, 3, 3), int8] */, newshape=[4, 16, 2, 16, 3, 3]) /* ty=Tensor[(4, 16, 2, 16, 3, 3), int8] */;
```

I did not change start_pack or stop_pack, so I expect TVM to pack the whole subgraph between start_pack and stop_pack and offload it to VTA. However, as shown above, the computation from %23 to %29 uses the float32 type, while VTA can only execute int8/int32 operations. Does this mean the computation from %23 to %29 is executed on the CPU instead of VTA?