[VTA] Dense calculation tutorial: graph packing and quantization

Hi, I’m currently following this sample code (c8064b), and there are a few points I don’t understand. Here are my questions:

  1. Why does this code limit the maximum and minimum values with the “my_clip” function?

  2. When you designate the data shape (code lines 71–77), is there a rule that must be strictly followed? That is, is there a special reason the order has to be (batch_size // env.BATCH, in_feat // env.BLOCK_IN, env.BATCH, env.BLOCK_IN)? If there is documentation on this, please point me to it.

  3. I don’t understand the role of code lines 91 and 92. Why do they perform a right_shift and then clipping?

  4. At code lines 105 and 106, why do they limit the range of values using (1 << (env.INP_WIDTH - 1))? This may be related to question 1.

  5. Does this code guarantee the best performance? If so, does it still guarantee the best performance even if I change (batch, input_size, output_size)?
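To make question 2 concrete, here is a minimal NumPy sketch of the packed layout I mean. The tile sizes are my assumptions matching the default VTA configuration (env.BATCH = 1, env.BLOCK_IN = 16); this is only an illustration of the reshape/transpose, not the tutorial’s actual code:

```python
import numpy as np

# Assumed tile sizes from the default VTA config (hypothetical values).
BATCH = 1      # stands in for env.BATCH
BLOCK_IN = 16  # stands in for env.BLOCK_IN

batch_size, in_feat = 4, 64
data = np.arange(batch_size * in_feat).reshape(batch_size, in_feat)

# Pack the flat (batch_size, in_feat) matrix into the tiled layout
# (batch_size // BATCH, in_feat // BLOCK_IN, BATCH, BLOCK_IN):
# split each axis into (outer, tile) and move both tile axes innermost.
packed = data.reshape(batch_size // BATCH, BATCH,
                      in_feat // BLOCK_IN, BLOCK_IN).transpose(0, 2, 1, 3)

assert packed.shape == (batch_size // BATCH, in_feat // BLOCK_IN,
                        BATCH, BLOCK_IN)

# Each innermost (BATCH, BLOCK_IN) tile is now contiguous in memory,
# e.g. packed[0, 1, 0] holds columns 16..31 of input row 0.
```

My understanding is that this makes each small tile contiguous so a single hardware GEMM instruction can consume it, which would explain why the axis order is fixed, but I’d appreciate confirmation.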

Thank you

I solved questions 1–4 by working through another tutorial.

But I still can’t find the answer to question 5.
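In case it helps others who land on this thread, here is my understanding of questions 1, 3, and 4 as a NumPy sketch. The bit width is an assumption matching the default VTA config (env.INP_WIDTH = 8), and my_clip below is only a stand-in for the tutorial’s helper:

```python
import numpy as np

INP_WIDTH = 8  # assumed value of env.INP_WIDTH in the default config


def my_clip(x, a_min, a_max):
    # Same idea as the tutorial's my_clip: saturate values so they
    # fit into the fixed-point range the accelerator expects.
    return np.clip(x, a_min, a_max)


# Example 32-bit accumulator outputs from the GEMM stage.
acc = np.array([-70000, -130, 0, 500, 70000], dtype=np.int32)

# The right shift is a fixed-point requantization: it divides by
# 2**shift to scale the wide accumulator back down.
shift = 8
res = acc >> shift

# Clipping then saturates into the signed INP_WIDTH-bit range
# [-(1 << (INP_WIDTH - 1)), (1 << (INP_WIDTH - 1)) - 1] = [-128, 127],
# so narrowing to int8 cannot overflow and wrap around.
res = my_clip(res, -(1 << (INP_WIDTH - 1)), (1 << (INP_WIDTH - 1)) - 1)
res = res.astype(np.int8)
```

So, as I now understand it, shift-then-clip is the standard narrowing step that turns wide accumulator results back into INP_WIDTH-bit inputs for the next layer; saturating via clip loses less information than letting the cast wrap.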

@woojinnn Hi, I was wondering whether you implemented nn.Upsample in the graph_pack process. I am currently trying to run a U-Net on VTA, but I ran into problems during graph_pack, and I suspect the cause is nn.Upsample or torch.cat. A full description is in another post of mine, Can Upsample be implemented on VTA in graph_pack?. I have been stuck on this for a long time. Have you encountered this problem, or can you offer any suggestions? Thank you.