CodeGen of Nimble

fengyuICT · March 23, 2021, 4:37am

I read the paper “NIMBLE: EFFICIENTLY COMPILING DYNAMIC NEURAL NETWORKS FOR MODEL INFERENCE”, and I am confused about Section 3.5. Specifically, I do not understand the technique based on “the residues modulo of the tiling factor”. Can you give an example?

Take matrix multiplication C=A*B as an example, where A=[any, K], B=[K, N], and tile_factor is 8 (is the tile factor fixed?). Do we then enumerate any=[64, 65, …, 71] and generate eight kernels, one per value? (The details of kernel tuning can be omitted, because the explanation in the paper is clear enough.)

Is my understanding correct?

@haichen @jroesch

haichen · March 23, 2021, 10:12pm

That’s not correct. First, the tiling factor is fixed and it was chosen by AutoTVM tuning on static shapes. Assuming the tile factor is 8, we replace the Any by 8k, 8k+1, …, 8k+7 where k is a symbolic var, and generate one kernel for each. In addition, we generate a dispatch kernel to launch the correct one at runtime.
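The following is a minimal sketch of the residue idea in plain Python, not Nimble's actual codegen. It assumes a tile factor of 8 and specializes only the row dimension of a matmul; `make_kernel`, `KERNELS`, and `dispatch` are hypothetical names chosen for illustration. Each kernel handles row counts of the form 8k + r for one fixed r, so the tail loop has a trip count known when the kernel is built and the main loop needs no boundary check.

```python
TILE = 8  # assumed tiling factor, as in the example above


def make_kernel(residue, tile=TILE):
    """Build a matmul kernel specialized for row counts m = tile * k + residue."""

    def kernel(a, b):
        m, inner, n = len(a), len(b), len(b[0])
        assert m % tile == residue, "wrong kernel for this shape"
        full_tiles = m // tile  # this is the symbolic k, known only at runtime
        c = [[0.0] * n for _ in range(m)]

        def compute_row(i):
            for j in range(n):
                c[i][j] = sum(a[i][p] * b[p][j] for p in range(inner))

        # Main loop: only full tiles, so no per-row boundary check is needed.
        for t in range(full_tiles):
            for ii in range(tile):
                compute_row(t * tile + ii)
        # Tail: exactly `residue` rows, a constant fixed per kernel.
        for ii in range(residue):
            compute_row(full_tiles * tile + ii)
        return c

    return kernel


# One kernel per residue class: 8k, 8k+1, ..., 8k+7.
KERNELS = [make_kernel(r) for r in range(TILE)]


def dispatch(a, b):
    """Dispatch function: pick the kernel that matches len(a) mod TILE."""
    return KERNELS[len(a) % TILE](a, b)
```

For example, an input with 11 rows is routed to the kernel for residue 3, which runs one full tile followed by a 3-row tail.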

fengyuICT · March 24, 2021, 8:36am

@haichen Thanks for replying! I understand the relationship between tiling factor and kernel.

I am now confused about how to determine the tiling factor. You said that the tiling factor is tuned by AutoTVM according to the static shape. Can it be understood as using a typical shape as the input of AutoTVM to get the tiling factor?

wxyhv · March 26, 2021, 6:14am

Hello, when will Nimble be added to TVM to support dynamic input shapes?

haichen · March 29, 2021, 6:24am

We first replace the symbolic dimension with a large constant (e.g., 64, 128) and use the standard AutoTVM tuning to search for the schedules. We observe that the tuning on large sizes usually covers good schedules on other shapes. After the tuning is done, we then choose the top 100 schedules and evaluate them on other sizes (e.g., 1, 2, 4, 8, …). We pick the schedule that achieves the best average performance as the final schedule. That’s how the tiling factor is determined.
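The selection procedure can be sketched as below. This is an illustration under stated assumptions, not the actual tuning code: `pick_schedule` and `toy_latency` are hypothetical, "best average performance" is interpreted as lowest mean latency, and the cost model is a made-up stand-in for real on-device measurement.

```python
from statistics import mean


def pick_schedule(candidates, tuned_size, eval_sizes, measure, top_k=100):
    """Shortlist the top_k candidates at the tuned size, then return the one
    with the lowest mean latency across the other sizes."""
    ranked = sorted(candidates, key=lambda s: measure(s, tuned_size))
    shortlist = ranked[:top_k]
    return min(shortlist, key=lambda s: mean([measure(s, m) for m in eval_sizes]))


def toy_latency(tile, m):
    """Made-up cost model: padded work plus a fixed overhead per tile."""
    num_tiles = (m + tile - 1) // tile  # ceiling division
    return num_tiles * (tile + 2)


# Candidates here are bare tile factors; a real schedule has many more knobs.
best = pick_schedule(
    candidates=[4, 8, 16, 32],
    tuned_size=64,
    eval_sizes=[1, 2, 4, 8, 16, 32],
    measure=toy_latency,
)
```

Under this toy model the largest tile is fastest at size 64, but a smaller tile wins once the small sizes are averaged in, which is the reason for the second evaluation pass.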

haichen · March 29, 2021, 6:25am

Most of the VM-related parts have already been pushed into TVM. We are working on a more systematic way of tuning with symbolic shapes.

misto · November 16, 2021, 12:44pm

After reading the Nimble paper, I would like to ask about Section 3.5, Symbolic Codegen:

How can AutoTVM and Nimble be combined? Do the model template and search space need to be reduced manually? How are the top 100 schedules found?
Can Ansor and Nimble be combined? If so, how can this be done? Looking forward to your response, thank you!

Qiu1981 · July 5, 2023, 8:23am

Hi Haichen, I tried to find the dispatch-related code but could not. Has the kernel-splitting and dispatch code been pushed into TVM main? Thanks a lot.