I read the paper “NIMBLE: EFFICIENTLY COMPILING DYNAMIC NEURAL NETWORKS FOR MODEL INFERENCE”, and I am confused about section 3.5.
Specifically, I don't understand the technique referred to as “the residues modulo of the tiling factor”. Can you give an example?
Take the matrix multiplication C = A * B as an example, where A has shape [Any, K], B has shape [K, N], and tile_factor is 8 (is the tile factor fixed?). Do we then enumerate Any = 64, 65, …, 71 and generate eight kernels, one per value? (The technical details of kernel tuning can be skipped, since the paper explains them clearly.)
That’s not correct. First, the tiling factor is fixed; it was chosen by AutoTVM tuning on static shapes. Assuming the tile factor is 8, we replace Any with 8k, 8k+1, …, 8k+7, where k is a symbolic variable, and generate one kernel for each. In addition, we generate a dispatch kernel that launches the correct one at runtime.
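To make the residue idea concrete, here is a minimal pure-Python sketch of the scheme described above. It is not TVM code: `make_kernel`, `KERNELS`, and `dispatch_matmul` are hypothetical names, and a "kernel" here is just a Python function standing in for generated code. Each kernel is specialized for row counts of the form 8k + r, so the full-tile loop needs no boundary checks and the tail loop has a fixed trip count r; the dispatcher computes k and r from the runtime value of Any.

```python
from typing import Callable, List

TILE = 8  # assumed tile factor; in Nimble it comes from AutoTVM tuning

Matrix = List[List[float]]
Kernel = Callable[[Matrix, Matrix, int], Matrix]


def make_kernel(residue: int) -> Kernel:
    """Build a kernel specialized for row counts of the form TILE * k + residue."""
    def kernel(a: Matrix, b: Matrix, k: int) -> Matrix:
        inner, n_cols = len(b), len(b[0])
        c = [[0.0] * n_cols for _ in range(TILE * k + residue)]

        def compute_row(i: int) -> None:
            for j in range(n_cols):
                c[i][j] = sum(a[i][p] * b[p][j] for p in range(inner))

        # Full tiles: exactly TILE rows each, so no boundary check is needed.
        for t in range(k):
            for ii in range(TILE):
                compute_row(t * TILE + ii)
        # Tail: exactly `residue` rows, a trip count fixed when the kernel is built.
        for ii in range(residue):
            compute_row(k * TILE + ii)
        return c

    return kernel


# One kernel per residue class: 8k, 8k+1, ..., 8k+7.
KERNELS = [make_kernel(r) for r in range(TILE)]


def dispatch_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dispatch kernel: choose the specialized kernel from the runtime value of Any."""
    n = len(a)  # the symbolic dimension Any, only known at runtime
    k, residue = divmod(n, TILE)
    return KERNELS[residue](a, b, k)
```

For example, a 13-row input gives divmod(13, 8) == (1, 5), so the dispatcher calls KERNELS[5] with k = 1. Only eight kernels exist no matter how large Any gets, which is why the answer rejects generating one kernel per concrete value such as 64, 65, …, 71.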
@haichen Thanks for the reply! I now understand the relationship between the tiling factor and the kernels.
What I am still unsure about is how the tiling factor is determined. You said that it is chosen by AutoTVM tuning on static shapes. Does that mean we feed a typical static shape into AutoTVM and take the tiling factor from the tuned schedule?
We first replace the symbolic dimension with a large constant (e.g., 64, 128) and use standard AutoTVM tuning to search for schedules. We observe that tuning on a large size usually also finds schedules that perform well on other shapes. After tuning is done, we take the top 100 schedules and evaluate them on other sizes (e.g., 1, 2, 4, 8, …). We pick the schedule that achieves the best average performance as the final schedule. That’s how the tiling factor is determined.
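The selection step above boils down to an argmin over average cost. Below is a minimal pure-Python sketch, not AutoTVM code: `select_schedule`, `measure`, and `toy_latency` are hypothetical names, a candidate schedule is reduced to just its tile factor, and "best average performance" is assumed to mean the lowest mean latency. `toy_latency` is a made-up cost model (padded rows plus a per-tile overhead), not real measurements.

```python
import math
from statistics import mean
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # a candidate schedule, treated as opaque


def select_schedule(
    candidates: Sequence[S],
    eval_sizes: Sequence[int],
    measure: Callable[[S, int], float],
) -> S:
    """Return the candidate with the lowest mean latency over eval_sizes.

    `candidates` stands for the top schedules found by tuning at one large
    static size; `measure(schedule, n)` is a hypothetical benchmark hook that
    returns the latency of `schedule` when the symbolic dimension equals n.
    """
    return min(candidates, key=lambda s: mean(measure(s, n) for n in eval_sizes))


def toy_latency(tile: int, n: int) -> float:
    # Toy stand-in, NOT real data: rows padded up to whole tiles + per-tile overhead.
    tiles = math.ceil(n / tile)
    return tiles * tile + 4.0 * tiles


best_tile = select_schedule([4, 8, 16, 32], [1, 2, 4, 8, 64, 128], toy_latency)
```

With a real setup, `measure` would run the compiled kernel on the target hardware; the point of averaging is that a schedule tuned only at size 128 may be poor at size 1, so the final choice balances small and large sizes.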