Kernels running on the GPU require all memory accesses to be within a thread or a block. The file you are looking does not do any thread binding. I suggest looking at this tutorial: https://tvm.apache.org/docs/tutorials/optimize/opt_conv_cuda.html
1 Like