What do the kernels in a TVM module compiled to CUDA mean?

Hi everyone: Recently I have been profiling a model compiled by TVM. It's a PyTorch model, and the target is CUDA. I profiled the Python script with NVIDIA Nsight Systems.

There are a lot of kernels named like tvmgen_default_*. I wonder how these kernels were created, and for the kernel tvmgen_default_fused_nn_einsum_2_kernel0, does it actually do some computation?


When TVM reads a model, it represents it in Relay. There, each operator initially looks like a separate function, and invoking an operator looks like a function call. A fusion pass then partitions the graph, where each partition is a group of such calls. Each group later becomes a single function; this is where names like tvmgen_default_fused_add_multiply_erf_multiply_add_multiply come from. Then, once the Relay module has been translated into TIR, some loop nests in it may be marked as targeting an accelerator, such as an NVIDIA GPU. Each such loop nest is extracted into its own function, with _kernelN appended to the name. These are typically pure computations, without any function calls or control flow.
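To make the naming concrete, here is a toy sketch in plain Python (not TVM's actual implementation; the helper names are my own) of the convention described above: the operator names of a fused group are joined into one function name, and each loop nest extracted for the GPU gets a _kernelN suffix.

```python
def fused_func_name(op_names, index=None, prefix="tvmgen_default"):
    """Mimic the fused-function naming convention: operator names are
    joined with underscores (dots in names like "nn.einsum" become
    underscores), and a numeric suffix disambiguates repeated groups."""
    flat = "_".join(name.replace(".", "_") for name in op_names)
    name = f"{prefix}_fused_{flat}"
    if index is not None:
        name += f"_{index}"
    return name

def kernel_name(func_name, n):
    """Each loop nest extracted as a GPU kernel gets _kernelN appended."""
    return f"{func_name}_kernel{n}"

# The kernel from the question: a fused group containing a single
# nn.einsum (disambiguated as number 2), first extracted kernel.
fn = fused_func_name(["nn.einsum"], index=2)
print(kernel_name(fn, 0))  # tvmgen_default_fused_nn_einsum_2_kernel0
```

So yes, a name like tvmgen_default_fused_nn_einsum_2_kernel0 is real computation: it is the device-side loop nest of a fused group that, here, contains an einsum.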

In your code, you can try something like print(lib_einsum.ir_mod) to see the TIR module, but the kernels may already have been extracted by that point.
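If you want the generated CUDA C itself (the code whose kernel names show up in Nsight), a sketch along these lines should work, assuming the module was built with tvm.relay.build targeting "cuda"; mod and params are placeholders for your own model:

```python
import tvm
from tvm import relay

# mod, params = ...  # your Relay module and weights,
#                    # e.g. from relay.frontend.from_pytorch
lib = relay.build(mod, target="cuda", params=params)

# The host module holds the driver code; the device (CUDA) source
# lives in the modules imported into the compiled library.
host_mod = lib.get_lib()
cuda_src = host_mod.imported_modules[0].get_source()
print(cuda_src)  # the tvmgen_default_..._kernelN definitions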

Hi, thanks for your reply. I'm sorry that I haven't replied in time. Now I have the CUDA source code for my network, and it looks like this:

Is this exactly the code that CUDA runs?

Yes, that’s pretty much it.