Will compiling CUDA code at TVM runtime degrade performance?

I am new to TVM and am studying its source code. I'm interested in GPU kernel generation and kernel loading. The GPU kernel is generated and compiled to PTX when tvm.build() executes. At runtime, TVM then calls the CUDA driver API functions cuModuleLoadData(), cuModuleGetFunction(), and cuLaunchKernel() to load and launch the generated kernel. So there is a compilation step at runtime: the PTX is JIT-compiled into executable machine code. This compilation might run many times when multiple processes launch the same generated kernel. I am confused about whether compiling at runtime will degrade performance.

Usually the runtime JIT of PTX happens only once and affects performance only the first time the function is called; this also applies to other code generated by nvcc (if it is shipped as PTX). After that we don't pay the cost, as long as the process keeps using the same function handle.
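The "JIT once, then reuse the handle" behavior described above can be sketched as a cache keyed by the PTX image. This is a hypothetical illustration in plain Python, not TVM's actual runtime code; `KernelCache` and `_load_ptx` are invented stand-ins for the driver-side module table and cuModuleLoadData().

```python
# Hypothetical sketch of "JIT once, reuse the handle" (not TVM source).
# _load_ptx() stands in for the expensive driver-side JIT
# (cuModuleLoadData); the real CUDA driver API is not used here.

class KernelCache:
    def __init__(self):
        self._modules = {}   # PTX image -> loaded module handle
        self.jit_count = 0   # how many times the JIT cost was actually paid

    def _load_ptx(self, ptx):
        # Stand-in for cuModuleLoadData(): PTX -> executable machine code.
        self.jit_count += 1
        return "module<%d>" % hash(ptx)

    def get_function(self, ptx, name):
        # JIT only on first use of a given PTX image; later calls reuse
        # the cached module handle, so launches pay no compile cost.
        if ptx not in self._modules:
            self._modules[ptx] = self._load_ptx(ptx)
        return (self._modules[ptx], name)

cache = KernelCache()
ptx = ".visible .entry add_kernel(...) { ... }"  # placeholder PTX text
f1 = cache.get_function(ptx, "add_kernel")
f2 = cache.get_function(ptx, "add_kernel")  # cache hit: no second JIT
print(cache.jit_count)  # -> 1
```

So as long as a process keeps the loaded module alive, repeated launches of the same kernel hit the cache and never recompile.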

@tqchen thanks for the reply.

@tqchen Hi, I have another question and sincerely look forward to your answer. The CUDA API cuLaunchKernel() takes the shapes of the thread grid and thread block as parameters, so when does TVM decide how many GPU threads the generated kernel should use, at kernel-build time or at launch time? If possible, could you point me to the relevant source files in the TVM project? Thanks in advance, and best wishes!
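For context on the launch shape question: the (gridDim, blockDim) pair that cuLaunchKernel() receives is typically derived from the kernel's thread extents, which are fixed when loop axes are bound to blockIdx/threadIdx during scheduling. Below is a minimal illustrative sketch of that derivation for a 1-D kernel; the block size of 256 is an assumed example, not a value TVM mandates.

```python
# Illustrative sketch (not TVM source): deriving the launch configuration
# that cuLaunchKernel() expects for a 1-D kernel over n elements.
# The block size (256 threads) is an assumption for illustration.

def launch_config(n, threads_per_block=256):
    """Return (gridDim.x, blockDim.x) covering n elements."""
    # Ceil division so every element is covered by some thread;
    # out-of-range threads are masked off inside the kernel.
    grid_x = (n + threads_per_block - 1) // threads_per_block
    return grid_x, threads_per_block

print(launch_config(1000))  # -> (4, 256): 4 blocks of 256 threads
print(launch_config(1024))  # -> (4, 256): exact fit, no padding needed
```

The key point is that the extents are baked in when the schedule binds axes to GPU threads, so the runtime only has to read them out and pass them to the launch call.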
