How to include headers in generated CUDA source

Hi, I’m trying to call the CUDA intrinsic dp4a, which is declared in the CUDA header <sm_61_intrinsics.h>. However, I found that NVRTCCompile does not accept any headers (https://github.com/dmlc/tvm/blob/5c84a98a1d25dbc7c0f322b5eb284c2ffd5cd5d1/src/codegen/opt/build_cuda_on.cc#L94). What’s the suggested way to register such intrinsics?

Hi, @vinx13

In the case of FP16 support, the header is added to the generated CUDA code here:


In addition, the search path for CUDA’s headers is added via the include-path option of nvrtcCompileProgram:

On the other hand, I also think it may be better to pass the headers through nvrtcCreateProgram, as you pointed out. If that works without problems when I try it, I’ll switch to including headers that way.
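
For readers who want to see the include-path approach concretely, here is a minimal sketch of how the option strings could be assembled before calling nvrtcCompileProgram. The helper names MakeNvrtcOptions and ToCStrings are hypothetical and are not taken from TVM; the include directory is whatever your CUDA installation uses. Only the --include-path=<dir> option and the fact that NVRTC takes an array of C strings come from NVRTC itself.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not TVM code): turns a list of include directories
// into NVRTC option strings of the form "--include-path=<dir>".
std::vector<std::string> MakeNvrtcOptions(
    const std::vector<std::string>& include_dirs) {
  std::vector<std::string> opts;
  for (const std::string& dir : include_dirs) {
    opts.push_back("--include-path=" + dir);
  }
  return opts;
}

// NVRTC takes its options as an array of C strings. The returned pointers
// are only valid while `opts` is alive and unmodified.
std::vector<const char*> ToCStrings(const std::vector<std::string>& opts) {
  std::vector<const char*> out;
  out.reserve(opts.size());
  for (const std::string& o : opts) {
    out.push_back(o.c_str());
  }
  return out;
}
```

With NVRTC available, the result of ToCStrings would be passed as the last two arguments of nvrtcCompileProgram (the option count and the option array). With the search path set this way, an #include line in the generated source can be resolved at compile time, so the headers argument of nvrtcCreateProgram can stay empty.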

BTW, TVM does not support int8 for CUDA yet, but I’m working on it and will open a PR in a few days :wink:

Thanks

@nishi-t I see, so if another header is needed, I need to modify the codegen part. Thanks.
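
To illustrate what "modifying the codegen part" could look like, below is a rough sketch of a generator that prepends #include lines depending on which features the kernel uses. It is not TVM's actual code: HeaderFlags and WithHeaders are hypothetical names, and the idea that the generator tracks one boolean per feature is an assumption. The header names cuda_fp16.h and sm_61_intrinsics.h are the file names shipped in the CUDA toolkit's include directory.

```cpp
#include <sstream>
#include <string>

// Hypothetical flags a code generator might set while visiting the kernel
// body (assumption: one flag per feature that needs an extra header).
struct HeaderFlags {
  bool uses_fp16 = false;  // half-precision types and intrinsics
  bool uses_int8 = false;  // dp4a-style int8 intrinsics
};

// Returns the kernel source with the required #include lines placed first.
std::string WithHeaders(const HeaderFlags& flags,
                        const std::string& kernel_source) {
  std::ostringstream out;
  if (flags.uses_fp16) {
    out << "#include <cuda_fp16.h>\n";
  }
  if (flags.uses_int8) {
    out << "#include <sm_61_intrinsics.h>\n";
  }
  out << kernel_source;
  return out.str();
}
```

Supporting another header in this scheme means adding one flag and one conditional line, which matches the point above that each new header needs a change on the codegen side.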