vinx13
July 20, 2018, 8:45am
1
Hi, I'm trying to call the CUDA intrinsic dp4a, which is declared in the CUDA header <sm61_intrinsics.h>. However, I found that NVRTCCompile does not accept any headers (https://github.com/dmlc/tvm/blob/5c84a98a1d25dbc7c0f322b5eb284c2ffd5cd5d1/src/codegen/opt/build_cuda_on.cc#L94 ). What's the suggested way to register such intrinsics?
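For context, here is a minimal host-side sketch of what the signed form of dp4a computes: a dot product of four packed 8-bit integers, added to a 32-bit accumulator. The name `dp4a_reference` is made up for illustration and is not part of CUDA; on the device the intrinsic itself would be used.

```cpp
#include <cstdint>

// Hypothetical host-side reference for the signed dp4a operation.
// Treats `a` and `b` as four packed signed 8-bit lanes each, multiplies
// the lanes pairwise, and adds the four products to the accumulator `c`.
int32_t dp4a_reference(int32_t a, int32_t b, int32_t c) {
  int32_t acc = c;
  for (int i = 0; i < 4; ++i) {
    // Extract lane i (bits 8*i .. 8*i+7) and reinterpret it as signed.
    const auto a_lane = static_cast<int8_t>((static_cast<uint32_t>(a) >> (8 * i)) & 0xFFu);
    const auto b_lane = static_cast<int8_t>((static_cast<uint32_t>(b) >> (8 * i)) & 0xFFu);
    acc += static_cast<int32_t>(a_lane) * static_cast<int32_t>(b_lane);
  }
  return acc;
}
```

For example, with `a = 0x01020304`, `b = 0x01010101`, and `c = 10`, the lanes contribute 4 + 3 + 2 + 1, so the result is 20.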
Hi, @vinx13
In the case of FP16 support, the header is added to the generated CUDA code here:
  CHECK_EQ(vid_global_barrier_state_, runtime::symbol::tvm_global_barrier_state);
}

void CodeGenCUDA::AddFunction(LoweredFunc f) {
  this->stream << "extern \"C\" __global__ ";
  CodeGenC::AddFunction(f);
}

std::string CodeGenCUDA::Finish() {
  if (enable_fp16_) {
    decl_stream << "#include <cuda_fp16.h>\n";
  }
  return CodeGenC::Finish();
}

void CodeGenCUDA::VisitStmt_(const ir::For* op) {
  CHECK(is_zero(op->min));
  if (op->for_type == ir::ForType::Unrolled) {
    PrintIndent();
    stream << "#pragma unroll\n";
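The pattern in `Finish()` above (emit an `#include` into the declaration section only when a feature was used) can be generalized to other headers. The following is a minimal sketch of that idea, not TVM's actual code: the class `MiniCudaCodeGen` and the method `RequireHeader` are hypothetical names introduced for illustration.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical stand-in for a CUDA source generator (not TVM's CodeGenCUDA).
class MiniCudaCodeGen {
 public:
  // Record that the generated code needs `header`; duplicates are ignored.
  void RequireHeader(const std::string& header) { headers_.insert(header); }

  // Append a kernel with the given name and body text to the source stream.
  void AddFunction(const std::string& name, const std::string& body) {
    stream_ << "extern \"C\" __global__ void " << name << "() {\n"
            << body << "}\n";
  }

  // Emit the required #include lines first, then the kernels.
  std::string Finish() const {
    std::ostringstream decl;
    for (const auto& header : headers_) {
      decl << "#include <" << header << ">\n";
    }
    return decl.str() + stream_.str();
  }

 private:
  std::set<std::string> headers_;  // sorted, so the output is deterministic
  std::ostringstream stream_;      // kernel bodies
};
```

A visitor that encounters an FP16 type or an int8 intrinsic would call `RequireHeader` with the matching header name, and the include would then appear once at the top of the generated source.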
In addition, the search path for CUDA's headers is passed to nvrtcCompileProgram through the --include-path option:
    cc = std::to_string(device_prop.major) + std::to_string(device_prop.minor);
  } else {
    LOG(WARNING) << "cannot detect compute capability from your device, "
                 << "fall back to compute_30.";
  }
  compile_params.push_back("-arch=compute_" + cc);
  num_options++;
  if (include_path) {
    std::string include_option = "--include-path=" + FindCUDAIncludePath();
    compile_params.push_back(include_option);
    num_options++;
  }
  for (const auto& string : compile_params) {
    param_cstrings.push_back(string.c_str());
  }
  NVRTC_CALL(nvrtcCreateProgram(
      &prog, code.c_str(), nullptr, 0, nullptr, nullptr));
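The option handling in the excerpt above can be isolated into a small, testable sketch. The helpers `BuildCompileOptions` and `AsCStrings` are hypothetical names, and the include directory is passed in as a parameter rather than discovered, so this illustrates only the string handling, not TVM's implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds NVRTC-style option strings for a device with
// the given compute capability. An empty include_dir means that no
// --include-path option is added.
std::vector<std::string> BuildCompileOptions(int major, int minor,
                                             const std::string& include_dir) {
  std::vector<std::string> options;
  options.push_back("-arch=compute_" + std::to_string(major) +
                    std::to_string(minor));
  if (!include_dir.empty()) {
    options.push_back("--include-path=" + include_dir);
  }
  return options;
}

// Collects the const char* view that a C API expects. The pointers stay
// valid only while `options` is alive and unmodified, so this is called
// after all options have been added.
std::vector<const char*> AsCStrings(const std::vector<std::string>& options) {
  std::vector<const char*> cstrings;
  cstrings.reserve(options.size());
  for (const auto& option : options) {
    cstrings.push_back(option.c_str());
  }
  return cstrings;
}
```

The resulting pointer array and its size are what would be handed to nvrtcCompileProgram as its option list.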
On the other hand, I also think it may be better to pass the headers to nvrtcCreateProgram, as you pointed out. If that works without problems when I try it, I'll switch the header handling over to that approach.
BTW, TVM does not support int8 for CUDA yet, but I'm working on it and will open a PR in a few days.
Thanks
vinx13
July 23, 2018, 5:39am
3
@nishi-t I see, so if another header is needed, I need to modify the codegen part. Thanks.