I have some problems when compiling float16 ResNet18 and inception_v3 models.
RuntimeError: Compilation error:
/tmp/tmpoiw0p21t/my_kernel.cu(2207): warning: attribute "shared" does not apply here
/tmp/tmpoiw0p21t/my_kernel.cu(2207): error: no operator "=" matches these operands
operand types are: volatile half = half
/tmp/tmpoiw0p21t/my_kernel.cu(2210): warning: attribute "shared" does not apply here
I got somewhat related errors yesterday, and I think the issue [ERROR] Half type support in NVRTC is related. For now, I am just using an old commit until the NVRTC issues are fixed.
diagonaled_mm = tvm.build(s, [X, Y, Z, D, w, w_upper, padding, transpose_t1, t3d3], target=device, target_host=tgt_host, name='diagonaled_mm')
/usr/tvm/python/tvm/build_module.py:636: in build
fhost, mdev = _build_for_device(flist, tar, target_host)
/usr/tvm/python/tvm/build_module.py:502: in _build_for_device
mdev = codegen.build_module(fdevice, str(target)) if fdevice else None
/usr/tvm/python/tvm/codegen.py:36: in build_module
return _Build(lowered_func, target)
/usr/tvm/python/tvm/_ffi/_ctypes/function.py:207: in __call__
raise get_last_ffi_error()
E tvm._ffi.base.TVMError: Traceback (most recent call last):
E File "/usr/tvm/src/codegen/opt/build_cuda_on.cc", line 119
E TVMError: Check failed: compile_res == NVRTC_SUCCESS (6 vs. 0) : default_program(31): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E default_program(31): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E default_program(31): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(31): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E default_program(46): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E default_program(46): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E default_program(46): error: more than one operator "*" matches these operands:
E function "operator*(const __half &, const __half &)"
E function "operator*(__half, __half)"
E operand types are: half * half
E
E default_program(46): error: ambiguous "?" operation: second operand of type "<error-type>" can be converted to third operand type "half", and vice versa
E
E 12 errors detected in the compilation of "default_program".
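For reference, here is a minimal sketch (my own repro idea, not TVM's generated code or its eventual fix) of why NVRTC reports the ambiguity: for sm_53 and newer, cuda_fp16.h already defines operator* for __half unless __CUDA_NO_HALF_OPERATORS__ is set, so a second operator* coming from the generated program leaves two equally good candidates for half * half. Defining the macro before the include keeps only the user-provided overload:

// Sketch only: suppress the built-in __half operators so that a single
// user-provided operator* remains visible; compile with e.g. -arch=sm_70.
#define __CUDA_NO_HALF_OPERATORS__
#include <cuda_fp16.h>

__device__ half operator*(half a, half b) { return __hmul(a, b); }

__global__ void mul_kernel(half* out, const half* a, const half* b) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  out[i] = a[i] * b[i];  // exactly one candidate overload now
}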
Yes, if I use this commit, I get errors like the ones ibeltagy reported:
default_program(46): error: more than one operator "*" matches these operands:
function "operator*(const __half &, const __half &)"
function "operator*(__half, __half)"
operand types are: half * half
Yes, I removed these lines. The errors like the ones ibeltagy reported no longer appear, but the errors I hit at first come back:
/tmp/tmpoiw0p21t/my_kernel.cu(2207): error: no operator "=" matches these operands
operand types are: volatile half = half
/tmp/tmpoiw0p21t/my_kernel.cu(2210): warning: attribute "shared" does not apply here
I checked the CUDA code generated by TVM and found that the errors are raised in the fused_nn_max_pool2d_kernel0 function; this line triggers the error: ((volatile __shared__ half*)red_buf0)[((int)threadIdx.x)] = tensor_rf[0];
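That matches the NVRTC message: __half in cuda_fp16.h has no volatile assignment operator, so a store through a volatile half* cannot compile, and the repeated 'attribute "shared" does not apply here' warnings come from the __shared__ qualifier inside that cast type, where it has no effect. A minimal sketch of the failure and two possible workarounds (illustration with made-up names, not the generated kernel):

#include <cuda_fp16.h>

__global__ void maxpool_like(half* out, const half* in) {
  __shared__ half red_buf0[64];
  int tx = (int)threadIdx.x;

  // Fails: no operator "=" matches these operands (volatile half = half)
  // ((volatile half*)red_buf0)[tx] = in[tx];

  // Workaround A: drop the volatile qualifier for the half store.
  ((half*)red_buf0)[tx] = in[tx];

  // Workaround B: keep volatile semantics by storing the raw 16-bit pattern.
  ((volatile unsigned short*)red_buf0)[tx] = __half_as_ushort(in[tx]);

  __syncthreads();
  out[tx] = red_buf0[tx];
}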
/tmp/tmpc2ksk9tx/my_kernel.cu(250): error: more than one operator "*" matches these operands:
function "operator*(const __half &, const __half &)"
function "operator*(__half, __half)"
operand types are: half * half
After removing L65-L67 as you mentioned above, I got:
/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "shared" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "shared" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): warning: attribute "shared" does not apply here
/tmp/tmp615xi8fe/my_kernel.cu(9068): error: no operator "+" matches these operands
operand types are: volatile half + volatile half
/tmp/tmp615xi8fe/my_kernel.cu(9072): warning: attribute "shared" does not apply here
12 errors detected in the compilation of "/tmp/tmpxft_00003542_00000000-6_my_kernel.cpp1.ii".
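This looks like the same root cause as the earlier assignment error, only on a read this time: operator+ in cuda_fp16.h takes const __half & operands, which a volatile half cannot bind to. A sketch (again just an illustration, not the generated code) of the failure and of copying into non-volatile locals before the add:

#include <cuda_fp16.h>

__global__ void reduce_like(half* out, const half* in) {
  __shared__ half red_buf0[64];
  int tx = (int)threadIdx.x;
  red_buf0[tx] = in[tx];
  __syncthreads();

  volatile half* vbuf = red_buf0;
  // Fails: no operator "+" matches these operands (volatile half + volatile half)
  // out[tx] = vbuf[tx] + vbuf[tx ^ 1];

  // Drop volatile for the reads, then add plain half values (sm_53+ operator+).
  half a = const_cast<half*>(vbuf)[tx];
  half b = const_cast<half*>(vbuf)[tx ^ 1];
  out[tx] = a + b;
}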
My CUDA version is 10.1, the device is a Tesla V100 (sm_70), and the testing script is:
import tvm
from tvm import relay
from tvm.relay import testing
target = 'cuda'
mod, params = testing.resnet.get_workload(dtype='float16')
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(mod, params=params, target=target)