I am trying to compile TVM with NCCL. I built NCCL from source (NVIDIA/nccl on GitHub; I tried both the main and v2.19 branches). The NCCL tests from NVIDIA/nccl-tests on GitHub run without problems.
In my cmake configuration, I use set(USE_NCCL <>/nccl/build). In the output of running cmake, I see:
- Found NCCL_LIBRARY: <>/nccl/build/lib/libnccl_static.a
- Found NCCL_INCLUDE_DIR:<>/nccl/build/include
But when running make, I get the following errors:
<>/tvm/3rdparty/tensorrt_llm/custom_allreduce_kernels.cu(88): error: no operator "+" matches these operands
operand types are: half2 + half2
c.unpacked[0] = a.unpacked[0] + b.unpacked[0];
^
detected during:
instantiation of "int4 tensorrt_llm::add128b(T &, T &) [with T=tensorrt_llm::PackedHalf]" at line 163
instantiation of "void tensorrt_llm::oneShotAllReduceKernel<T,RANKS_PER_NODE>(tensorrt_llm::AllReduceParams) [with T=half, RANKS_PER_NODE=2]" at line 337
instantiation of "void tensorrt_llm::dispatchARKernels<T,RANKS_PER_NODE>(tensorrt_llm::AllReduceStrategyType, tensorrt_llm::AllReduceParams &, int, int, cudaStream_t) [with T=half, RANKS_PER_NODE=2]" at line 357
instantiation of "void tensorrt_llm::invokeOneOrTwoShotAllReduceKernel<T>(tensorrt_llm::AllReduceParams &, tensorrt_llm::AllReduceStrategyType, cudaStream_t) [with T=half]" at line 389
And also:
In file included from <>/tvm/src/runtime/disco/nccl/nccl.cc:27:
<>/tvm/src/runtime/disco/nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
37 | #include <nccl.h>
| ^~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:973: CMakeFiles/tvm_runtime_objs.dir/src/runtime/disco/nccl/nccl.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from <>/tvm/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc:28:
<>/tvm/src/runtime/disco/cuda_ipc/../nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
37 | #include <nccl.h>
| ^~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:945: CMakeFiles/tvm_runtime_objs.dir/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc.o] Error 1
In file included from <>/tvm/src/runtime/disco/cuda_ipc/custom_allreduce.cc:26:
<>/tvm/src/runtime/disco/cuda_ipc/../nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
37 | #include <nccl.h>
| ^~~~~~~~
compilation terminated.
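
As a sanity check on the second error, the include directory that cmake reports can be tested for the header directly. The sketch below is only a hypothetical helper (the name has_nccl_header is made up for illustration and is not part of TVM or NCCL). It assumes a POSIX shell and simply reports whether nccl.h exists in the directory passed to it:

```shell
# Hypothetical helper (not part of TVM or NCCL): report whether nccl.h
# is present directly under the include directory given as the first argument.
has_nccl_header() {
    include_dir="$1"
    if [ -f "$include_dir/nccl.h" ]; then
        echo "found: $include_dir/nccl.h"
        return 0
    fi
    echo "missing: $include_dir/nccl.h"
    return 1
}
```

It would be called with the directory from the "Found NCCL_INCLUDE_DIR" line, for example has_nccl_header "<>/nccl/build/include". If the header is reported as found there, the include path is probably not reaching the compile command for these particular files, which would be a different problem from the header not having been built.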