[Fixed] Problem finding NCCL header in make

I am trying to compile TVM with NCCL. I built NCCL from source (GitHub - NVIDIA/nccl: Optimized primitives for collective multi-GPU communication; I tried both the main and v2.19 branches). The NCCL tests in GitHub - NVIDIA/nccl-tests: NCCL Tests run without problems.

In my CMake configuration, I use set(USE_NCCL <>/nccl/build). The output of running cmake shows:

  • Found NCCL_LIBRARY: <>/nccl/build/lib/libnccl_static.a
  • Found NCCL_INCLUDE_DIR: <>/nccl/build/include
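
Before looking at the make errors, it can help to confirm that the two paths CMake reported really contain what the build expects. The following is a minimal sketch, not part of TVM or NCCL: the helper name check_nccl_build is hypothetical, and the layout it assumes (include/nccl.h and lib/libnccl_static.a under the NCCL build directory) is taken from the two "Found" lines above.

```python
from pathlib import Path


def check_nccl_build(nccl_build_dir):
    """Report whether the NCCL header and static library exist on disk.

    Hypothetical helper: the expected layout mirrors the paths printed
    by cmake in this post (<build>/include and <build>/lib).
    """
    root = Path(nccl_build_dir)
    expected = {
        "header": root / "include" / "nccl.h",
        "static_lib": root / "lib" / "libnccl_static.a",
    }
    # True means the file is present, False means it is missing.
    return {name: path.is_file() for name, path in expected.items()}
```

Calling check_nccl_build with the same directory that was passed to USE_NCCL returns a small dict. If "header" comes back True, the header is on disk, and the problem is more likely in how the include directory reaches the compiler than in the NCCL build itself.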

But when running make, I get the following errors:

<>/tvm/3rdparty/tensorrt_llm/custom_allreduce_kernels.cu(88): error: no operator "+" matches these operands
        operand types are: half2 + half2
c.unpacked[0] = a.unpacked[0] + b.unpacked[0];
                              ^
      detected during:
        instantiation of "int4 tensorrt_llm::add128b(T &, T &) [with T=tensorrt_llm::PackedHalf]" at line 163
        instantiation of "void tensorrt_llm::oneShotAllReduceKernel<T,RANKS_PER_NODE>(tensorrt_llm::AllReduceParams) [with T=half, RANKS_PER_NODE=2]" at line 337
        instantiation of "void tensorrt_llm::dispatchARKernels<T,RANKS_PER_NODE>(tensorrt_llm::AllReduceStrategyType, tensorrt_llm::AllReduceParams &, int, int, cudaStream_t) [with T=half, RANKS_PER_NODE=2]" at line 357
        instantiation of "void tensorrt_llm::invokeOneOrTwoShotAllReduceKernel<T>(tensorrt_llm::AllReduceParams &, tensorrt_llm::AllReduceStrategyType, cudaStream_t) [with T=half]" at line 389

And also:

In file included from <>/tvm/src/runtime/disco/nccl/nccl.cc:27:
<>/tvm/src/runtime/disco/nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
  37 | #include <nccl.h>
     |          ^~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:973: CMakeFiles/tvm_runtime_objs.dir/src/runtime/disco/nccl/nccl.cc.o] Error 1
make[2]: *** Waiting for unfinished jobs....
In file included from <>/tvm/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc:28:
<>/tvm/src/runtime/disco/cuda_ipc/../nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
   37 | #include <nccl.h>
      |          ^~~~~~~~
compilation terminated.
make[2]: *** [CMakeFiles/tvm_runtime_objs.dir/build.make:945: CMakeFiles/tvm_runtime_objs.dir/src/runtime/disco/cuda_ipc/cuda_ipc_memory.cc.o] Error 1
In file included from <>/tvm/src/runtime/disco/cuda_ipc/custom_allreduce.cc:26:
<>/tvm/src/runtime/disco/cuda_ipc/../nccl/nccl_context.h:37:10: fatal error: nccl.h: No such file or directory
   37 | #include <nccl.h>
      |          ^~~~~~~~
compilation terminated.
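
Since cmake reported an NCCL_INCLUDE_DIR but the compiler still cannot find nccl.h, one way to narrow this down is to check whether that directory actually appears in the compile command for the failing source file. The sketch below is only a diagnostic idea and is not the fix from the linked thread. It assumes the build was configured with -DCMAKE_EXPORT_COMPILE_COMMANDS=ON so that a compile_commands.json exists in the build directory; the function names include_dirs_for and load_entries are hypothetical.

```python
import json
import shlex
from pathlib import Path


def include_dirs_for(entries, source_suffix):
    """Collect -I / -isystem directories for matching source files.

    `entries` is the parsed content of compile_commands.json; each entry
    carries a "file" plus either a "command" string or an "arguments" list.
    """
    dirs = []
    for entry in entries:
        if not entry["file"].endswith(source_suffix):
            continue
        args = entry.get("arguments") or shlex.split(entry["command"])
        it = iter(args)
        for arg in it:
            if arg in ("-I", "-isystem"):
                # Flag and directory are separate tokens.
                value = next(it, "")
                if value:
                    dirs.append(value)
            elif arg.startswith("-isystem"):
                dirs.append(arg[len("-isystem"):])
            elif arg.startswith("-I"):
                dirs.append(arg[2:])
    return dirs


def load_entries(build_dir):
    """Read compile_commands.json from a build directory (assumed to exist)."""
    return json.loads((Path(build_dir) / "compile_commands.json").read_text())
```

For example, include_dirs_for(load_entries("build"), "src/runtime/disco/nccl/nccl.cc") lists the include directories passed for that file, where "build" stands in for the actual TVM build directory. If <>/nccl/build/include is missing from the list, the include path was found by cmake but never attached to the tvm_runtime_objs target.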

Fixed: see post #10 by fPecc in the thread "[TVM Unity] Got NCCL Error while using CUDA, CUDNN and NCCL for multi-GPU processing".