Add option to enable CUDA stream per thread in tvm runtime

wweic · February 26, 2021, 12:56am

Hi,

I’m wondering whether we can add an option to TVM to enable the per-thread default CUDA stream, which has been available since CUDA 7 (GPU Pro Tip: CUDA 7 Streams Simplify Concurrency | NVIDIA Developer Blog). It would automatically improve throughput when we run multiple TVM processes (common in cloud deployments), and it would not change behavior in single-process deployments. If this makes sense, I’ll send out a PR to enable this option.

cc @haichen @zhiics @trevor-m @junrushao @haojin2 @tqchen

tqchen · February 27, 2021, 12:06am

Thanks @wweic, would this amount to calling SetStream with 1 (the per-thread CUDA default stream)?

haichen · February 27, 2021, 5:51am

Yes, I think that’s a good idea.

wweic · February 27, 2021, 8:18am

Are you saying the stream number for the per-thread CUDA default stream is 1? I ran three concurrent TVM runtime threads with stream-per-thread enabled, and each process seems to get a different stream number. Do you have documentation for this?

I’m currently thinking of adding a CMake option to configure this behavior at build time. Quick patch: (Enable stream-per-thread · wweic/tvm@8db2517 · GitHub)

tqchen · February 27, 2021, 2:25pm

https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior__default-stream

Sorry, it seems that 1 is the “legacy default stream” (process-global) and 2 is the “per-thread default stream”.

tqchen · February 27, 2021, 2:27pm

I still think it would be useful to keep following NVCC’s legacy default behavior, mainly to stay consistent with existing software, since the choice of default stream can affect behavior during, say, data exchange.

See also a related discussion here: Specify synchronization semantics · Issue #57 · dmlc/dlpack · GitHub

Users can always call DeviceAPI->SetStream to set the current context’s stream to 2, which is the per-thread default stream.
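A minimal Python sketch of that suggestion follows. It assumes a TVM build whose `tvm.runtime.Device` (e.g. `tvm.cuda(0)`) exposes `set_raw_stream`, a thin wrapper over the C API `TVMSetStream`; if your build lacks it, the C++ `DeviceAPI::SetStream` call mentioned above is the equivalent. The values 1 and 2 correspond to CUDA’s `cudaStreamLegacy` and `cudaStreamPerThread` sentinel handles. Because TVM’s CUDA backend keeps the current stream in thread-local state, each worker thread makes the call itself. `use_per_thread_default_stream` and `run_in_threads` are hypothetical helpers for illustration, not TVM APIs.

```python
import ctypes
import threading

# Sentinel stream handles from CUDA's driver_types.h:
#   cudaStreamLegacy    == (cudaStream_t)0x1  (process-global legacy default stream)
#   cudaStreamPerThread == (cudaStream_t)0x2  (per-thread default stream)
CUDA_STREAM_LEGACY = 1
CUDA_STREAM_PER_THREAD = 2


def use_per_thread_default_stream(dev):
    """Hypothetical helper: point TVM's current stream for the *calling thread*
    at the CUDA per-thread default stream.

    `dev` is assumed to be a tvm.runtime.Device with a `set_raw_stream` method.
    """
    dev.set_raw_stream(ctypes.c_void_p(CUDA_STREAM_PER_THREAD))


def run_in_threads(dev, work, num_threads):
    """Hypothetical helper: run `work(i)` on `num_threads` threads.

    Each thread first switches its own (thread-local) TVM stream to stream 2,
    so kernels it launches do not serialize on the legacy default stream.
    Returns the results in thread-index order.
    """
    results = [None] * num_threads

    def worker(i):
        use_per_thread_default_stream(dev)
        results[i] = work(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a real device, `work` would typically create its own executor for the compiled module and run it. Even then, whether kernels from different threads actually overlap on the GPU still depends on how many resources each kernel occupies.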

wweic · February 28, 2021, 7:28pm

Thanks @tqchen. Let me take a look at the API and maybe follow up with a documentation PR showing how to do this.

puddingfjz · December 30, 2021, 4:21pm

May I ask whether there is a follow-up PR for this feature?

I also want to ask: what is the easiest way to run two kernels concurrently on a GPU in TVM?

Thanks a lot!