Add option to enable CUDA stream per thread in TVM runtime

Hi,

I’m wondering if we can add an option to TVM to enable the CUDA per-thread default stream, available since CUDA 7 (GPU Pro Tip: CUDA 7 Streams Simplify Concurrency | NVIDIA Developer Blog). It would automatically improve throughput when we run multiple TVM runtime threads concurrently (common in cloud deployments), with no behavior change for single-threaded deployment scenarios. If this makes sense, I’ll send out a PR to enable this option.
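For context, outside of TVM the per-thread default stream is normally opted into at build time. A sketch of the two standard mechanisms from the CUDA documentation (file names here are illustrative):

```shell
# With nvcc, pass the flag when compiling device code:
nvcc --default-stream per-thread -c my_kernels.cu

# For host-compiler translation units that use the CUDA runtime API,
# define the macro before the CUDA headers are included instead:
g++ -DCUDA_API_PER_THREAD_DEFAULT_STREAM -c host_code.cc
```

With either mechanism, work issued to the default stream from different host threads no longer serializes against a single process-wide stream.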

cc @haichen @zhiics @trevor-m @junrushao @haojin2 @tqchen


Thanks @wweic. Would this amount to calling SetStream with 1 (the per-thread CUDA default stream)?


Yes, I think that would be a good idea.

Are you saying the stream number for the per-thread CUDA default stream is 1? I ran a TVM runtime with 3 concurrent threads and stream-per-thread enabled, and each thread seems to get a different stream number. Or do you have documentation for it?

I’m currently thinking of adding a CMake option to configure this behavior at build time. Quick patch: (Enable stream-per-thread · wweic/tvm@8db2517 · GitHub)
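A minimal sketch of what such a build-time switch might look like (the option name is hypothetical, not taken from the linked patch):

```cmake
# Hypothetical option name; OFF preserves the legacy default-stream behavior.
option(USE_CUDA_STREAM_PER_THREAD
       "Compile the CUDA runtime with per-thread default streams" OFF)
if(USE_CUDA_STREAM_PER_THREAD)
  # Makes the CUDA runtime API resolve the default stream per thread.
  add_definitions(-DCUDA_API_PER_THREAD_DEFAULT_STREAM)
endif()
```

Defaulting the option to OFF would keep existing builds unchanged, which matches the concern about staying consistent with legacy NVCC behavior.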

https://docs.nvidia.com/cuda/cuda-runtime-api/stream-sync-behavior.html#stream-sync-behavior__default-stream

Sorry, it seems that 1 is the “legacy default stream” (process global), and 2 is the “per-thread default stream”.

I still think it would be useful to follow the legacy default behavior of NVCC, mainly to keep things consistent with existing software, since the choice can impact behavior during, say, data exchange.

See also a related discussion here: Specify synchronization semantics · Issue #57 · dmlc/dlpack · GitHub

Users can always call DeviceAPI->SetStream to set the current context stream to 2, which is the per-thread default stream.

Thanks @tqchen. Let me take a look at the API and maybe follow up with a documentation PR to share how to do this.

May I ask if there is a follow-up PR for this feature?

I also want to ask: what is the easiest way to run two kernels concurrently on a GPU in TVM?

Thanks a lot!