Hi, when my target is set to CUDA, I found that the computational graph runs on a single CUDA stream. Can TVM generate code that uses multiple streams?
The example I used is the ResNet-50 v2 model from the documentation.
Does anyone know the answer? Thanks.
Hi @cheng, you might want to try using CUDA Graph directly? See tvm/tests/python/runtime/test_runtime_module_based_interface.py on the main branch of apache/tvm on GitHub.
Thank you for your answer, @LeiWang1999.
My understanding is that CUDA Graph capture is not meant for running operators in parallel. If I want parallelism between ops, does TVM provide any way to do that?
I found that some ops in ResNet-50 could run in parallel, but the generated code executes them serially. I don't understand why it is designed this way.
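To make the "some ops can be parallelized" point concrete: in a ResNet bottleneck block with a projection shortcut, the 1x1 shortcut convolution does not depend on the main branch, so a multi-stream scheduler could in principle launch both at the same time. The sketch below uses a hypothetical, hand-written graph (the op names are illustrative, not TVM's) and groups ops into dependency levels; ops in the same level have no data dependency on each other and could be dispatched to different streams. It does not describe how TVM's graph executor actually schedules kernels.

```python
from collections import defaultdict

# Hypothetical, simplified dataflow graph of one ResNet-style bottleneck block
# with a projection shortcut. Each op maps to the ops whose outputs it consumes.
GRAPH = {
    "input": [],
    "conv1x1_a": ["input"],       # main branch, step 1
    "conv3x3": ["conv1x1_a"],     # main branch, step 2
    "conv1x1_b": ["conv3x3"],     # main branch, step 3
    "conv1x1_proj": ["input"],    # shortcut branch, independent of the main branch
    "add": ["conv1x1_b", "conv1x1_proj"],
    "relu": ["add"],
}


def wavefronts(graph):
    """Group ops into levels by longest path from a source op.

    Every op in a level depends only on ops in earlier levels, so ops that
    share a level have no data dependency on each other.
    """
    level = {}

    def depth(op):
        if op not in level:
            deps = graph[op]
            level[op] = 0 if not deps else 1 + max(depth(d) for d in deps)
        return level[op]

    for op in graph:
        depth(op)

    groups = defaultdict(list)
    for op, lvl in level.items():
        groups[lvl].append(op)
    return [sorted(groups[i]) for i in range(len(groups))]


if __name__ == "__main__":
    for i, ops in enumerate(wavefronts(GRAPH)):
        print(i, ops)
```

Level 1 contains both `conv1x1_a` and `conv1x1_proj`. This grouping is conservative: the shortcut conv could also overlap with `conv3x3` or `conv1x1_b`, and whether overlapping actually helps depends on whether a single kernel already saturates the GPU.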