While playing with test_matmul_offload in tests/python/relax/test_codegen_cutlass.py, I found that the runtime profiled by python/tvm/contrib/cutlass/gen_tensor_op.py always returns float("inf").
The inf result occurs because subprocess.run receives a CUTLASS profiler error: "Got cutlass error: Error Internal at: 77".
Digging deeper, I found that for a dynamic-shape pattern such as ((_vars["a"], 6), (6, 16), False, "bias", "none"), it passes ["./tmp/profiler", "a", "6", "16"] to subprocess.run, which causes the symbolic dimension "a" to be interpreted as 0 by the profiler.
So I wonder: is it expected behavior that the profiler returns an inf timing result for dynamic shapes? If not, how should we fix it? Perhaps we need to tell the underlying profiler the lower and upper bounds of the dynamic shape?
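To illustrate the kind of fix I have in mind, here is a minimal sketch (not actual TVM code; the helper name and the default value are my own invention) that substitutes a representative concrete value for any symbolic dimension before building the profiler argv, instead of passing the variable name "a" through:

```python
# Hypothetical sketch: replace symbolic (non-integer) dims with a
# representative concrete value before invoking the profiler binary,
# so the profiler never sees a variable name like "a" (which it
# currently parses as 0).
def concretize_shape(shape, representative=64):
    """Return a list of ints, substituting `representative` for symbolic dims.

    `representative` could instead be derived from user-provided
    [lower, upper] bounds on the dynamic dimension.
    """
    out = []
    for dim in shape:
        if isinstance(dim, int):
            out.append(dim)
        else:
            # Symbolic var such as _vars["a"]: pick a stand-in value.
            out.append(representative)
    return out


# Building the profiler argv for the (a, 6) x (6, 16) matmul case:
args = ["./tmp/profiler"] + [str(d) for d in concretize_shape(["a", 6, 16])]
# args == ["./tmp/profiler", "64", "6", "16"]
```

The timing measured this way is of course only representative of the chosen stand-in value, which is why exposing [lower, upper] bounds (or a preferred "optimized" value, as below) seems like the more principled interface.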
Other backends that support dynamic shapes, like TensorRT, accept a dynamic-shape range [low, high] plus an "optimized" value, and tune best for that "optimized" value. Does TVM have this feature for Relax?