Is it possible to find the best kernel in cutlass using TVM?

Since cutlass is a template library, users can define many different kernels (different tile sizes, number of pipeline stages, etc.), which makes it hard to find the best config for new hardware or a new input shape. Profiling on the fly will hurt the system’s performance. I am new to TVM, but TVM sounds like a good choice for picking the best kernel, and it seems to have some support for cutlass already. Can TVM pick the best kernel given the input shape and hardware? How does its performance compare with profiling and auto-tuning?

Yes, we support cutlass by generating many kernels and finding the best one; see tests/python/contrib/test_cutlass.py.
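To make "generate many kernels and find the best one" concrete, here is a minimal sketch in plain Python. It is not TVM's or cutlass's API: `KernelConfig`, `enumerate_configs`, `select_best`, and the toy `padded_work` cost are all hypothetical names made up for illustration, and a real setup would replace `padded_work` with an actual timing run of each compiled kernel.

```python
import itertools
import math
from typing import Callable, List, NamedTuple, Tuple


class KernelConfig(NamedTuple):
    """Hypothetical description of one candidate kernel."""
    tile_m: int
    tile_n: int
    tile_k: int
    stages: int


def enumerate_configs() -> List[KernelConfig]:
    """Build the candidate set as tile shapes x pipeline stage counts."""
    tiles = [(64, 64, 32), (128, 128, 32), (128, 256, 32)]
    stages = [2, 3, 4]
    return [
        KernelConfig(m, n, k, s)
        for (m, n, k), s in itertools.product(tiles, stages)
    ]


def padded_work(cfg: KernelConfig, shape: Tuple[int, int, int]) -> float:
    """Toy stand-in for a measurement: GEMM work after padding to tile sizes.

    This is NOT a performance model; it only gives the search something
    deterministic to minimize. Swap in a real benchmark in practice.
    """
    m, n, k = shape
    pm = math.ceil(m / cfg.tile_m) * cfg.tile_m
    pn = math.ceil(n / cfg.tile_n) * cfg.tile_n
    pk = math.ceil(k / cfg.tile_k) * cfg.tile_k
    return float(pm * pn * pk)


def select_best(
    shape: Tuple[int, int, int],
    configs: List[KernelConfig],
    measure: Callable[[KernelConfig, Tuple[int, int, int]], float],
) -> KernelConfig:
    """Return the candidate with the lowest measured cost for this shape."""
    return min(configs, key=lambda cfg: measure(cfg, shape))


configs = enumerate_configs()
best = select_best((100, 100, 64), configs, padded_work)
print(best)
```

The search itself is just a `min` over the candidates; all of the cost is in `measure`, which is why doing this ahead of time for known shapes is cheap at run time while doing it on the fly is not.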

@masahi Thanks! This looks very cool! Does this support online search? Will it find the best kernel online quickly (or at least faster than profiling and auto-tuning)?

What do you mean by “online”?

@masahi “online” means I can’t know the input shapes and hardware info ahead of time, so I have to search for the best kernel just in time.

For example, if a new model is being trained and new shapes occur, I hope the system can still find the best kernel for those shapes.

Profiling just in time when new shapes or new hardware appear is one option. But TVM sounds better and more modern. :slight_smile:
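For reference, the "profile just in time" option I have in mind looks roughly like the sketch below: profile the candidates the first time a shape shows up, then reuse the cached winner. This is plain Python, not an existing TVM or cutlass feature; `JitKernelSelector`, `best_for`, and the `measure` callback are hypothetical names, and `measure` stands in for a real timing run.

```python
from typing import Callable, Dict, Generic, Hashable, Sequence, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)  # kernel/config identifier
Shape = Tuple[int, ...]


class JitKernelSelector(Generic[K]):
    """Profile candidates on the first sight of a shape, then cache the winner."""

    def __init__(
        self,
        candidates: Sequence[K],
        measure: Callable[[K, Shape], float],
    ) -> None:
        self._candidates = list(candidates)
        self._measure = measure  # assumed to return a cost; lower is better
        self._best: Dict[Shape, K] = {}
        self.profile_runs = 0  # how many times a full search was triggered

    def best_for(self, shape: Shape) -> K:
        """Return the cached best candidate, profiling only for unseen shapes."""
        if shape not in self._best:
            self.profile_runs += 1
            self._best[shape] = min(
                self._candidates, key=lambda c: self._measure(c, shape)
            )
        return self._best[shape]


# Toy usage: candidates are tile sizes, the "measurement" is a made-up distance.
selector = JitKernelSelector(
    [64, 128, 256], measure=lambda tile, shape: abs(tile - shape[0])
)
first = selector.best_for((100,))   # triggers a search
again = selector.best_for((100,))   # served from the cache
other = selector.best_for((300,))   # new shape, triggers another search
```

The downside is exactly the one from my first post: the first call for every new shape pays the full profiling cost, which is what I would like to avoid.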

Dynamic shapes are supported, but currently there is no smart heuristic to choose a good kernel. Runtime profiling and JIT are interesting topics, but we haven’t got there yet.

Thanks. Looking forward to future TVM updates. :)