One suggestion that I have for TVM is to add a cleaner exit from the stack.
For example, for OpenCL/CUDA targets, what do I do if I just want the generated kernels?
Note: there is a way to print the generated OpenCL source, but unfortunately I have not found a way to get the work-group / thread-block sizes and dimensions, which are needed to launch the kernels outside of TVM. Surely those parameters were tuned.