Do we have any way to drive codegen with more fine-grained control?

When you have a specific target in mind for an advanced optimization such as register blocking, TVM codegen cannot handle it very well. In my experience there are two options: 1. write a micro-GEMM kernel such as 4x4 or 8x8 and then tensorize it, or 2. try, try, and try different schedules until one combination matches your expectations, which is very painful. Maybe TensorIR, as @junrushao mentioned, could handle this better, but I don't think it can completely solve this problem of low-level, fine-grained control.
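To make option 1 concrete, here is a minimal sketch in plain Python (not TVM code) of what a 4x4 register-blocked micro-GEMM computes. The names `MR`, `NR`, `micro_gemm`, and `blocked_gemm` are made up for illustration, and the local `acc` tile only stands in for the registers that a hand-written kernel would pin. It assumes the matrix dimensions are multiples of the tile size.

```python
MR, NR = 4, 4  # micro-tile size (assumed; 8x8 is the other size mentioned above)


def micro_gemm(A, B, C, i0, j0, K):
    """Accumulate the MR x NR tile of C at (i0, j0) from a full pass over K."""
    # Local accumulators: in a real kernel these would live in registers
    # for the whole K loop, which is the point of register blocking.
    acc = [[0.0] * NR for _ in range(MR)]
    for k in range(K):
        a_col = [A[i0 + i][k] for i in range(MR)]  # MR loads from A
        b_row = B[k][j0:j0 + NR]                   # NR loads from B
        for i in range(MR):
            for j in range(NR):
                acc[i][j] += a_col[i] * b_row[j]   # MR * NR multiply-adds
    # Write the tile back to C once, after the K loop.
    for i in range(MR):
        for j in range(NR):
            C[i0 + i][j0 + j] += acc[i][j]


def blocked_gemm(A, B):
    """C = A @ B, computed tile by tile with the micro-kernel."""
    M, K, N = len(A), len(B), len(B[0])
    assert M % MR == 0 and N % NR == 0, "sketch assumes exact tiling"
    C = [[0.0] * N for _ in range(M)]
    for i0 in range(0, M, MR):
        for j0 in range(0, N, NR):
            micro_gemm(A, B, C, i0, j0, K)
    return C


def naive_gemm(A, B):
    """Reference triple loop used only to check the blocked version."""
    M, K, N = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)]
            for i in range(M)]
```

Each step of the K loop does MR + NR loads for MR * NR multiply-adds, and that ratio is what the tile size tunes. The pain described above is that, in TVM, you either hand this inner kernel over through tensorize or hope a schedule happens to lower to the same register allocation.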
