I want to generate TIR-level operations automatically. For that, I have looked at:
- te.compute,
- tir.ir_builder.
However, I have run into issues with both of them:
- te.compute
  - where the computation is not executed over the whole tensor, such as a partial update during a computation step
  - where the output dtype differs from the input dtype (argmin/argmax operations)
- tir.ir_builder
  - can only be used as an external function, so transformations cannot be applied to it (thread binding, tuning, etc.)
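To make the two te.compute cases concrete, here is a minimal sketch of the semantics I mean, written in plain Python with no TVM involved. The function names `partial_update` and `argmin` are only illustrative and are not part of any TVM API.

```python
# Reference semantics only: plain Python, no TVM involved.
# Function names are illustrative, not part of any TVM API.

def partial_update(data, start, stop, delta):
    """Update only data[start:stop]; all other elements stay untouched."""
    out = list(data)
    for i in range(start, stop):
        out[i] = out[i] + delta
    return out


def argmin(data):
    """Return the index (an int) of the smallest element of a float list."""
    best = 0
    for i in range(1, len(data)):
        if data[i] < data[best]:
            best = i
    return best


values = [3.0, 1.5, 4.0, 0.5]
updated = partial_update(values, 1, 3, 10.0)  # only indices 1 and 2 change
index = argmin(values)                        # float input, integer output
```

The first function touches only part of the output, and the second returns an integer index for floating-point input; these are the two patterns I could not express cleanly.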
Because of the above, I am looking into calling the TIR constructors directly, since that makes separate indexing possible and the resulting tensor is not external.
Therefore, my question is: what is the currently recommended way to approach this?