What is the recommended way to generate TIR-level operations?

I want to generate TIR-level operations automatically. So far I have come across two ways to do this:

  1. te.compute,
  2. tir.ir_builder.

However, I have run into issues with both of them:

  • te.compute

    1. cases where the computation does not run over the whole tensor, such as a partial update during a computation step
    2. cases where the output dtype differs from the input dtype (argmin/argmax operations)
  • tir.ir_builder

    1. it can only be used as an external function, so schedule transformations (thread binding, tuning, etc.) cannot be applied to it

Because of the above, I am looking into calling the TIR constructors directly, since that allows separate indexing and the resulting tensor is not external.

So my question is: what is the currently recommended way to approach this?