You can also define the computation in TE (tensor expression) style and then generate the TIR to be scheduled. Below is a small example for matmul:
import tvm
from tvm import te
n = te.size_var("n")  # symbolic size; use n = 128 for a fixed shape
A = te.placeholder((n, n), dtype="float32", name="A")
B = te.placeholder((n, n), dtype="float32", name="B")
k = te.reduce_axis((0, n), name="k")
# note the indexing: C[i, j] = sum_k A[k, i] * B[k, j], i.e. C = A^T @ B
C = te.compute((n, n), lambda i, j: te.sum(A[k, i] * B[k, j], axis=[k]), name="C")
# create tir prim_func
tir_func = te.create_prim_func([A,B,C])
print(tir_func)
# Create TIR schedule
sch = tvm.tir.Schedule(tir_func)
# Apply TIR schedule primitives
c_block = sch.get_block("C")
i, j, k = sch.get_loops(c_block)
sch.parallel(i)
sch.reorder(k, j)
sch.vectorize(j)
# Print the scheduled mod
print(sch.mod)
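For reference, the computation the TE expression describes can be written as a plain-Python sketch in the loop order the schedule above produces (`i` outer and parallel, `k` hoisted by `sch.reorder(k, j)`, `j` innermost and vectorized). This is purely illustrative and needs no TVM; note again that it computes C = AᵀB, matching the `A[k, i] * B[k, j]` indexing:

```python
def matmul_transposed(A, B):
    """C[i][j] = sum_k A[k][i] * B[k][j], i.e. C = A^T @ B,
    mirroring the te.compute expression above."""
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):          # outer loop: what sch.parallel(i) parallelizes
        for k in range(n):      # reduction axis, moved out by sch.reorder(k, j)
            for j in range(n):  # innermost loop: what sch.vectorize(j) vectorizes
                C[i][j] += A[k][i] * B[k][j]
    return C

print(matmul_transposed([[1.0, 2.0], [3.0, 4.0]],
                        [[5.0, 6.0], [7.0, 8.0]]))
# → [[26.0, 30.0], [38.0, 44.0]]
```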
I’m not an expert on MetaSchedule, but I’ll explain what I understand. If you’re asking about being able to tune on custom hardware: yes, as far as I understand, any hardware can be supported with some basic implementation defining how to build and run on that target.
This can be done by adding build support for that hardware and supporting the RPC runner, but @junrushao might be a better person to answer this question.