How to build a schedule with a specific executor

Hi!

I am trying to modify the VTA examples for a small test, but I keep getting an error when I try to build a schedule using the AOT executor. Here is a minimal example that reproduces the issue:

import tvm
from tvm import te
import numpy as np
from tvm import relay

m = 1
n = 1
o = 1
A = te.placeholder((o, n, 1, 8), name="A", dtype="int8")
B = te.placeholder((m, n, 8, 8), name="B", dtype="int8")

# Outer input feature reduction axis
ko = te.reduce_axis((0, n), name="ko")
# Inner input feature reduction axis
ki = te.reduce_axis((0, 8), name="ki")
# Describe the matrix multiplication
C = te.compute(
    (o, m, 1, 8),
    lambda bo, co, bi, ci: te.sum(
        A[bo, ko, bi, ki].astype("int8") * B[co, ko, ci, ki].astype("int8"),
        axis=[ko, ki],
    ),
    name="C",
)

sch = te.create_schedule(C.op)

mod = tvm.lower(sch, [A, B, C], simple_mode=True, name="main")

print(mod)

RUNTIME = tvm.relay.backend.Runtime("crt", {"system-lib": True})
TARGET = tvm.target.target.Target("c")
EXECUTOR = tvm.relay.backend.Executor("aot",options={'interface-api': 'c','unpacked-api': 1})
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}, disabled_pass=["AlterOpLayout"]):
    module = relay.build(mod, executor=EXECUTOR, runtime=RUNTIME, target=TARGET)

If I run the previous code, I get the following error in relay.build:

Check failed: (can_dispatch(n)) is false: NodeFunctor calls un-registered function on type tir.PrimFunc

I need to build this schedule with the AOT executor, and so far relay.build is the only way I know of to select it. Has anyone encountered the same problem? Am I missing a conversion somewhere?

EDIT:

When running the following line:

module = tvm.build(mod, target="c --executor=aot --unpacked-api=1")

The build succeeds, but the output is still not what I expected. In fact, the build also succeeds if I put gibberish in the executor parameter, so it seems that tvm.build ignores that parameter.

This is what the schedule looks like once lowered:

@main = primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(int8), int8, [1, 1, 1, 8], []),
             B: Buffer(B_2: Pointer(int8), int8, [1, 1, 8, 8], []),
             A: Buffer(A_2: Pointer(int8), int8, [1, 1, 1, 8], [])}
  buffer_map = {A_1: A, B_1: B, C_1: C} {
  for (ci: int32, 0, 8) {
    C_2[ci] = 0i8
    for (ki: int32, 0, 8) {
      C_2[ci] = ((int8*)C_2[ci] + ((int8*)A_2[ki]*(int8*)B_2[((ci*8) + ki)]))
    }
  }
}
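
For reference, the lowered loop nest above can be read as the plain-Python sketch below. It does not use TVM and is only my reading of the printed TIR. The names `wrap_int8` and `lowered_main_reference` are made up for illustration, the buffers are treated as flat lists (8 elements for A and C, 64 for B), and int8 overflow is assumed to wrap in two's complement.

```python
def wrap_int8(value):
    """Wrap an integer into the int8 range [-128, 127] (assumed two's complement)."""
    return ((value + 128) % 256) - 128


def lowered_main_reference(a_flat, b_flat):
    """Mirror the printed loop nest: C[ci] = sum over ki of A[ki] * B[ci*8 + ki]."""
    c_flat = [0] * 8
    for ci in range(8):
        c_flat[ci] = 0  # corresponds to C_2[ci] = 0i8
        for ki in range(8):
            product = wrap_int8(a_flat[ki] * b_flat[ci * 8 + ki])
            c_flat[ci] = wrap_int8(c_flat[ci] + product)
    return c_flat


if __name__ == "__main__":
    a = list(range(1, 9))
    # B laid out as a flattened 8x8 identity matrix, so C should equal A.
    b = [1 if row == col else 0 for row in range(8) for col in range(8)]
    print(lowered_main_reference(a, b))
```

A sketch like this can serve as a check on the values produced by whatever module the build eventually generates, independent of which executor is used.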