[Heterogeneous execution] Heterogeneous compilation failure

I’m interested in compiling certain components of a Relay graph for GPU execution while letting the rest run on the CPU. I load the Relay graph through the TensorFlow frontend converter and use the Relay annotation `on_device` to mark nodes that should execute on the GPU. I am seeing the following failure: “Attribute FTVMCompute has not been registered for Operator on_device”. Tracing the operator, I find that `on_device` is registered as a Relay op in annotation.cc, but it does not have an FTVMCompute attribute, whereas other annotations such as `annotation.stop_fusion` do. How is `on_device` handled during compilation, and does anyone have insight into why this check is failing?
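For context, here is a minimal sketch of the flow I have in mind (the tiny graph is illustrative; the API names match the TVM version in the trace below):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(4, 8))
w = relay.var("w", shape=(16, 8))
y = relay.nn.dense(x, w)
y = relay.annotation.on_device(y, tvm.gpu(0))  # mark the dense call for the GPU
func = relay.Function([x, w], y)
print(func)  # the body still contains an on_device call at this point

# My understanding is that rewrite_annotated_ops should consume on_device
# (inserting device_copy where devices differ), which would explain why
# on_device itself never needs an FTVMCompute.
func = relay.ir_pass.infer_type(func)
func = relay.ir_pass.rewrite_annotated_ops(func, tvm.cpu(0).device_type)
print(func)
```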

Stack trace:

```
Traceback (most recent call last):
  File "bert_tvm.py", line 137, in <module>
    tvm_runtime, params = get_tvm_runtime(graph_def)
  File "bert_tvm.py", line 83, in get_tvm_runtime
    graph, lib, params = relay.build(sym, target=target, target_host=target_host, params=params)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/build_module.py", line 290, in build
    graph_json, lowered_funcs, params = graph_gen.codegen(func)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 427, in codegen
    self.heads = self.visit(func.body)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 260, in visit_call
    self.target[call_dev_type])
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/compile_engine.py", line 84, in lower
    raise RuntimeError(msg)
RuntimeError: Traceback (most recent call last):
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/compile_engine.py", line 76, in lower
    return _backend._CompileEngineLower(self, key)
  File "/home/minjiaz/workspace/TVM/python/tvm/_ffi/ctypes/function.py", line 190, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138d7e7) [0x7f58d652f7e7]
  [bt] (7) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138fd7d) [0x7f58d6531d7d]
  [bt] (6) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x13945cd) [0x7f58d65365cd]
  [bt] (5) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x1396a7d) [0x7f58d6538a7d]
  [bt] (4) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138ee6d) [0x7f58d6530e6d]
  [bt] (3) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138ab3c) [0x7f58d652cb3c]
  [bt] (2) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138da5c) [0x7f58d652fa5c]
  [bt] (1) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x10306cb) [0x7f58d61d26cb]
  [bt] (0) /home/minjiaz/workspace/TVM/build/libtvm.so(+0xb2c488) [0x7f58d5cce488]
  File "/home/minjiaz/workspace/TVM/include/tvm/relay/./op.h", line 500
TVMError: Check failed: idx < data.size() && data[idx].second != 0: Attribute FTVMCompute has not been registered for Operator on_device
```

Code snippet:

```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
from tvm.relay.expr_functor import ExprMutator

target = {"cpu": "llvm -libs=mkl", "cuda": "cuda"}
target_host = "llvm"
contexts = [tvm.cpu(0), tvm.context("cuda")]

class ScheduleDense(ExprMutator):
    """Annotate every nn.dense call to run on the given device."""

    def __init__(self, device):
        self.device = device
        super(ScheduleDense, self).__init__()

    def visit_call(self, expr):
        visit = super().visit_call(expr)
        if expr.op == tvm.relay.op.get("nn.dense"):
            return relay.annotation.on_device(visit, self.device)
        else:
            return visit

def schedule_dense_on_gpu(expr):
    sched = ScheduleDense(tvm.gpu(0))
    return sched.visit(expr)

def get_tvm_runtime(graph_def):
    # shape_dict (input name -> shape) is defined elsewhere in the script.
    sym, params = relay.frontend.from_tensorflow(
        graph_def, shape=shape_dict, layout=None,
        outputs=["bert/pooler/dense/Tanh"])

    # Annotate dense ops for the GPU, then rewrite the annotations.
    sym = schedule_dense_on_gpu(sym)
    sym = relay.ir_pass.rewrite_annotated_ops(sym, tvm.context("cpu").device_type)

    print("Building relay graph...")
    with relay.build_config(opt_level=0, fallback_device="llvm"):
        graph, lib, params = relay.build(
            sym, target=target, target_host=target_host, params=params)  # fails here

    return graph_runtime.create(graph, lib, contexts), params
```
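Calling it looks roughly like this (the input name and shape below are placeholders, not from the original script):

```python
import numpy as np

m, params = get_tvm_runtime(graph_def)
m.set_input(**params)
m.set_input("input_ids", np.zeros((1, 128), dtype="int32"))  # placeholder feed
m.run()
out = m.get_output(0).asnumpy()
```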

@zhangninja I sent a PR this afternoon with a unit test showing how you can use an additional pass for annotation.
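Roughly, the idea is to run device annotation as its own pass before building, so every `on_device` call is rewritten into `device_copy` nodes and never reaches the compile engine. A sketch of that shape (the function name is illustrative, not taken from the PR):

```python
import tvm
from tvm import relay

def run_device_annotation(func, annotator, fallback_device):
    """Apply an ExprMutator-based annotator, then rewrite its on_device
    annotations into device_copy ops so codegen never sees on_device."""
    func = annotator.visit(func)
    func = relay.ir_pass.infer_type(func)
    return relay.ir_pass.rewrite_annotated_ops(func, fallback_device.device_type)

# e.g.: sym = run_device_annotation(sym, ScheduleDense(tvm.gpu(0)), tvm.cpu(0))
```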

Thank you, @zhiics. That seems to solve the issue.