[Heterogeneous execution] Heterogeneous compilation failure

I’m interested in compiling certain components of a Relay graph for GPU execution while letting the rest run on the CPU. I load the Relay graph through the TensorFlow frontend converter and use the Relay annotation `on_device` to mark nodes that should execute on the GPU. I am seeing the following failure: “Attribute FTVMCompute has not been registered for Operator on_device”. Tracing the operator, I find that `on_device` is registered as a Relay op in annotation.cc, but it does not have an FTVMCompute attribute, whereas other annotations such as `annotation.stop_fusion` do. How is `on_device` handled during compilation, and does anyone have insight into why this check is failing?
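For context, here is a minimal sketch of the flow I have in mind (the tiny graph is illustrative; the API names match the TVM version in the trace below):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(4, 8))
w = relay.var("w", shape=(16, 8))
y = relay.nn.dense(x, w)
y = relay.annotation.on_device(y, tvm.gpu(0))  # mark the dense call for the GPU
func = relay.Function([x, w], y)
print(func)  # the body still contains an on_device call at this point

# My understanding is that rewrite_annotated_ops should consume on_device
# (inserting device_copy where devices differ), which would explain why
# on_device itself never needs an FTVMCompute.
func = relay.ir_pass.infer_type(func)
func = relay.ir_pass.rewrite_annotated_ops(func, tvm.cpu(0).device_type)
print(func)
```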

Stack trace:

```
Traceback (most recent call last):
  File "bert_tvm.py", line 137, in <module>
    tvm_runtime, params = get_tvm_runtime(graph_def)
  File "bert_tvm.py", line 83, in get_tvm_runtime
    graph, lib, params = relay.build(sym, target=target, target_host=target_host, params=params)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/build_module.py", line 290, in build
    graph_json, lowered_funcs, params = graph_gen.codegen(func)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 427, in codegen
    self.heads = self.visit(func.body)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 270, in visit_call
    res = self.visit(arg)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/expr_functor.py", line 30, in visit
    res = self.visit_call(expr)
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/graph_runtime_codegen.py", line 260, in visit_call
    self.target[call_dev_type])
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/compile_engine.py", line 84, in lower
    raise RuntimeError(msg)
RuntimeError: Traceback (most recent call last):
  File "/home/minjiaz/workspace/TVM/python/tvm/relay/backend/compile_engine.py", line 76, in lower
    return _backend._CompileEngineLower(self, key)
  File "/home/minjiaz/workspace/TVM/python/tvm/_ffi/ctypes/function.py", line 190, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138d7e7) [0x7f58d652f7e7]
  [bt] (7) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138fd7d) [0x7f58d6531d7d]
  [bt] (6) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x13945cd) [0x7f58d65365cd]
  [bt] (5) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x1396a7d) [0x7f58d6538a7d]
  [bt] (4) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138ee6d) [0x7f58d6530e6d]
  [bt] (3) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138ab3c) [0x7f58d652cb3c]
  [bt] (2) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x138da5c) [0x7f58d652fa5c]
  [bt] (1) /home/minjiaz/workspace/TVM/build/libtvm.so(+0x10306cb) [0x7f58d61d26cb]
  [bt] (0) /home/minjiaz/workspace/TVM/build/libtvm.so(+0xb2c488) [0x7f58d5cce488]
  File "/home/minjiaz/workspace/TVM/include/tvm/relay/./op.h", line 500
TVMError: Check failed: idx < data.size() && data[idx].second != 0: Attribute FTVMCompute has not been registered for Operator on_device
```

Code snippet:

```python
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
from tvm.relay.expr_functor import ExprMutator

target = {"cpu": "llvm -libs=mkl", "cuda": "cuda"}
target_host = "llvm"
contexts = [tvm.cpu(0), tvm.context("cuda")]

class ScheduleDense(ExprMutator):
    """Annotate every nn.dense call to run on the given device."""

    def __init__(self, device):
        self.device = device
        super(ScheduleDense, self).__init__()

    def visit_call(self, expr):
        visit = super().visit_call(expr)
        if expr.op == tvm.relay.op.get("nn.dense"):
            return relay.annotation.on_device(visit, self.device)
        else:
            return visit

def schedule_dense_on_gpu(expr):
    sched = ScheduleDense(tvm.gpu(0))
    return sched.visit(expr)

def get_tvm_runtime(graph_def):
    # shape_dict (input name -> shape) is defined elsewhere in the script.
    sym, params = relay.frontend.from_tensorflow(
        graph_def, shape=shape_dict, layout=None,
        outputs=["bert/pooler/dense/Tanh"])

    # Annotate dense ops for the GPU, then rewrite the annotations.
    sym = schedule_dense_on_gpu(sym)
    sym = relay.ir_pass.rewrite_annotated_ops(sym, tvm.context("cpu").device_type)

    print("Building relay graph...")
    with relay.build_config(opt_level=0, fallback_device="llvm"):
        graph, lib, params = relay.build(
            sym, target=target, target_host=target_host, params=params)  # fails here

    return graph_runtime.create(graph, lib, contexts), params
```
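Calling it looks roughly like this (the input name and shape below are placeholders, not from the original script):

```python
import numpy as np

m, params = get_tvm_runtime(graph_def)
m.set_input(**params)
m.set_input("input_ids", np.zeros((1, 128), dtype="int32"))  # placeholder feed
m.run()
out = m.get_output(0).asnumpy()
```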

@zhangninja I sent a PR this afternoon with a unit test showing how you can use an additional pass for annotation.
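Roughly, the idea is to run device annotation as its own pass before building, so every `on_device` call is rewritten into `device_copy` nodes and never reaches the compile engine. A sketch of that shape (the function name is illustrative, not taken from the PR):

```python
import tvm
from tvm import relay

def run_device_annotation(func, annotator, fallback_device):
    """Apply an ExprMutator-based annotator, then rewrite its on_device
    annotations into device_copy ops so codegen never sees on_device."""
    func = annotator.visit(func)
    func = relay.ir_pass.infer_type(func)
    return relay.ir_pass.rewrite_annotated_ops(func, fallback_device.device_type)

# e.g.: sym = run_device_annotation(sym, ScheduleDense(tvm.gpu(0)), tvm.cpu(0))
```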

Thank you, @zhiics. That seems to solve the issue.