Hi, I am trying to offload the supported part of a BERT-base model to ACL using the official BYOC pass. Three kinds of operators in the model are supported: dense, add, and reshape.
For “reshape”, everything works fine after graph partitioning and runtime codegen.
For “add”, I ran into a segmentation fault on my first try. After some debugging with gdb, it turns out that the segmentation fault is triggered when an “add” operator has a constant input (0.5f in my case), which needs proper broadcasting in the implementation. I temporarily worked around this by skipping every “add” operator with a constant input:
import tvm


@tvm.ir.register_op_attr("add", "target.arm_compute_lib")
def add(expr):
    """Check if the external ACL codegen for add should be used."""
    args = expr.args
    # Only float32 operands are offloaded.
    for typ in [args[0].checked_type, args[1].checked_type]:
        if typ.dtype != "float32":
            return False
    # Workaround: keep any add with a constant operand on the TVM side,
    # since that case (which needs broadcasting) triggers the segfault.
    if any(isinstance(arg, tvm.relay.expr.Constant) for arg in args):
        return False
    return True
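A narrower variant of this workaround might reject only the adds that actually need broadcasting, instead of every add with a constant operand. The helper below is a minimal sketch of that check on plain shape tuples. `needs_broadcast` is a hypothetical name, and the sketch assumes the crash is tied to mismatched operand shapes rather than to constants as such, which is not confirmed here.

```python
def needs_broadcast(shape_a, shape_b):
    """Return True if an elementwise op on these two shapes requires broadcasting."""
    # Identical shapes can be added element by element; anything else
    # (including a scalar against a tensor) relies on broadcasting.
    return tuple(shape_a) != tuple(shape_b)


# A scalar constant such as 0.5f has shape (), so it needs broadcasting
# against any non-scalar tensor.
print(needs_broadcast((1, 128, 768), ()))             # True
print(needs_broadcast((1, 128, 768), (1, 128, 768)))  # False
```

Inside the checker, the two shapes could probably be read as `[int(d) for d in args[i].checked_type.shape]`; that assumes static shapes and has not been tested against the ACL integration.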
However, for “dense” operators I am also getting a segmentation fault, and this one seems related to a transpose operation. After looking at the Relay IR before and after the BYOC process, my guess is that an extra transpose performed on the TVM side is not eliminated during BYOC, which could change the data layout before it enters the offloaded part and cause the segmentation fault.
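To make the layout concern concrete, here is a small pure-Python reference for the dense convention Relay uses, Y = X · Wᵀ with the weight stored as (units, in_dim). The helper names `dense` and `transpose` are only for illustration, and whether the ACL side expects the same weight layout is an assumption that this sketch does not verify.

```python
def dense(x, w):
    """Reference for Y = X * W^T: x is (batch, in_dim), w is (units, in_dim)."""
    in_dim = len(x[0])
    if any(len(row) != in_dim for row in w):
        raise ValueError("each weight row must have in_dim elements")
    # Each output element is the dot product of an input row and a weight row.
    return [[sum(a * b for a, b in zip(xr, wr)) for wr in w] for xr in x]


def transpose(m):
    """Swap the two axes of a non-empty 2-D list."""
    return [list(col) for col in zip(*m)]


x = [[1.0, 2.0, 3.0]]                    # (batch=1, in_dim=3)
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]   # (units=2, in_dim=3)
print(dense(x, w))                       # [[1.0, 5.0]]

# A weight that went through an extra transpose has shape (in_dim, units)
# and no longer matches what dense expects.
w_t = transpose(w)
try:
    dense(x, w_t)
except ValueError as err:
    print("layout mismatch:", err)
```

In this reference the mismatch is caught by an explicit shape check; a backend that skips such a check would instead read the weight buffer with the wrong strides, which is one way a leftover transpose could end in a crash.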
I have not been able to come up with a solution for this problem. Any suggestions? @lhutton1 @comaniac