How to do heterogeneous execution on cpu and gpu?

Hello,

I have read some posts on the forum, but I am still confused about this.

  1. If I want to use Relay to build a simple network and heterogeneously execute some ops on GPU and others on CPU, there seem to be two different ways:
  • One is through relay.annotation.on_device, relay.device_copy, and relay.transform.RewriteAnnotatedOps. After that, the Relay graph is rewritten and I can call relay.build. But my TVM version is 0.8, and this does not seem to work. Is my usage wrong? I'm not sure how to do this in the current version.
  • The other way is part of BYOC, but I just want heterogeneous execution on GPU and CPU, so it doesn't seem to be needed?
  2. I want to check what difference heterogeneous execution makes in the graph JSON. I have read some of the code for JSONReader and graph_executor. My guess is that with heterogeneous execution the JSON will contain some tvm_op nodes whose func_name is "__copy" to copy data between devices, and device_index will denote which device each node should execute on. Is my guess correct?
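For reference, here is a minimal sketch of the annotation-based flow from the first option, roughly as it appeared around TVM 0.8. The exact names and signatures (e.g. the fallback device type passed to RewriteAnnotatedOps, and the target dict keys) may differ in your version, so treat this as an untested illustration rather than a verified recipe:

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 10))
y = relay.var("y", shape=(1, 10))

# Annotate the add to run on the GPU; leave the multiply on the default device.
add = relay.annotation.on_device(relay.add(x, y), tvm.cuda(0))
mul = relay.multiply(add, relay.const(2.0))

mod = tvm.IRModule.from_expr(relay.Function([x, y], mul))

# RewriteAnnotatedOps inserts device_copy nodes at device boundaries;
# the argument is the fallback device type (1 == CPU).
mod = relay.transform.RewriteAnnotatedOps(1)(mod)

# Heterogeneous build with one target per device.
lib = relay.build(mod, target={"cpu": "llvm", "gpu": "cuda"})
```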

I’m new to TVM, any help or suggestions are massively appreciated!

Hi~ Can this unit test case help you?

If you are using the relay.build() + graph_executor.GraphModule path, the key point I remember is that you should pass a multi-target dict as the target argument of build, and pass a device list into GraphModule, like:

lib = relay.build(relay_mod, target={"cpu": "llvm", "gpu": "cuda"}, params=params)
m = graph_executor.GraphModule(lib["default"](tvm.cpu(), tvm.gpu()))