Hello. I found a small issue with heterogeneous execution when the same variable is used on different devices through the annotation API.
The code I used is as follows:
```python
import numpy as np
import tvm
from tvm import relay

dshape = (5, 5)
data1 = relay.var("data1", shape=dshape)
data2 = relay.var("data2", shape=dshape)
add1 = relay.add(data1, data2)
add2 = relay.add(add1, data2)
dev1 = tvm.context(1)
dev2 = tvm.context(2)
_add_1 = relay.annotation.on_device(add1, dev1)
_add_2 = relay.annotation.on_device(add2, dev2)

func = relay.Function([data1, data2],
                      relay.Tuple(tvm.convert([_add_1, _add_2, add2])))
func = relay.ir_pass.infer_type(func)
func = relay.ir_pass.rewrite_annotated_ops(func,
                                           tvm.context("cpu").device_type)
func = relay.ir_pass.infer_type(func)
func = relay.Function(relay.ir_pass.free_vars(func.body[2]), func.body[2])

d1 = np.random.uniform(size=dshape).astype('float32')
d2 = np.random.uniform(size=dshape).astype('float32')

config = {"opt_level": 1}
target = {"cpu": "llvm", "cuda": "cuda"}
params = {}
with relay.build_config(**config):
    graph, lib, params = relay.build(func, target, params=params)

contexts = [tvm.cpu(0), tvm.context("cuda")]
mod = tvm.contrib.graph_runtime.create(graph, lib, contexts)
mod.set_input(**params)
mod.set_input("data1", d1)
mod.set_input("data2", d2)
mod.run()
result = mod.get_output(0).asnumpy()
```
The error is as follows:

```
TVMError: [16:52:34] /home/morinaga/tvm/tvm/src/runtime/module_util.cc:53: Check failed: ret == 0 (-1 vs. 0) Assert fail: (2 == tvm_struct_get(arg1, 0, 10)), Argument arg1.device_type has an unsatisfied constraint
```
This is due to the lack of a `device_copy`, so the issue can be avoided by inserting one manually, as in the following example.
## Example
```python
data1 = relay.var("data1", shape=dshape)
data2 = relay.var("data2", shape=dshape)
add1 = relay.add(data1, data2)
# Explicitly copy data2 from dev1 to dev2 before it is consumed on dev2.
_data2 = relay.device_copy(data2, dev1, dev2)
add2 = relay.add(add1, _data2)
_add_1 = relay.annotation.on_device(add1, dev1)
_add_2 = relay.annotation.on_device(add2, dev2)
```
In my opinion, with the annotation API the user should only have to be aware of operator placement (without calling `device_copy` manually), so this should be resolved while compiling the Relay graph. It does not seem difficult to insert `device_copy` in `relay.ir_pass`. However, inserting the `device_copy` between `data2` and `add2` might have different performance from inserting it between `data2` and `add1`. Another option is to treat this as a usage error and raise an exception.
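To illustrate what such a pass could do, here is a toy sketch in plain Python (this is NOT TVM code; `Node` and `insert_device_copies` are hypothetical names of my own) that walks a dataflow graph and inserts a copy node on every edge whose producer and consumer are annotated with different devices, using the "copy right before each consumer" policy mentioned above:

```python
# Toy sketch of automatic device_copy insertion; not the actual TVM pass.

class Node:
    def __init__(self, name, device, inputs=()):
        self.name = name
        self.device = device        # device annotation, e.g. "cpu" or "cuda"
        self.inputs = list(inputs)

def insert_device_copies(outputs):
    """Insert a copy node on every edge that crosses a device boundary.

    The copy is placed directly before each consumer (i.e. between data2
    and add2 rather than between data2 and add1), which is one possible
    policy; sharing a single copy among consumers is the alternative.
    """
    copies = []
    seen = set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for i, inp in enumerate(node.inputs):
            visit(inp)
            if inp.device != node.device:
                copy = Node("copy(%s->%s)" % (inp.name, node.device),
                            node.device, [inp])
                node.inputs[i] = copy
                copies.append(copy.name)

    for out in outputs:
        visit(out)
    return copies

# Mirror the example above: add1 on cpu, add2 on cuda, data2 feeds both.
data1 = Node("data1", "cpu")
data2 = Node("data2", "cpu")
add1 = Node("add1", "cpu", [data1, data2])
add2 = Node("add2", "cuda", [add1, data2])
# Both cross-device edges into add2 (add1->add2, data2->add2) get copies.
print(insert_device_copies([add2]))
```

Under this policy the pass is a simple post-order traversal, but as noted, where the copy lands can affect performance, so the policy itself is the real design question.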
What is the cleverest way to handle this?