I have read the tutorial showing how to feed input to a TVM model. For example, here is the CUDA inference tutorial, Deploy a Quantized Model on CUDA — tvm 0.14.dev0 documentation (apache.org):
```python
def run_inference(mod):
    model = relay.create_executor("vm", mod, dev, target).evaluate()
    val_data, batch_fn = get_val_data()
    for i, batch in enumerate(val_data):
        data, label = batch_fn(batch)
        prediction = model(data)
        if i > 10:  # only run inference on a few samples in this tutorial
            break
```
`data` is a NumPy array, which lives on the CPU. I guess TVM internally copies the CPU input to the GPU? But what if the input is already on the GPU device?
The TensorRT API supports passing a device pointer as the model input; if the input is not on the CUDA device, it throws an exception. Can TVM likewise accept a pointer to device memory as the model input?
If I have a CuPy input array, can I pass it to the TVM model directly?
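For context, one zero-copy route between frameworks is the DLPack protocol, which NumPy, CuPy, and TVM all implement. Below is a minimal sketch using NumPy as a stand-in for a device array to show that the consumer wraps the producer's buffer rather than copying it; the commented CuPy/TVM lines are an assumption about how the same handoff would look on the GPU (they require a CUDA build of TVM and are untested here).

```python
import numpy as np

# Zero-copy handoff via DLPack: np.from_dlpack consumes the producer's
# __dlpack__ capsule and wraps the SAME memory -- no copy is made.
x = np.arange(6, dtype="float32")
y = np.from_dlpack(x)

# A write through the producer is visible through the consumer,
# which proves the two arrays share one buffer.
x[0] = 42.0
print(y[0])  # prints 42.0

# Assumed CuPy -> TVM handoff over the same protocol (not run here):
#
#   import cupy, tvm
#   gpu_in = cupy.asarray(host_data)               # already on the GPU
#   tvm_in = tvm.nd.from_dlpack(gpu_in.toDlpack()) # wraps the CuPy buffer
#   prediction = model(tvm_in)                     # no host<->device copy
```

If this works the way the NumPy case does, the answer to the CuPy question would be "yes, via DLPack" rather than via a raw pointer as in TensorRT.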