I have read the tutorial showing how to feed input to a TVM model. For example, here is the CUDA inference tutorial, Deploy a Quantized Model on CUDA — tvm 0.14.dev0 documentation (apache.org):
```python
def run_inference(mod):
    model = relay.create_executor("vm", mod, dev, target).evaluate()
    val_data, batch_fn = get_val_data()
    for i, batch in enumerate(val_data):
        data, label = batch_fn(batch)
        prediction = model(data)
        if i > 10:  # only run inference on a few samples in this tutorial
            break
```
`data` is a NumPy array, which lives on the CPU. I guess TVM internally copies the CPU input to the GPU? But what if the input is already on the GPU device?
The TensorRT API supports passing a device pointer as the model input; if the input is not on the CUDA device, it throws an exception. Can TVM likewise accept a pointer to device memory as the model input?
If I have a CuPy input array, can I pass it to the TVM model directly?
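For context, one zero-copy route between frameworks is the DLPack protocol, which NumPy, CuPy, and TVM all implement. Below is a minimal sketch using NumPy as a stand-in for a device array to show that the consumer wraps the producer's buffer rather than copying it; the commented CuPy/TVM lines are an assumption about how the same handoff would look on the GPU (they require a CUDA build of TVM and are untested here).

```python
import numpy as np

# Zero-copy handoff via DLPack: np.from_dlpack consumes the producer's
# __dlpack__ capsule and wraps the SAME memory -- no copy is made.
x = np.arange(6, dtype="float32")
y = np.from_dlpack(x)

# A write through the producer is visible through the consumer,
# which proves the two arrays share one buffer.
x[0] = 42.0
print(y[0])  # prints 42.0

# Assumed CuPy -> TVM handoff over the same protocol (not run here):
#
#   import cupy, tvm
#   gpu_in = cupy.asarray(host_data)               # already on the GPU
#   tvm_in = tvm.nd.from_dlpack(gpu_in.toDlpack()) # wraps the CuPy buffer
#   prediction = model(tvm_in)                     # no host<->device copy
```

If this works the way the NumPy case does, the answer to the CuPy question would be "yes, via DLPack" rather than via a raw pointer as in TensorRT.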