Rust crate "tvm_graph_rt": can it run on a CUDA device? Can I iterate over its output without copying from GPU to CPU?

Context: how do I use dlpackrs with Apache TVM for zero-copy?

I’ve found that when using the tvm_graph_rt crate, the output is a Tensor, which has an as_slice function that returns a raw byte slice ([u8]).
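For reference, here is roughly how I end up consuming that byte slice today; the f32 element type and the fixed row width are just placeholders for my model's actual output layout:

```rust
/// Rough sketch of my current approach (element type and row width are
/// placeholders): reinterpret the raw bytes from as_slice as
/// little-endian f32 values and walk them row by row on the CPU.
fn rows_from_bytes(bytes: &[u8], row_len: usize) -> Vec<Vec<f32>> {
    let values: Vec<f32> = bytes
        .chunks_exact(4)
        .map(|b| f32::from_le_bytes([b[0], b[1], b[2], b[3]]))
        .collect();
    values.chunks(row_len).map(|row| row.to_vec()).collect()
}
```

This touches every byte on the host, which is exactly the copy I'd like to avoid.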

So… is it possible to iterate over output rows via as_slice without a full GPU-to-CPU copy? Also, there seems to be no way to specify a device (e.g. cuda(0)) with tvm_graph_rt. Is this crate CPU-only? Something like the purely hypothetical sketch below is what I was hoping to write.
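Just to illustrate the question, this is the kind of API I was hoping to find; everything here is hypothetical, and I have not seen anything like it in tvm_graph_rt:

```rust
// Hypothetical wish-list only -- none of these device-aware calls exist
// in tvm_graph_rt as far as I can tell; it just illustrates the question.
//
// let dev = cuda(0);                          // pick a CUDA device
// let mut exec = GraphExecutor::new(graph, &lib /*, dev */)?;
// exec.run();
// let output = exec.get_output(0);            // would stay on the GPU
// for row in output.rows() { /* ... */ }      // iterate without a host copy
```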