This PR adds support for indexing with int64 variables, so that large tensors with more than 2^31 elements can be handled. We plan to support the llvm and cuda backends first. The change has been tested on both backends by adding two large tensors.
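To illustrate why 32-bit indices are not enough, here is a minimal sketch in plain Python. It is not code from this PR, and the helper names `wrap_int32` and `flat_index` are hypothetical. It simulates computing a row-major flattened index with signed 32-bit wraparound and compares it against the same computation without wrapping.

```python
def wrap_int32(x: int) -> int:
    """Simulate two's-complement wraparound of a signed 32-bit integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def flat_index(i: int, j: int, ncols: int, use_int64: bool) -> int:
    """Row-major flattened index i * ncols + j.

    With use_int64=False the result is wrapped to 32 bits, mimicking an
    int32 index variable. With use_int64=True the result is left as-is,
    which is valid as long as it fits in 64 bits.
    """
    idx = i * ncols + j
    return idx if use_int64 else wrap_int32(idx)


# A 65536 x 65536 tensor has 2^32 elements, more than int32 can address.
nrows = ncols = 1 << 16
last_i, last_j = nrows - 1, ncols - 1

idx32 = flat_index(last_i, last_j, ncols, use_int64=False)
idx64 = flat_index(last_i, last_j, ncols, use_int64=True)

print("int32 index of last element:", idx32)  # wraps to a negative value
print("int64 index of last element:", idx64)
```

With int32 the index of the last element wraps around to a negative number, while the int64 version yields the expected offset.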
Since tests with large tensors are costly in both memory and time, I am not sure where to add them. Any advice is welcome.