Same output irrespective of input: iOS TVM model

In MetalWorkspace::CopyDataFromTo, three situations are handled:

  1. Copy from Metal to Metal
  2. Copy from CPU to Metal
  3. Copy from Metal to CPU

I.e., in your case one extra copy will be done, so for now it is much easier to just pass an NDArray that lives in the CPU context. Could you pass m_cpuInput directly, i.e. set_input("INPUT", m_cpuInput);, and verify the result?
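To picture the three cases listed above, here is a rough sketch of a dispatch on the source and destination device. It is not TVM's actual code: the Device enum and the CopyPath function are hypothetical names made up for illustration, and rejecting a CPU-to-CPU copy is my assumption, based only on that pair not being among the three handled cases.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the device type; not TVM's real enum.
enum class Device { kCPU, kMetal };

// Hypothetical helper: names the copy path for a (source, destination) pair,
// mirroring the three situations listed above.
std::string CopyPath(Device from, Device to) {
  if (from == Device::kMetal && to == Device::kMetal) return "metal->metal";
  if (from == Device::kCPU && to == Device::kMetal) return "cpu->metal";
  if (from == Device::kMetal && to == Device::kCPU) return "metal->cpu";
  // Assumption: a CPU-to-CPU copy is not something the Metal workspace handles.
  throw std::invalid_argument("not one of the three handled copy directions");
}
```

With this picture, passing a CPU NDArray to set_input means only the "cpu->metal" path is needed for the input.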

Another difference: I looked at my code and found that I did not use TVMArrayCopyFromTo; I used the NDArray member function instead, like this:

tvm::runtime::NDArray output = getOutput_(0);
output.CopyTo(y_);

where output is an NDArray in the Metal context and y_ is an NDArray allocated in the CPU context.