MetalWorkspace::CopyDataFromTo handles three situations:
- Copy from Metal to Metal
- Copy from CPU to Metal
- Copy from Metal to CPU
That is, in your case one extra copy will be performed, so for now it is much easier to just pass an NDArray that lives in the CPU context.
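The three cases listed above can be pictured as a dispatch on the source and destination device. The following is only a simplified model, not the actual TVM source: the Device enum and the ClassifyCopy function are hypothetical names made up for illustration, and treating every other combination as an error is an assumption of this sketch.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the device types; not the real TVM enum.
enum class Device { kCPU, kMetal };

// Returns a label for the copy path this model would take.
// Assumption: any pair outside the three listed cases is rejected.
std::string ClassifyCopy(Device from, Device to) {
  if (from == Device::kMetal && to == Device::kMetal) {
    return "metal->metal";  // copy between two Metal buffers
  }
  if (from == Device::kCPU && to == Device::kMetal) {
    return "cpu->metal";    // upload host memory into a Metal buffer
  }
  if (from == Device::kMetal && to == Device::kCPU) {
    return "metal->cpu";    // read a Metal buffer back into host memory
  }
  throw std::invalid_argument("copy direction not covered by this sketch");
}
```

In this model, staging the input in a Metal NDArray first corresponds to one "cpu->metal" step before the input is set at all, which is the step that passing the CPU array directly avoids.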
Could you pass m_cpuInput to set_input("INPUT", m_cpuInput); and verify the result?
Another difference: I looked at my code and found that I did not use TVMArrayCopyFromTo, but a member function of NDArray instead. For example:
tvm::runtime::NDArray output = getOutput_(0);
output.CopyTo(y_);
Here output is an NDArray in the Metal context, and y_ is an NDArray allocated in the CPU context.