The *4 is missing. Is that a copy-paste problem, or did you forget to account for the size of a float? TVMArrayCopyFromBytes deals with bytes, not floats, so the size you pass has to be the element count multiplied by sizeof(float).
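To make the byte count explicit, here is a minimal sketch. TensorByteSize is a hypothetical helper, not part of TVM, and the shape in the usage comment is only an assumed example since I don't know your actual input shape:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a TVM function): number of bytes occupied by a
// dense tensor with the given shape and element size.
inline std::size_t TensorByteSize(const std::vector<int64_t>& shape,
                                  std::size_t elem_bytes) {
  std::size_t count = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0);  // negative dimensions are not valid here
    count *= static_cast<std::size_t>(dim);
  }
  // Element count times element size: this is the "*4" for float32.
  return count * elem_bytes;
}

// Assumed usage with the TVM C API, where `handle` is a float32 tensor
// of the same shape and `data` points to the host buffer:
//   TVMArrayCopyFromBytes(handle, data,
//                         TensorByteSize(shape, sizeof(float)));
```

Using sizeof(float) instead of a hard-coded 4 keeps the intent visible and avoids repeating the mistake if the element type changes later.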
Another question: why do you need m_gpuInput? You can use just an NDArray allocated with a CPU context. The copy to the GPU will happen automatically inside the TVM runtime.