hello!
I created a model that runs on the CPU:
cpu = graph_runtime.create(graph_cpu, lib_cpu, ctx_cpu)
cpu.set_input("data", Input_data)
cpu.set_input(**param_cpu)
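To make the question concrete, here is a small check I did on top of that snippet. The shape is just a placeholder for my real input, and I am assuming "data" is input 0 of the graph; get_input/asnumpy are from the graph_runtime API I am using:

import numpy as np

Input_data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")  # placeholder shape
cpu.set_input("data", Input_data)
cpu.set_input(**param_cpu)

# Mutating the numpy array after set_input does not change the runtime's input,
# so set_input really copied the bytes into a separate tvm.nd.array buffer.
Input_data[...] = 0.0
print(np.abs(cpu.get_input(0).asnumpy()).sum())  # still non-zero (assuming input 0 is "data")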
And the code below is part of cpu_device_api.cc:
void CopyDataFromTo(const void* from,
                    size_t from_offset,
                    void* to,
                    size_t to_offset,
                    size_t size,
                    TVMContext ctx_from,
                    TVMContext ctx_to,
                    DLDataType type_hint,
                    TVMStreamHandle stream) final {
  memcpy(static_cast<char*>(to) + to_offset,
         static_cast<const char*>(from) + from_offset,
         size);
}
This code looks like it copies host data (np.array) into the TVM data format (tvm.nd.array).
That makes sense when two devices with different contexts are involved, such as a CPU and a GPU, but it seems inefficient to hold two copies of the same data within a single context. Yet the function above appears to do exactly that kind of copy.
Am I misunderstanding something, or is the data really duplicated in a CPU-only environment?
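For reference, this is the call chain as I currently understand it on the Python side (my reading of graph_runtime.py / ndarray.py, so please correct me if it is wrong); the NDArray-to-NDArray path is the one that reaches the CopyDataFromTo shown above:

import tvm

# Path A: pass the numpy array to set_input directly.
#   GraphModule.set_input -> _get_input("data").copyfrom(numpy array)
#   -> TVMArrayCopyFromBytes, i.e. one host-to-host copy into the
#      runtime's internal input NDArray.
cpu.set_input("data", Input_data)

# Path B: wrap the data in a tvm.nd.array first.
#   tvm.nd.array already copies numpy -> NDArray once, and set_input then
#   copies NDArray -> internal input NDArray via TVMArrayCopyFromTo,
#   which on CPU is the CopyDataFromTo/memcpy above. So the bytes seem
#   to be copied twice even though everything stays in CPU memory.
nd_input = tvm.nd.array(Input_data, tvm.cpu(0))
cpu.set_input("data", nd_input)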