iOS TVM model gives the same output irrespective of input

Hi TVM Team,

@tqchen @preethireddy @shivmgg @echuraev I am developing an iOS TVM application, but I am getting the same inference output from the iOS TVM model irrespective of the input.

  1. Converted the TensorFlow model to a TVM model - success
  2. Ran inference with the converted x86 TVM model in Python - success
  3. Converted the TF model again to a TVM model for iOS arm64 - success
  4. Ran inference with the converted arm64 model on iOS - failure: irrespective of the input, it gives the same output rather than the expected inference result.

Below are the target and target_host:

target = "metal"
target_host = "llvm -mtriple=arm64-apple-darwin"

Note: even if I set all input values to zero, it gives the same inference output.

What could be the issue? Kindly help me fix it.

Thank you.

How do you export the model? Do you specify the iPhone SDK, like:

from tvm.contrib import xcode
arch = "arm64"
sdk = "iphoneos"
libpath = model_folder_path + "/" + model_name + ".dylib"
lib.export_library(libpath, xcode.create_dylib, arch=arch, sdk=sdk)

"Tried to run inference with the converted arm64 model on iOS - the model gives the same output irrespective of the input."

Do you do this through RPC, or do you build TVM and the model into your application? If you run it natively in your own iOS app, how do you feed the data? Could you share that part of the code?

Hi @elvin-n, I have built a standalone iOS application.

For iOS CPU:

  • sdk = "iphoneos"
  • target = "llvm -mtriple=arm64-apple-darwin"

For iOS Metal:

  • sdk = "iphoneos"
  • target = "metal"
  • target_host = "llvm -mtriple=arm64-apple-darwin"

And I used C++ to run inference on the TVM model:

// load module
tvm::runtime::Module mod_syslib = tvm::runtime::Module::LoadFromFile(std_resourcePath);

//load graph
std::ifstream json_in(std_jsonPath);
std::string json_data((std::istreambuf_iterator<char>(json_in)), std::istreambuf_iterator<char>());
json_in.close();

// get the global function that creates the graph executor
auto tvm_graph_runtime_create = tvm::runtime::Registry::Get("tvm.graph_executor.create");
tvm::runtime::Module mod = (*tvm_graph_runtime_create)(json_data, mod_syslib, device_type, device_id);
this->m_handle = new tvm::runtime::Module(mod);

// parameters need to be a TVMByteArray type to indicate the binary data
std::ifstream params_in(std_paramPath, std::ios::binary);
std::string params_data((std::istreambuf_iterator<char>(params_in)), std::istreambuf_iterator<char>());
params_in.close();
TVMByteArray params;
params.data = params_data.c_str();
params.size = params_data.length();
mod.GetFunction("load_params")(params);

tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");
set_input("input", m_gpuInput);

tvm::runtime::PackedFunc run = mod.GetFunction("run");
run();
tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");

Which context do you use for creating m_gpuInput, and how do you populate the data?

BTW, you are loading params and json separately, which means you are storing the model as separate artifacts. Is that more convenient for you? Does it find the artifacts during model loading? I prefer to have everything packed into one library created by lib.export_library.
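A minimal sketch of the loading side when everything is packed into one library (this assumes the library was exported from a relay.build graph-executor factory, whose entry point is "default"; "model.dylib" is a placeholder path):

#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

// load the single exported library (graph and params are embedded)
tvm::runtime::Module lib = tvm::runtime::Module::LoadFromFile("model.dylib");
// create the graph executor on the target device via the "default" entry point
DLDevice dev{kDLMetal, 0};
tvm::runtime::Module gmod = lib.GetFunction("default")(dev);
tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
tvm::runtime::PackedFunc run = gmod.GetFunction("run");
tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");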

m_gpuInput is in the Metal context (kDLMetal),

and I bundled the JSON, params, and dylib separately with the application; I feel it is easier to track down any issue that way.

How do you populate data into this NDArray?

//data is float* 
TVMArrayCopyFromBytes(m_cpuInput, data, w*h*c);
TVMArrayCopyFromTo(m_cpuInput, m_gpuInput, nullptr);
set_input("INPUT", m_gpuInput);

get_output(0, m_gpuOutput0);
TVMArrayCopyFromTo(m_gpuOutput0, m_cpuOutput0, nullptr);

The *4 is missing - is that a copy-paste problem, or did you forget to account for the size of a float? TVMArrayCopyFromBytes deals with bytes, not floats.
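For example, a sketch of the corrected call, assuming float32 data of w*h*c elements:

// TVMArrayCopyFromBytes expects a size in bytes, so scale by sizeof(float)
TVMArrayCopyFromBytes(m_cpuInput, data, w * h * c * sizeof(float));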

Another question: why do you need m_gpuInput at all? You can use just an NDArray allocated in the CPU context; the copy will happen automatically inside the TVM runtime.

  • Yes, it is w*h*c*4.
  • And my understanding is that we can't directly access GPU memory, so I copied all my input data into CPU memory and then copied it to GPU memory. Please excuse me if my understanding is wrong.

In MetalWorkspace::CopyDataFromTo, three situations are handled:

  1. Copy from Metal to Metal
  2. Copy from CPU to Metal
  3. Copy from Metal to CPU

I.e., in your case one extra copy is done, and for now it would be much easier to just pass an NDArray in the CPU context. Could you pass m_cpuInput to set_input("INPUT", m_cpuInput); and verify the result?
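A minimal sketch of that suggestion (the shape {1, h, w, c} and the input name "INPUT" follow the snippets above and are assumptions to adjust for your model):

// allocate the input once in the CPU context; the graph executor itself
// copies the host data to the Metal device when set_input is called
tvm::runtime::NDArray cpu_input = tvm::runtime::NDArray::Empty(
    {1, h, w, c}, DLDataType{kDLFloat, 32, 1}, DLDevice{kDLCPU, 0});
cpu_input.CopyFromBytes(data, w * h * c * sizeof(float));
set_input("INPUT", cpu_input);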

Another difference: I looked at my code and realized that I did not use TVMArrayCopyFromTo, but rather the member functions of NDArray, like:

tvm::runtime::NDArray output = getOutput_(0);
output.CopyTo(y_);

where output is an NDArray in the Metal context and y_ is an NDArray allocated in the CPU context.
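Put together, a sketch of that output path (host_data is a hypothetical name for the host-side result pointer):

// get_output returns an NDArray in the Metal context
tvm::runtime::NDArray output = get_output(0);
// allocate a matching array in the CPU context and copy the device result to the host
tvm::runtime::NDArray y_ = tvm::runtime::NDArray::Empty(
    output.Shape(), output.DataType(), DLDevice{kDLCPU, 0});
output.CopyTo(y_);
const float* host_data = static_cast<const float*>(y_->data);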

@myproject24, did you have a chance to verify whether passing a CPU NDArray as the input and using the other API for copying data solves the problem?

Sorry, I didn't get time to check. I will let you know once I verify it.