"Run" function gives a crash on tuned iOS model

Hello everyone!

I have successfully built a Metal model for iOS and verified that it works correctly, so I moved on to tuning it.

Thanks to the help of the magnificent @apeskov, I was able to produce a fully tuned iOS model. As far as I can see from the logs, 14 hours of tuning gave very promising results, and I’m eager to see this in my iOS app.

But when I try to deploy and run the model in my app, it crashes in the “run” function. Here’s where I store the run function:

    Device dev {kDLMetal, 0};

    m_mod = tvm::runtime::Module::LoadFromFile([k_ModelName UTF8String]);

    Module def = m_mod.GetFunction("default")(dev);
    m_inputFunc = def.GetFunction("set_input");
    m_runFunc = def.GetFunction("run");
    m_outputFunc = def.GetFunction("get_output");

Here’s where I use it:

    m_inputFunc("input", m_input);
    m_runFunc();   //that's where the code crashes
    m_outputFunc(1, m_output);

By the way, both the input and output functions work (or at least they don’t crash the application). I also ran into this issue before with an untuned model, but it went away after rebuilding. I have no idea what I did wrong back then, but this time rebuilding the model with the tuned config over and over again doesn’t fix the issue :frowning: Any advice would be appreciated! Thank you!

IMPORTANT NOTE: The initial version of the config (log_file) that auto_scheduler created cannot be used in ApplyHistoryBest; it fails with a fatal error saying that "input" is an invalid key. After a quick look, I found that my config starts with this chunk of data:

{"input": ["llvm -keys=cpu -link-params=0 -mtriple=arm64-apple-darwin", "conv2d_NCHWc.x86", [["TENSOR", [1, 209, 64, 64], "float32"], ["TENSOR", [256, 209, 1, 1], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 166, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 19]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 4]], ["tile_oh", "ot", 1]]}, "result": [[0.00832551045], 0, 2.187854290008545, 1622414478.4734201], "version": 0.2, "tvm_version": "0.8.dev0"}
{"input": ["llvm -keys=cpu -link-params=0 -mtriple=arm64-apple-darwin", "conv2d_NCHWc.x86", [["TENSOR", [1, 64, 4, 4], "float32"], ["TENSOR", [64, 64, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 127, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 2]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 4]], ["unroll_kw", "ot", true]]}, "result": [[4.183946308163834e-05], 0, 1.418644666671753, 1622419079.2600079], "version": 0.2, "tvm_version": "0.8.dev0"}
{"input": ["llvm -keys=cpu -link-params=0 -mtriple=arm64-apple-darwin", "conv2d_NCHWc.x86", [["TENSOR", [1, 128, 4, 4], "float32"], ["TENSOR", [64, 128, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 312, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 4]], ["unroll_kw", "ot", false]]}, "result": [[7.26768600301659e-05], 0, 1.803436040878296, 1622421260.50744], "version": 0.2, "tvm_version": "0.8.dev0"}
{"input": ["llvm -keys=cpu -link-params=0 -mtriple=arm64-apple-darwin", "conv2d_NCHWc.x86", [["TENSOR", [1, 256, 4, 4], "float32"], ["TENSOR", [128, 256, 3, 3], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 396, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 4]], ["unroll_kw", "ot", false]]}, "result": [[0.0002428782051282051], 0, 1.4385228157043457, 1622422703.4950469], "version": 0.2, "tvm_version": "0.8.dev0"}
...........

I didn’t find any reference to the "input" key in the config-reading code, so I deleted it. That got me past the crash in ApplyHistoryBest and let me build the model. But now I’m wondering - could I have erased some important information from the log_file, and could that be causing the current issue?
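For readers comparing log formats: to my knowledge, records written by AutoTVM start with an "input" key (alongside "config" and "result"), as in the lines quoted above, while auto_scheduler records use the short keys "i", "r" and "v". If that holds for the TVM version in use, the error may come from feeding an AutoTVM log to the auto_scheduler loader rather than from a stray key. The sketch below is only a rough check under that assumption; the names `TuningLogKind` and `classify_record` are hypothetical, and it looks at the first key only instead of parsing the JSON.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of TVM: guesses which tuner wrote a log record
// by looking at the first key of the JSON line.
enum class TuningLogKind { AutoTVM, AutoScheduler, Unknown };

TuningLogKind classify_record(const std::string& line) {
    // Skip leading spaces and tabs.
    const std::size_t start = line.find_first_not_of(" \t");
    if (start == std::string::npos) return TuningLogKind::Unknown;

    // Assumption: AutoTVM records begin with {"input": ...
    if (line.compare(start, 9, "{\"input\":") == 0) return TuningLogKind::AutoTVM;
    // Assumption: auto_scheduler records begin with {"i": ...
    if (line.compare(start, 5, "{\"i\":") == 0) return TuningLogKind::AutoScheduler;

    return TuningLogKind::Unknown;
}
```

If the records classify as AutoTVM, the matching loader on the Python side would presumably be autotvm.apply_history_best rather than auto_scheduler.ApplyHistoryBest, and the log could stay unedited.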

Which context do you use to create the m_input tensor? It should be created with the CPU context. The data is copied to the GPU automatically during the set_input call.

Another suggestion, if the above doesn’t help: can you build the debug runtime with set(USE_GRAPH_RUNTIME_DEBUG ON) in config.cmake and call execute_node instead of run? This function accepts a single argument, the index of the last node to execute. If you call execute_node(10), it will execute all layers from 0 through 10. That way you can try to localize the problematic place.

Hello @elvin-n ! Thank you very much for your help!

Here’s the full code where I cache reusable variables:

    Device dev {kDLMetal, 0};

    m_mod = tvm::runtime::Module::LoadFromFile([k_ModelName UTF8String]);

    Module def = m_mod.GetFunction("default")(dev);
    m_inputFunc = def.GetFunction("set_input");
    m_runFunc = def.GetFunction("run");
    m_outputFunc = def.GetFunction("get_output");

    m_input = NDArray::Empty({1, 3, 256, 256}, DLDataType{kDLFloat, 32, 1}, dev);
    m_output = NDArray::Empty({1, 209, 64, 64}, DLDataType{kDLFloat, 32, 1}, dev);

And here’s the code where I run the inference:

    size_t s = blob.total() * blob.elemSize();
    m_input.CopyFromBytes(blob.data, s);

    m_inputFunc("input", m_input);
    m_runFunc();
    m_outputFunc(1, m_output);

    outputTensor = OutputTensor(m_output->shape[0], m_output->shape[1], m_output->shape[2], m_output->shape[3]);
    m_output.CopyToBytes(outputTensor.data(), m_output->shape[0] * m_output->shape[1] * m_output->shape[2] * m_output->shape[3] * sizeof(float));
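The byte count passed to CopyToBytes above is the product of the four dimensions times sizeof(float). A minimal sketch of a helper that computes this for any shape is shown below; `tensor_bytes` is a hypothetical name, and it takes a plain vector instead of the NDArray shape so that it stays self-contained.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bytes needed for a dense tensor with the
// given shape and element size (e.g. sizeof(float) for float32).
std::size_t tensor_bytes(const std::vector<std::int64_t>& shape,
                         std::size_t elem_size) {
    std::size_t count = 1;
    for (std::int64_t dim : shape) {
        count *= static_cast<std::size_t>(dim);
    }
    return count * elem_size;
}
```

Comparing tensor_bytes({1, 3, 256, 256}, sizeof(float)) with the value of s before calling CopyFromBytes would be a cheap way to catch a blob whose size does not match the input tensor.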

When you talk about the context, are you referring to the Device entity?

Exactly. Instead of

    Device dev {kDLMetal, 0};
    ...
    m_input = NDArray::Empty({1, 3, 256, 256}, DLDataType{kDLFloat, 32, 1}, dev);
    m_output = NDArray::Empty({1, 209, 64, 64}, DLDataType{kDLFloat, 32, 1}, dev);

just create a new one on the CPU for the tensors:

    Device dev {kDLMetal, 0};
    // load and run the network with this Metal device
    ...
    // allocate the input and output tensors on the CPU
    DLDevice cpuDevice{kDLCPU, 0};
    m_input = NDArray::Empty({1, 3, 256, 256}, DLDataType{kDLFloat, 32, 1}, cpuDevice);
    m_output = NDArray::Empty({1, 209, 64, 64}, DLDataType{kDLFloat, 32, 1}, cpuDevice);

Hello @elvin-n !

I have investigated the issue and got some rather weird results. After switching between branches and rebuilding the whole of TVM from scratch, my project started compiling the dylib correctly, meaning that the run function worked as expected. Once I “caught” this TVM state, I forked it to my own branch just to make sure I wouldn’t lose this correct behaviour.

Today I needed to switch to the v0.7 branch in order to build tvm_runtime with the Release option (because I cannot do that in the main branch or in my branch due to errors). After I came back to my branch, where I had previously compiled the model dylib successfully, I ran into the issue once again: right now I cannot compile the dylib.

I do not see any changes in the repo, and I suspect that some files covered by gitignore affect the outcome. But I don’t really know where to look.

Could you give me some advice on how to approach this issue and find its root cause? Thank you very much!

IMPORTANT UPDATE: For the time being I cannot compile the model with the Metal backend. The model with the CPU backend compiles and works just fine.

Do you mean that you cannot compile the model? What are the errors?

Just verified on the latest main branch: several models compiled successfully for Metal. Have you cleaned all artifacts in the build directory after building TVM 0.7?

Hello @elvin-n! I’m sorry for the confusion. What I meant is that the model compiles just fine, but it seems to be compiled with the CPU backend even though I’m specifying the “metal” target. A similar issue is described here (TVM fails to compile Metal lib on MacOS due to version?).

I see your suggestion, though. I’m going to try it now and I’ll let you know if it works.

Short summary of the investigation: the TVM runtime cannot be compiled for iOS on the current revision due to two problems:

  1. wrong options in the CMake file - fixed in "Fix compilation of tvm runtime for iOS" by elvin-n (Pull Request #8242, apache/tvm)
  2. bitcode cannot be added to tvm_runtime - under investigation; for now the suggestion is to compile without bitcode for development purposes

I’m going to leave a concluding message as well, for users who are just starting to work with TVM :slight_smile:

The problem was not in the model, but in the iOS TVM runtime library. Please recompile the iOS TVM runtime library with the changes @elvin-n provided.

Hi, can bitcode be added to tvm_runtime now?