I’m reading the auto-tuning tutorial and am interested in combining it with compiling deep learning models. It looks like we can get the net and params from the various nnvm frontends and then do the auto-tuning as in the tutorial with ease. So I tried a CoreML SqueezeNet 1.1 example (converted from Caffe by coremltools; Caffe model: https://github.com/DeepScale/SqueezeNet). It compiles with nnvm successfully, just as in the from_coreml tutorial. However, when I run task extraction for auto-tuning on it, I get this error:
Traceback (most recent call last):
File "tune_nnvm_cuda.py", line 262, in <module>
tune_and_evaluate(tuning_option)
File "tune_nnvm_cuda.py", line 227, in tune_and_evaluate
symbols=(nnvm.sym.conv2d,))
File "/tvm/python/tvm/autotvm/task/nnvm_integration.py", line 248, in extract_from_graph
nnvm.compiler.build(graph, target=tracing_target, shape=shape, dtype=dtype)
File "/tvm/nnvm/python/nnvm/compiler/build_module.py", line 305, in build
graph = graph.apply("GraphCompile")
File "/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
File "/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [21:51:25] /tvm/nnvm/src/compiler/compile_engine.cc:212: Check failed: out[i].ndim() == out_info[i].ndim() (4 vs. 0) broadcast_add
I don’t understand: if regular compilation and auto-tuning both call nnvm.compiler.build, just with different targets, why does regular compilation succeed while auto-tuning fails? Could anyone explain it? Thanks.
P.S.: I’m not interested in tuned parameters for SqueezeNet 1.1 specifically; I just took it as an example because it’s small. What I’m interested in is compiling and auto-tuning a general deep learning model.
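For reference, here is roughly the flow I am using (a minimal sketch adapted from the from_coreml and tune_nnvm_cuda tutorials; the model file name, the input name "image", and the input shape are placeholders, not the exact values from my script):

import coremltools
import nnvm
import nnvm.compiler
import nnvm.frontend
from tvm import autotvm

# Load the CoreML SqueezeNet 1.1 converted from Caffe with coremltools.
# The file name, the input name "image" and the shape are placeholders.
mlmodel = coremltools.models.MLModel("squeezenet_v1.1.mlmodel")
sym, params = nnvm.frontend.from_coreml(mlmodel)

shape_dict = {"image": (1, 3, 227, 227)}
target = "cuda"

# Regular compilation succeeds, as in the from_coreml tutorial.
graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)

# Task extraction for auto-tuning (same call as in tune_nnvm_cuda.py)
# fails with the broadcast_add shape check above.
tasks = autotvm.task.extract_from_graph(sym, target=target,
                                        shape=shape_dict, dtype="float32",
                                        symbols=(nnvm.sym.conv2d,))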
I’ve seen the same error in a different context (when I was rolling my own Winograd integration into nnvm). If I remember correctly, the problem is that this loop is not executed at all, i.e. shape_vec is corrupted. I have no idea why this happens with the tracing target.
If I cannot auto-tune the entire model, can I tune each individual conv2d?
The auto-tune examples generate logs. Should I append these logs to ~/.tvm/tophub/cuda_v0.02.log? Should I delete the default log and replace it with the generated log?
Yes, in principle you can auto-tune individual layers manually, but that would be very tedious, so I wouldn’t recommend doing it. Issues that only appear with the “tracing” target have happened elsewhere as well. Maybe we should look into what is going on. If you do want to tune a single layer by hand, a rough sketch is below.
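This is only a sketch: the shapes, strides, and padding are made up, "topi_nn_conv2d" is the template name I believe the nnvm integration registers, and the measure_option form follows the current tune_nnvm_cuda.py tutorial, so adjust it to your TVM version.

from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

# One conv2d workload described the way extracted tasks are serialized:
# ('TENSOR', shape, dtype) placeholders, then strides, padding, layout, out_dtype.
data = ('TENSOR', (1, 3, 227, 227), 'float32')
kernel = ('TENSOR', (64, 3, 3, 3), 'float32')
task = autotvm.task.create("topi_nn_conv2d",
                           args=(data, kernel, (2, 2), (0, 0), 'NCHW', 'float32'),
                           target='cuda', template_key='direct')

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(timeout=10),
    runner=autotvm.LocalRunner(number=20, repeat=3, timeout=4))

# Tune this single workload and log the results, as in the tutorial.
tuner = XGBTuner(task, loss_type='rank')
tuner.tune(n_trial=1000,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file('conv2d_manual.log')])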
Also, for the second model, it compiles with nnvm at the default opt_level=2. However, if I change opt_level to 3, I get the error below (how I set opt_level is sketched after the traceback). Not sure if this is related. SqueezeNet does compile with nnvm at opt_level=3, though.
Traceback (most recent call last):
File "from_coreml.py", line 79, in <module>
graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)
File "/tvm/nnvm/python/nnvm/compiler/build_module.py", line 292, in build
graph = graph.apply("InferShape")
File "/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
File "/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: Error in operator conv2d1: [21:44:02] /tvm/nnvm/src/top/nn/convolution.cc:65: Check failed: dshape.ndim() == 4U (5 vs. 4) Input data should be 4D
Stack trace returned 10 entries:
[bt] (0) /tvm/build/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5a) [0x7f5603b1b5aa]
[bt] (1) /tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f5603b1c158]
[bt] (2) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::top::Conv2DInferShape(nnvm::NodeAttrs const&, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*, std::vector<nnvm::TShape, std::allocator<nnvm::TShape> >*)+0x7d9) [0x7f55ff66fe99]
[bt] (3) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(+0x130b81) [0x7f55ff5aab81]
[bt] (4) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(+0x131eaa) [0x7f55ff5abeaa]
[bt] (5) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(+0x132d96) [0x7f55ff5acd96]
[bt] (6) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::ApplyPasses(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x32b) [0x7f55ff569e9b]
[bt] (7) /tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(NNGraphApplyPasses+0x348) [0x7f55ff552f28]
[bt] (8) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f5644b15e40]
[bt] (9) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x2eb) [0x7f5644b158ab]
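For clarity, this is how I switch to opt_level=3 (a minimal sketch; sym, target, shape_dict and params are whatever the frontend produced for that model):

import nnvm.compiler

# The default build uses opt_level=2; wrapping the build in build_config
# with opt_level=3 is what triggers the InferShape error above.
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(sym, target, shape_dict, params=params)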
We shouldn’t have to pass params into it; we should be able to infer the shapes even if we don’t have params. I have met this issue several times, and every time I found that something else was wrong, not the params. Also, in our environment, I found that if we pass params, we cannot tune using multiple CPU cores.
Regarding the multi-CPU issue: it is because the thread pool in TVM is incompatible with Python’s multiprocessing package. After executing a TVM function, we cannot use multiprocessing in Python anymore. If you pass params, then nnvm will run a TVM function to transform the params, which breaks Python multiprocessing. My solution is to launch a new Python thread (a thread is enough; no need for a separate process) to run task extraction. This separates the environment and it works. Alternatively, you can pickle the tasks with one script and tune them with another script.
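A rough sketch of the thread-based workaround (the wrapper function is mine, just to illustrate the idea; extract_from_graph is called the same way as in the tutorial):

import threading
import nnvm
from tvm import autotvm

def extract_tasks_in_thread(sym, target, shape, dtype):
    # Run task extraction in a fresh thread so the TVM functions it triggers
    # stay out of the main thread's environment and multiprocessing keeps
    # working there (the workaround described above).
    result = []

    def _worker():
        tasks = autotvm.task.extract_from_graph(
            sym, target=target, shape=shape, dtype=dtype,
            symbols=(nnvm.sym.conv2d,))
        result.extend(tasks)

    t = threading.Thread(target=_worker)
    t.start()
    t.join()
    return result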
For the problem in this thread, I agree with you. Passing params can be a quick fix, but there must be something wrong elsewhere.
Ideally, both cases should work, but something is wrong in either the model converters or the nnvm compiler. I do not plan to look into it.
So for now you can pass params for your models. As I mentioned before, you also have to use another thread (or another script) to do the task extraction to avoid the multiprocessing issue.