How can we use tvm.relay.transform.ToMixedPrecision?

Hi,

I need to run inference on my model in fp16 mode, but I could not find a tutorial on this. I only found the function tvm.relay.transform.ToMixedPrecision. Is this relevant to fp16 inference, and how can we use it?

This is a very new experimental feature so there’s no tutorial yet. You can refer to the unit tests in this PR for use cases: https://github.com/apache/tvm/pull/8069
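
In short, the basic usage looks roughly like this (a minimal sketch adapted from those tests; as it turns out later in this thread, you currently also need to explicitly import tvm.relay.transform.mixed_precision so the op attributes get registered; mod, params, and target below are placeholders from whatever frontend you use):

import tvm
from tvm import relay
from tvm.relay.transform import mixed_precision  # registers the mixed-precision op attributes

# mod and params can come from any frontend, e.g. relay.frontend.from_onnx(...);
# target is e.g. "cuda" or "llvm".
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)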

cc @AndrewZhaoLuo

Hi,

Thanks for replying! I came here from that PR on GitHub.

My code looks like this:

def compile_model(mod, params, target, logfile, save_path):
    tvm.relay.backend.compile_engine.get().clear()
    mod = tvm.relay.transform.ToMixedPrecision(
            mixed_precision_type='float16')(mod)
    with tvm.autotvm.apply_history_best(logfile):
        with tvm.transform.PassContext(opt_level=3):
            lib = tvm.relay.build(mod, target=target, params=params)
    lib.export_library(save_path)  # Save the compiled model; the path must end in .so, otherwise the C++ runtime cannot load it

And I have the error message:

Traceback (most recent call last):
  File "main.py", line 207, in <module>
    args.save_path)
  File "main.py", line 122, in compile_model
    mixed_precision_type='float16')(mod)
  File "/root/build/tvm/python/tvm/ir/transform.py", line 161, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "/root/build/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  23: TVMFuncCall
  22: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}>(tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  21: tvm::transform::Pass::operator()(tvm::IRModule) const
  20: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  19: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  18: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::TypedPackedFunc<tvm::relay::Function (tvm::relay::Function, tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::relay::transform::ToMixedPrecision(tvm::runtime::DataType, int)::{lambda(tvm::relay::Function, tvm::IRModule, tvm::transform::PassContext)#1}>(tvm::relay::transform::ToMixedPrecision(tvm::runtime::DataType, int)::{lambda(tvm::relay::Function, tvm::IRModule, tvm::transform::PassContext)#1})::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  17: tvm::relay::ToMixedPrecision(tvm::RelayExpr const&, tvm::runtime::DataType const&, int)
  16: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
  15: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
  14: _ZN3tvm5relay16MixedModeMutator17DispatchVisitExprERKNS_9Re
  13: tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  12: tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  11: _ZZN3tvm5relay11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlR
  10: tvm::relay::MixedPrecisionPass::VisitExpr_(tvm::relay::FunctionNode const*)
  9: tvm::relay::ExprMutator::VisitExpr_(tvm::relay::FunctionNode const*)
  8: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
  7: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
  6: _ZN3tvm5relay16MixedModeMutator17DispatchVisitExprERKNS_9Re
  5: tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  4: tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  3: _ZZN3tvm5relay11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlR
  2: tvm::relay::MixedModeMutator::VisitExpr_(tvm::relay::CallNode const*)
  1: tvm::relay::MixedPrecisionPass::Rewrite_(tvm::relay::CallNode const*, tvm::RelayExpr const&)
  0: tvm::Op::GetAttrMapContainer(tvm::runtime::String const&)
  File "/root/build/tvm/src/ir/../node/attr_registry.h", line 146
TVMError: Attribute 'FTVMMixedPrecisionConversionType' is not registered

I built TVM from source; the commit id is 972d7b52d2dbdd7cc1db98c3af04b04c4fc31b75.

How can I make this work, please? @comaniac

As I mentioned, 99% of errors like TVMError: Attribute 'XXX' is not registered are due to a mismatch between the TVM Python code and the C++ shared library. You need to re-run cmake and rebuild TVM to make sure everything is up to date.

I still cannot make it work. This time I used the conda approach to install: first conda build --output-folder=conda/pkg conda/recipe, then conda install tvm -c ./conda/pkg. This takes longer than building with cmake, but I still get the same error message. Could you please tell me whether there might be another reason, and why other frequently-used classes and functions (such as relay.build) work fine?

Sorry, it turns out this is an issue in the implementation. You need to explicitly import the mixed_precision module to register the op attributes:

import tvm
from tvm import relay
from tvm.relay.transform import mixed_precision # Register op attribute
...
mod = tvm.relay.transform.ToMixedPrecision(mixed_precision_type='float16')(mod)

@AndrewZhaoLuo we should import this file in __init__.py so that it can be imported along with import tvm.relay.

@coincheung Yes, sorry about this. You need a bit of a magic line to register the op attributes that are missing in the error you show:

from tvm.relay.transform import mixed_precision

Sorry about that. I will make the changes @comaniac suggests.

Thanks a lot, it works now.

Hi,

I tested fp16 mode on my platform, but the performance was not what I expected.

I am using a T4 GPU. I exported my model from PyTorch to ONNX and used the TVM ONNX frontend to load it. Then I tuned the model on the T4 GPU with the default configuration (XGBoost tuner, 2000 trials).

After tuning, I compiled the model into two libs: one with mod = ToMixedPrecision('float16')(mod) as mentioned above, and the other without that line. The saved fp16 lib is about 27 MB, while the fp32 lib is 182 MB. However, when I ran inference 1000 times with each lib on my GPU, the fp32 lib took 38 s, but the fp16 lib took 110 s.

Could you tell me whether that is normal for TVM? Shouldn't fp16 mode run faster than fp32?

Or do I need to add some options when tuning my model to make fp16 faster than fp32?

Yeah, an increase in performance should be expected, especially on a tensor core GPU. Debugging why this isn't the case is a bit harder. If you send me the model and script you are using, I might have time to take a look.

Here are the models I have benchmarked so far: FP16 Improvements - Google Sheets

Furthermore, you should be running other optimizations like constant folding to remove the many cast operations on weights: TVM-Sandbox/benchmark_fp16.py at 3e949e86a8c2cd2a7af10d412243ef4f47da1109 · AndrewZhaoLuo/TVM-Sandbox · GitHub

I’m guessing you might not be doing that.
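
Roughly, what I mean is something like the following (a simplified sketch of the kind of thing the linked benchmark script does, not the exact code; the helper name convert_to_fp16 is just for illustration):

from tvm import relay
from tvm.relay.transform import mixed_precision  # registers the mixed-precision op attributes

def convert_to_fp16(mod, params):
    # Bind the weights into the module so the casts inserted on them become constant expressions.
    mod["main"] = relay.build_module.bind_params_by_name(mod["main"], params)
    mod = relay.transform.InferType()(mod)
    mod = relay.transform.ToMixedPrecision(mixed_precision_type="float16")(mod)
    # Fold the fp32 -> fp16 casts on the weights away at compile time.
    mod = relay.transform.FoldConstant()(mod)
    return mod

After that you build and tune the converted module as usual.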

Thanks for pointing this out; I will add it and give it a try. Could you please tell me what run_other_opts means?

Hi,

After adding the graph_optimization function, I found that I can make it work for cuda, but not for llvm in fp16 mode.

I load the onnx model, compile it, and run inference with the C++ API. A simplified piece of code is uploaded here: tvm_debug.zip - Google Drive

Just run the script with sh run.sh and you will see the error message. Would you please spend some time helping me figure out the problem? Thanks a lot.

It’s just shorthand for “run other optimizations”. When benchmarking or tuning I always run these. It’s free performance.

I didn't run your script, but I believe the main problem isn't with the mixed precision pass.

Even the FP32 model cannot run. This is the error I get:

E                   at /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/type_traits:3545
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
E                   at /Users/andrewzhaoluo/Desktop/dev_tvm/tvm/src/runtime/c_runtime_api.cc:525
E               rv = local_pyfunc(*pyargs)
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/relay/op/strategy/generic.py", line 243, in _compute_conv2d
E               return [topi_compute(*args)]
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/topi/x86/conv2d.py", line 129, in conv2d_nchw
E               packed_out = conv2d_NCHWc(data, kernel, strides, padding, dilation, layout, layout, out_dtype)
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/topi_integration.py", line 165, in wrapper
E               node = topi_compute(cfg, *args)
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/topi/x86/conv2d.py", line 196, in conv2d_NCHWc
E               cfg.define_split(
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/space.py", line 730, in define_split
E               return self._add_new_transform(SplitSpace, name, axes, policy, **kwargs)
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/space.py", line 829, in _add_new_transform
E               axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/space.py", line 829, in <listcomp>
E               axes = [x if isinstance(x, (VirtualAxis, Axis)) else self.axis(x) for x in axes]
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/space.py", line 687, in axis
E               return VirtualAxis(var)
E             File "/Users/andrewzhaoluo/Desktop/dev_tvm/tvm/python/tvm/autotvm/task/space.py", line 141, in __init__
E               raise RuntimeError("Invalid type of axis: " + str(type(var)))
E           RuntimeError: Invalid type of axis: <class 'tvm.tir.expr.Add'>

../tvm/python/tvm/_ffi/_ctypes/packed_func.py:237: TVMError

I believe this is because the conv2d compute definition for llvm does not support dynamic shaped inputs. You can make a github issue for this.
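
If your model does have dynamic input shapes, one workaround is to pin them to static shapes when importing from ONNX (a rough sketch; "input" and the dimensions below are placeholders for your model's actual input name and shape):

import onnx
from tvm import relay

onnx_model = onnx.load("model.onnx")  # placeholder path
shape_dict = {"input": (1, 3, 512, 512)}  # replace with your model's real input name and a fixed shape
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)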

Hi,

Sorry that I did not describe my problem in detail.

I did not hit problems during auto-tuning (maybe because I used opt_level=3; I am not sure).

I got error messages like this:

./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee
./segment: symbol lookup error: ./saved_model_v1_llvm_fp16.so: undefined symbol: __gnu_f2h_ieee

It happens when I load the llvm_fp16.so file with the C++ API. I did not hit this problem when loading the llvm_fp32.so, cuda_fp16.so, or cuda_fp32.so files. What could be the reason for this?

Try upgrading your version of llvm to the latest? Which version are you using?

It seems like this diff for llvm should have implemented things: https://reviews.llvm.org/rGf1ac850e7721a79238e213f2e6624bf8a2f3a7bd

Unfortunately I don’t have a lot of ideas beyond google searching.

Hi,

I am using LLVM installed from apt on Ubuntu 18.04; the version is LLVM 6.0.

By the way, I am running my program in a docker container: nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04. I think version 6.0 is higher than the minimum required in the docs, which is 4.0.

Would you please tell me which version you were using when you benchmarked the models?

I updated my llvm with the following command:

 bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)"

But the problem still exists.

Did you also rebuild TVM with the new LLVM? It sounds like a linking problem of some sort now. 6.0 should be enough, though, so I am not sure what happened.

Yes, I rebuilt TVM from source and also rebuilt my C++ code, though from the output of ldd I see that my C++ binary does not link against LLVM.

The problem is still there, but the number of error message lines has dropped from 5 to 3. Maybe I am getting closer to the solution :slight_smile:
