How to build a model using cBLAS?

I set the target to 'llvm -libs=cblas', but the following warning is raised:

WARNING:autotvm:Cannot find config for target=llvm -libs=cblas, workload=('dense_cblas.x86', ('TENSOR', (1, 128), 'float32'), ('TENSOR', (64, 128), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
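For reference, the build that produces this warning looks roughly like the sketch below. It is a minimal example, not my full script: the shapes match the workload in the warning, the variable names are illustrative, and TVM is assumed to be built with BLAS support (USE_BLAS in config.cmake).

import tvm
from tvm import relay

# Minimal sketch: a single dense layer matching the (1, 128) x (64, 128)
# workload from the warning above.
data = relay.var("data", shape=(1, 128), dtype="float32")
weight = relay.var("weight", shape=(64, 128), dtype="float32")
mod = tvm.IRModule.from_expr(relay.nn.dense(data, weight))

# The cBLAS-enabled target from the post above.
target = "llvm -libs=cblas"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target)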

Do I still need to auto-tune the model if I just use cBLAS?

No, you don't, although this message is admittedly a bit confusing. We register the library implementation as a one-config AutoTVM template so that it can be compared against other implementations. For example, if dense_cblas.x86 takes 1e-3 ms while dense_pack.x86 takes 3e-3 ms, then the op strategy will select dense_cblas.x86 under ApplyHistoryBest, and vice versa.
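Concretely, that comparison happens when compilation runs under ApplyHistoryBest, which you enter by applying a tuning log. A sketch (the name "tuning.log" is a placeholder, and `mod` is the module from the build step above):

import tvm
from tvm import autotvm, relay

# Under apply_history_best, the op strategy can query the measured cost of
# each implementation (e.g. dense_cblas.x86 vs. dense_pack.x86) for the
# workload and pick the cheaper one.
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm -libs=cblas")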

I tuned the dense op and want to compare its performance with cBLAS.

When I compile it with target = 'llvm -libs=cblas', I get this warning, and the performance is far worse than the tuned one (about 10 times slower). Is that normal? I'm not sure I'm getting the right performance for cBLAS by setting the target to 'llvm -libs=cblas'.
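For reference, the timing is done with the graph executor's time_evaluator, roughly as in the sketch below (assuming `lib` comes from a relay.build call like the one above; the input name "data" is illustrative):

import numpy as np
import tvm
from tvm.contrib import graph_executor

# Sketch: run the compiled module repeatedly and report the mean latency.
dev = tvm.cpu()
module = graph_executor.GraphModule(lib["default"](dev))
module.set_input("data", np.random.rand(1, 128).astype("float32"))
ftimer = module.module.time_evaluator("run", dev, number=100)
print("mean: %.3f ms" % (np.mean(ftimer().results) * 1000))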

Do you have a benchmark comparing TVM auto-tuning and cBLAS?

I don't have a benchmark, but cBLAS should not perform 10x worse than TVM on dense ops. You may need to provide more information, such as your tuning/build script and a snippet of the tuning log, so that people can help investigate.
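For reference, a minimal dense-tuning script usually looks like the sketch below (names such as "tuning.log" are placeholders, and `mod`/`params` stand in for your model and its weights):

from tvm import autotvm

# Sketch: extract the AutoTVM tasks from the model and tune each one,
# logging results to a file that can later be fed to apply_history_best.
tasks = autotvm.task.extract_from_program(mod["main"], target="llvm", params=params)
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=3),
)
for task in tasks:
    tuner = autotvm.tuner.XGBTuner(task)
    tuner.tune(
        n_trial=min(1000, len(task.config_space)),
        measure_option=measure_option,
        callbacks=[autotvm.callback.log_to_file("tuning.log")],
    )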

# Excerpt from TVM's Relay compile engine: how an op implementation is chosen.
# (get_valid_implementations is defined in the same TVM module.)
from tvm import autotvm

def select_implementation(op, attrs, inputs, out_type, target, use_autotvm=True):
    all_impls = get_valid_implementations(op, attrs, inputs, out_type, target)

    # First find the implementation with the highest priority level (plevel);
    # this is the choice whenever AutoTVM records are unavailable.
    best_plevel_impl = None
    for impl in all_impls:
        if best_plevel_impl is None or impl.plevel > best_plevel_impl.plevel:
            best_plevel_impl = impl
    if not use_autotvm:
        outs = best_plevel_impl.compute(attrs, inputs, out_type)
        return best_plevel_impl, outs

    # Otherwise, query the dispatch context for a tuned config of each
    # implementation and keep the one with the lowest measured cost.
    outputs = {}
    best_autotvm_impl = None
    best_cfg = None
    dispatch_ctx = autotvm.task.DispatchContext.current
    for impl in all_impls:
        outs = impl.compute(attrs, inputs, out_type)
        outputs[impl] = outs
        workload = autotvm.task.get_workload(outs)
        if workload is None:
            continue
        cfg = dispatch_ctx.query(target, workload)
        if cfg.is_fallback:
            # It's a fallback config
            continue
        if best_cfg is None or best_cfg.cost > cfg.cost:
            best_autotvm_impl = impl
            best_cfg = cfg
    # A tuned implementation wins over the plevel choice whenever one exists.
    if best_autotvm_impl:
        return best_autotvm_impl, outputs[best_autotvm_impl]
    return best_plevel_impl, outputs[best_plevel_impl]

Hello, I found the code above. It seems that if ~/.tvm/tophub/llvm_0.04.log is not empty, TVM will select the AutoTVM config when one exists, even if I set the target to 'llvm -libs=cblas'. So if I want to benchmark cBLAS, I need to clear the tuning log first. Is that correct?
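To illustrate the behavior in select_implementation above (a sketch; "tuning.log" is a placeholder and `mod` is the module from earlier): with no tuned records in scope, dense has only fallback configs, so the strategy falls back to plevel selection, where dense_cblas.x86 wins for a -libs=cblas target. With a tuning log applied, non-fallback configs are found in the dispatch context and the implementation with the lowest recorded cost is selected instead.

import tvm
from tvm import autotvm, relay

# No tuning log applied: dense has no tuned config, so the strategy picks
# the highest-plevel implementation, i.e. dense_cblas.x86.
with tvm.transform.PassContext(opt_level=3):
    cblas_lib = relay.build(mod, target="llvm -libs=cblas")

# Tuning log applied: select_implementation finds non-fallback configs for
# dense and picks whichever implementation has the lowest recorded cost.
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        tuned_lib = relay.build(mod, target="llvm -libs=cblas")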

AFAIK, TopHub records should not include dense but only conv2d, so it should still use cBLAS for dense ops in your model.
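One quick way to verify this is to list the task names recorded in the TopHub log (a sketch; adjust the path to the file quoted above):

import os
from tvm import autotvm

# Sketch: collect the distinct task names recorded in the TopHub log.
log_path = os.path.expanduser("~/.tvm/tophub/llvm_0.04.log")
task_names = {inp.task.name for inp, _ in autotvm.record.load_from_file(log_path)}
print(task_names)  # expect conv2d templates, no dense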

OK, I got it. Thank you.