Darknet YOLOv3 compilation fails for OpenCL Mali in the latest TVM

TVMError: Cannot convert type float32x5 to OpenCL type tvm/src/target/source/codegen_opencl.cc line 162

The compilation works fine with TVM v0.7.0, but fails with TVM from May 7, 2021 (commit 254563a).
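
For reference, compile.py does roughly the following; this is a sketch reconstructed from the log below, so the cfg/weights/library paths, the opt_level, and the variable names are assumptions. Only the relay.build call (compile.py line 62) appears verbatim in the traceback.

import numpy as np
import tvm
from tvm import relay
from tvm.relay.testing.darknet import __darknetffi__

# Paths are assumptions based on the log output below.
cfg_path = "./yolov3-tiny.cfg"
weights_path = "./yolov3-tiny.weights"
lib_path = "./libdarknet.so"  # assumption: darknet shared library built locally

darknet_lib = __darknetffi__.dlopen(lib_path)
net = darknet_lib.load_network(cfg_path.encode("utf-8"), weights_path.encode("utf-8"), 0)

dtype = "float32"
data_shape = (1, net.c, net.h, net.w)  # -> [1, 3, 416, 416]
print("shape", list(data_shape))

print("Converting darknet to relay functions...")
mod, params = relay.frontend.from_darknet(net, dtype=dtype, shape=data_shape)

target = "opencl -device=mali"
target_host = "llvm"  # the real script may add e.g. -mtriple for the board
print("target:", target)
print("target_host:", target_host)

print("Compile using relay...")
with tvm.transform.PassContext(opt_level=3):
    # This is the call at compile.py:62 shown in the traceback below.
    graph, lib, params = relay.build(mod, target=target, target_host=target_host, params=params)

Full log: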

Loading weights from ./yolov3-tiny.weights...Done!
shape [1, 3, 416, 416]
Converting darknet to relay functions...
target: opencl -device=mali
target_host: llvm
Compile using relay...
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 3, 416, 416), 'float32'), ('TENSOR', (16, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 16, 208, 208), 'float32'), ('TENSOR', (32, 16, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 32, 104, 104), 'float32'), ('TENSOR', (64, 32, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 64, 52, 52), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 128, 26, 26), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 256, 13, 13), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 512, 13, 13), 'float32'), ('TENSOR', (1024, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 1024, 13, 13), 'float32'), ('TENSOR', (256, 1024, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 256, 13, 13), 'float32'), ('TENSOR', (128, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 384, 26, 26), 'float32'), ('TENSOR', (256, 384, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 256, 26, 26), 'float32'), ('TENSOR', (255, 256, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=opencl -keys=mali,opencl,gpu -device=mali -max_num_threads=256, workload=('conv2d_nchw_spatial_pack.mali', ('TENSOR', (1, 512, 13, 13), 'float32'), ('TENSOR', (255, 512, 1, 1), 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):
  File "./compile.py", line 62, in <module>
    graph, lib, params = relay.build(mod, target=target, target_host=target_host, params=params)
  File "/root/workspace/tvm/python/tvm/relay/build_module.py", line 325, in build
    executor_config, runtime_mod, params = bld_mod.build(
  File "/root/workspace/tvm/python/tvm/relay/build_module.py", line 147, in build
    self._build(mod, target, target_host, executor)
  File "/root/workspace/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  28: TVMFuncCall
  27: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  26: tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::NDArray, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, tvm::runtime::NDArray> > > const&)
  25: tvm::build(tvm::runtime::Map<tvm::runtime::String, tvm::IRModule, void, void> const&, tvm::Target const&)
  24: tvm::build(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  23: tvm::codegen::Build(tvm::IRModule, tvm::Target)
  22: tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::runtime::Module (*)(tvm::IRModule, tvm::Target)>(tvm::runtime::Module (*)(tvm::IRModule, tvm::Target), std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*) const
  21: tvm::codegen::BuildOpenCL(tvm::IRModule, tvm::Target)
  20: tvm::codegen::CodeGenC::AddFunction(tvm::tir::PrimFunc const&)
  19: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  18: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  17: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  16: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  15: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  14: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  13: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  12: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  11: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  10: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  9: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  8: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::AttrStmtNode const*)
  7: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  6: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::SeqStmtNode const*)
  5: tvm::tir::StmtFunctor<void (tvm::tir::Stmt const&)>::VisitStmt(tvm::tir::Stmt const&)
  4: tvm::codegen::CodeGenC::VisitStmt_(tvm::tir::StoreNode const*)
  3: tvm::codegen::CodeGenC::PrintExpr[abi:cxx11](tvm::PrimExpr const&)
  2: tvm::codegen::CodeGenC::PrintExpr(tvm::PrimExpr const&, std::ostream&)
  1: tvm::codegen::CodeGenOpenCL::VisitExpr_(tvm::tir::BroadcastNode const*, std::ostream&)
  0: tvm::codegen::CodeGenOpenCL::PrintType(tvm::runtime::DataType, std::ostream&)
  File "/root/workspace/tvm/src/target/source/codegen_opencl.cc", line 162
TVMError: Cannot convert type float32x5 to OpenCL type
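
The error comes from CodeGenOpenCL::PrintType being asked to emit a 5-lane vector, presumably because OpenCL C only defines vector widths of 2, 3, 4, 8 and 16, so float32x5 has no OpenCL equivalent. The following is a guess at a minimal standalone reproducer outside of darknet/relay; the split factor of 5 is an assumption chosen purely to force a float32x5 in the generated kernel.

import tvm
from tvm import te

# Vectorizing a 5-element inner loop for the OpenCL target should produce a
# float32x5 broadcast/store and hit the same PrintType error.
n = 10
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")

s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=5)
s[B].bind(xo, te.thread_axis("threadIdx.x"))
s[B].vectorize(xi)  # inner extent 5 -> float32x5 in the generated code

# Expected to raise: TVMError: Cannot convert type float32x5 to OpenCL type
tvm.build(s, [A, B], target="opencl")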