UMA qnn.conv2d replacement

I am trying to offload parts of the work to my custom accelerator. For that purpose, I’ve started using the BYOC UMA framework, as it already bundles best practices for this kind of work. I started from the Vanilla template and already implemented a PoC that plugs in a dummy C implementation for nn.conv2d. When I try to replace the qnn.conv2d op, however, I get an error: “RuntimeError: qnn.conv2d is currently only supported with Hexagon. Please run QNN Canonicalize pass to decompose this op into supported ops.”

It seems this check was introduced in https://github.com/apache/tvm/pull/12398. I can probably get around this by implementing a @qnn_conv2d_strategy.register-annotated qnn_conv2d_strategy the same way it was done for Hexagon (or can’t I?). I’m just wondering whether this is the right way to go, since it’s probably not in line with the UMA approach. Shouldn’t I be able to simply match on the qnn.conv2d ops and interject in the code generation in passes.py? Is the Hexagon RuntimeError a bug in the design?

If you are using the BYOC approach, you need to offload qnn.conv2d, or patterns containing qnn.conv2d (e.g., qnn.conv2d + bias_add + requantize), to your codegen. Here is a good “HowTo” doc about BYOC.
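For example, such a qnn.conv2d + bias_add + requantize pattern could look roughly like this, using Relay’s dataflow pattern language (a sketch; the pattern name and the _register_pattern call are adapted from the UMA vanilla template, so adjust arity and registration to your backend):

```python
from tvm.relay.dataflow_pattern import is_op, wildcard

def qnn_conv2d_pattern():
    conv = is_op("qnn.conv2d")(
        wildcard(), wildcard(),  # data, kernel
        wildcard(), wildcard(),  # input / kernel zero points
        wildcard(), wildcard(),  # input / kernel scales
    )
    bias = is_op("nn.bias_add")(conv, wildcard())
    return is_op("qnn.requantize")(
        bias,
        wildcard(), wildcard(),  # input scale / zero point
        wildcard(), wildcard(),  # output scale / zero point
    )

# In your UMABackend subclass (as in the vanilla template's backend.py):
#     self._register_pattern("qnn_conv2d", qnn_conv2d_pattern())
```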

The default codegen knows nothing about QNN ops. That’s why we use the QNN Canonicalization pass to lower QNN ops into a sequence of other primitives. If you hit “RuntimeError: qnn.conv2d is currently only supported with Hexagon. Please run QNN Canonicalize pass to decompose this op into supported ops”, it most likely means you disabled the QNN canonicalization pass and your target is not Hexagon. You are right: with this PR, compilation without this pass was enabled only for Hexagon.

But if you are using the BYOC approach, you do not need to disable QNN canonicalization at all, so it is very strange that you get this error.
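For illustration, the canonicalization can also be invoked explicitly (a minimal sketch; normally these passes run as part of the standard lowering pipeline):

```python
import tvm
from tvm import relay

def canonicalize_qnn(mod):
    # Decompose qnn.* ops into regular Relay primitives
    # (conv2d, add, multiply, ...), which the default codegen understands.
    seq = tvm.transform.Sequential(
        [
            relay.qnn.transform.Legalize(),
            relay.qnn.transform.CanonicalizeOps(),
        ]
    )
    return seq(mod)
```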

I’m not exactly using the fully fledged BYOC approach; I suppose UMA is built on top of it (and abstracts away most of the plumbing). Anyway, I got rid of the Hexagon error by registering a strategy as (semi-)documented here: https://github.com/apache/tvm/blob/main/apps/uma/_template/strategies.py#L19-L33 (see the sketch after the traceback below). I’m using relay.op.strategy.generic.conv2d_strategy as a fallback, which is probably not entirely valid, but it works for now (I haven’t tested correctness). This works with a simple IRModule containing a single qnn.conv2d, but fails whenever I load more complicated models (like MobileNet from TFLite), which certainly contain such ops. I get a nasty C++-from-Python-from-C++-from-Python error:

  5: TVMFuncCall
  4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::tir::__mk_TVM0::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::tir::__mk_TVM0, tvm::runtime::TVMRetValue)
  3: tvm::tir::CreatePrimFunc(tvm::runtime::Array<tvm::te::Tensor, void> const&, std::optional<tvm::runtime::DataType>)
  2: tvm::tir::CreatePrimFuncWithConstants(tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::runtime::Array<tvm::runtime::NDArray, void> const&, std::optional<tvm::runtime::DataType>)
  1: tvm::tir::RewriteStageToBlock(tvm::te::Operation const&, tvm::tir::CreateFuncInfo*, tvm::runtime::Array<tvm::tir::Stmt, void>*, tvm::arith::Analyzer*)
  0: _ZN3tvm7runtime6deta
  File "../src/te/operation/create_primfunc.cc", line 494
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (info->IsArg(tensor)) is false: 
python-BaseException
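For reference, the strategy registration mentioned above looks roughly like this (a sketch adapted from the UMA template’s strategies.py; the strategy name and the _register_operator_strategy call are mine, so the exact API may differ):

```python
from tvm import relay
from tvm.target import override_native_generic_func
from tvm.relay.op.strategy.generic import conv2d_strategy

@override_native_generic_func("custom_qnn_conv2d_strategy")
def custom_qnn_conv2d_strategy(attrs, inputs, out_type, target):
    # Delegate to the generic float conv2d strategy. qnn.conv2d has
    # extra scale / zero-point inputs, so this is likely not valid in
    # general; it merely gets past the Hexagon-only RuntimeError.
    return conv2d_strategy(attrs, inputs, out_type, target)

# Registered in my UMABackend subclass, roughly:
#     self._register_operator_strategy("qnn.conv2d", custom_qnn_conv2d_strategy)
```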

This PR also looked at directly scheduling the qnn.conv2d operator.


I got a little bit further. It seems as if https://github.com/apache/tvm/pull/12447 has something to do with it.

I’m testing on https://github.com/mlcommons/tiny/raw/master/benchmark/training/visual_wake_words/trained_models/vww_96_int8.tflite. If I pass the params of the loaded TFLite model to tvm.relay.backend.contrib.uma.backend.UMABackend.partition, the error above appears. It seems some tensor is ‘forgotten’ in list(te_cached_func.inputs) + list(te_cached_func.outputs). If I do not pass the params, there’s no problem.
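A rough sketch of the setup (the backend comes from the UMA vanilla template; the import path and local model filename are placeholders that depend on your checkout):

```python
import tflite
from tvm import relay

# from apps.uma._template.backend import VanillaAcceleratorBackend  # adjust path

# Local copy of the vww_96_int8.tflite model linked above.
with open("vww_96_int8.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)

mod, params = relay.frontend.from_tflite(tflite_model)

backend = VanillaAcceleratorBackend()
backend.register()

mod = backend.partition(mod, params)  # with params -> TVMError above
# mod = backend.partition(mod)        # without params -> no problem
```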

I might prepare a minimally reproducible test case later.

@shoskens, it would be great if you could share a mini example to reproduce.

@MJKlaiber test_vanilla_lower_qnn_conv2d.py · GitHub Maybe not very minimal haha, but it also includes the template code (I’m not sure how to easily include from apps.uma._template.codegen). If you set a breakpoint at https://github.com/apache/tvm/blob/902c2e2db70b75c36f9bd8c253707b1e8761cc18/python/tvm/relay/backend/contrib/uma/api/lower.py#L68, you’ll notice x contains 2 tensors, while it should contain 3 (as with the old method).

@MJKlaiber, related to this discussion: I saw in your tvmcon presentation that direct support for qnn is on the UMA roadmap. Is there already an ongoing development effort to make this happen? As I am pretty new to TVM, I would not feel comfortable leading such a development, because the overall TVM architecture is still somewhat unclear to me. However, I would be more than happy to be involved and actively contribute to it.

To my understanding, we are still missing generic qnn strategies that provide relay-to-TIR lowering prior to offloading these operations to an external kernel library.

Hi @jwouters, yes, we are still missing these. I propose we start a thread on that and add it to the RFC.

@cgerum @r.stahl @paulpb , any thoughts?
