[RFC] UMA: Universal Modular Accelerator Interface

Here is the sample code:

mod = tvmc.load(r"/shared/model.tflite")

uma_backend = VanillaAcceleratorBackend()
mod = uma_backend.partition(mod)
target = tvm.target.Target("vanilla_accelerator", host=tvm.target.Target("c"))

package = tvmc.compile(model, target=target)
result = tvmc.run(package, device=device)

Got the following error:

Traceback (most recent call last):
  File "/shared/run_custom.py", line 107, in <module>
    main()
  File "/shared/run_custom.py", line 76, in main
    mod = uma_backend.partition(mod)
  File "/usr/uma/python/tvm/relay/backend/contrib/uma/backend.py", line 299, in partition
    return self._relay_to_relay.partition(mod, params)
  File "/usr/uma/python/tvm/relay/backend/contrib/uma/api/partitioner.py", line 96, in partition
    mod = relay.transform.InferType()(mod)
  File "/usr/uma/python/tvm/ir/transform.py", line 161, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 223, in __call__
    values, tcodes, num_args = _make_tvm_args(args, temp_args)
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 188, in _make_tvm_args
    raise TypeError("Don't know how to handle type %s" % type(arg))
TypeError: Don't know how to handle type <class 'tvm.driver.tvmc.model.TVMCModel'>

I modified the code and loaded the TFLite model as done in the TVM from_tflite.py example, then replaced the generation of "mod" in create_conv2d() in the run.py example. Now I'm getting another error; it seems the vanilla accelerator is not recognized by the scheduler:

  1: tvm::relay::OpImplementation::Schedule(tvm::Attrs const&, tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::Target const&)
  0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) [clone .cold]
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/usr/uma/python/tvm/relay/op/strategy/generic.py", line 114, in schedule_reduce
    return topi.generic.schedule_reduce(outs)
  File "/usr/uma/python/tvm/topi/generic/nn.py", line 597, in schedule_reduce
    return _default_schedule(outs, True)
  File "/usr/uma/python/tvm/topi/generic/default.py", line 28, in default_schedule
    raise RuntimeError("schedule not registered for '%s'" % target)
RuntimeError: schedule not registered for 'vanilla_accelerator'

We'll shortly provide an example of importing an NN from an ONNX/TFLite file.

I've just added the things we discussed about UMA to a branch of the UMA RFC:

  • TVMC integration
  • Mock-accelerators for tutorial

Additional items are at an early stage; they are intended to enable early discussion among those interested in contributing to or helping shape UMA.

Feel free to comment here or in the PR:

CC: @areusch @SebastianBoblestETAS @aca88 @manupa-arm @cgerum @paulpb @PhilippvK @r.stahl @UlrikHjort @kslavka


@MJKlaiber @areusch I've run the latest UMA test pipeline on a custom TFLite model and would like to raise an issue. I checked out the latest TVM on the main branch (SHA1 038f15b5e204120709186a8791e5b49986060bb0) and ran tvm/tests/python/contrib/test_uma/test_uma_pipeline.py. UMA successfully generated .c code, but here is the issue: the C code for the convolution implementation is repeated in each convolution function.

e.g. tvmgen_default_vanilla_accelerator_main_0, tvmgen_default_vanilla_accelerator_main_1, … tvmgen_default_vanilla_accelerator_main_k

These functions contain the same convolution implementation code. Based on Michael's RFC, I assumed there would be multiple calls to the Vanilla my_ai_hw_conv2dnchw() function with the relevant kernel and input sizes.

Please let me know what you think: is this simply how TVM is built, or did I make a mistake in my setup? How can UMA generate C code that calls my custom convolution implementation (a function call rather than duplicated C code)?

Thanks, Slava.

Hello, UMA is a great interface for custom accelerator vendors; it simplifies the BYOC process a lot.

I'm building a workflow from a pre-trained model to compiled C source for a backend (an ARM core plus a custom accelerator). Since our accelerator supports only int8/int16 operands, I fed a quantized (int8) ONNX model into the frontend. From the Relay graph, I can see that the pattern of interest would be "qnn.conv2d". The partitioning was successful, but I ran into errors when creating the PrimFunc. I'm not sure if you could provide an example for a quantized operator; as far as I know, many custom accelerators work in the low-precision integer domain, so such an example would definitely make sense.

To register the operator strategy for "qnn.conv2d", I used

    wrap_compute_conv2d(topi.arm_cpu.conv2d_nchw_int8),
    wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nchw_int8),

but I'm not sure if this is the correct way.

I appreciate any hints from you.

Chen

Hello Chen, yes, quantized operators are not directly lowerable to TIR. There are a few possibilities to handle this:

  1. Your approach is somewhat feasible, but it has the problem that you are most likely ignoring the zero point/scale of your computation. If your hardware accelerator supports only a single value for scale and zero_point, it might still be usable as is.
  2. Your approach can be augmented by adding the quantization parameters as attributes to the TE; for this I have to refer you to the Ethos-U backend for examples. I hope I can provide a full-fledged example shortly.
  3. You can run relay.qnn.transform.CanonicalizeOps() as a pre- or post-partitioning pass. In this case you do not need to register a custom operator strategy, but the generated TIR is much more complicated.

I personally would use option 2 at the moment.
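To make the concern in option 1 concrete, here is a small pure-Python sketch of the affine quantization arithmetic that qnn.conv2d assumes (real = scale * (q - zero_point)). All quantization parameters and tensor values below are made-up example numbers; the point is that the int32 accumulation must happen on zero-point-corrected operands and be requantized with the combined scale, so dropping zp/scale changes the result unless zp == 0 and the scales cancel:

```python
# Affine quantization: real = scale * (quant - zero_point).
# Illustrates a 1x3 "dot product" stand-in for a conv2d accumulation.

def quantize(r, scale, zp):
    return round(r / scale) + zp

def dequantize(q, scale, zp):
    return scale * (q - zp)

s_in, zp_in = 0.1, 5      # made-up input quantization params
s_w, zp_w = 0.05, 0       # made-up weight quantization params
s_out, zp_out = 0.2, 3    # made-up output quantization params

x = [1.0, -0.5, 0.7]      # real-valued input
w = [0.2, 0.4, -0.1]      # real-valued weights

qx = [quantize(v, s_in, zp_in) for v in x]
qw = [quantize(v, s_w, zp_w) for v in w]

# int32-style accumulation on zero-point-corrected operands
acc = sum((a - zp_in) * (b - zp_w) for a, b in zip(qx, qw))

# requantize the accumulator into the output quantization domain
q_out = round(acc * (s_in * s_w) / s_out) + zp_out

# result agrees with the float reference within one output quantum
ref = sum(a * b for a, b in zip(x, w))
assert abs(dequantize(q_out, s_out, zp_out) - ref) < s_out
```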