Seeking Community Input: Embedded Device Support in Relax

a_j · October 1, 2025, 7:08pm

Hi TVM community,

We’ve been successfully using Relay-based TVM for ML compilation across our embedded DSP and MCU devices (including Cortex-M) for several years. The Relay flow has been excellent for our use cases, particularly features like the tvmc driver, C backend, unpacked API support, quantized operators, and USMP. We use TVM to compile and deploy a wide range of models, from 3-layer networks on our MCU devices to typical vision models on our DSP devices.

As we look toward migrating to Relax, we’ve identified that several features critical to our embedded workflows aren’t currently available. Features like unpacked API and USMP are essential for generating the low-footprint code our Cortex-M0 devices require.

We’re interested in contributing to enable or port some of these embedded-focused features from Relay to Relax, and would love the community’s perspective on:

Community interest: Would the TVM community welcome contributions focused on embedded device requirements?
Collaboration opportunities: Are there other embedded TVM users who might be interested in collaborating on re-enabling these capabilities in Relax?

We’re excited about the potential to help strengthen TVM’s embedded ecosystem and would appreciate any insights or interest from the community.

Thanks!

tqchen · October 3, 2025, 12:13am

Thanks for the note.

Indeed, Relax is currently focused on non-MCU use cases. That said, if there is interest in embedded, it is still a good use case.

At the same time, we would like to move away from the monolithic design of Relay, where the use cases of multiple backends were intermixed, which ended up making the core compiler less maintainable (something we don't want to end up with again). The core of the main pipeline now focuses on the new FFI mechanism, which should work for most mobile-level systems, though indeed an FFI call for each op may not work at the MCU level. However, diverging too far from the FFI can also cause issues: the standalone unpacked API path can be brittle because, after all, it is not as well specified as the overall FFI ABI.
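To make the per-op cost concrete, here is a minimal C sketch contrasting a packed-style call with an unpacked one. The `packed_arg_t` layout, the tags, and all function names are hypothetical and deliberately simplified; this is not the actual TVM FFI ABI. It only illustrates that with a packed call the caller boxes every argument into a tagged slot and the callee checks the count and tags and unboxes them on every invocation, whereas the unpacked kernel simply takes plain C arguments.

```c
#include <stdint.h>

/* Hypothetical, simplified type-erased argument slot. NOT the real TVM FFI
 * layout; it only mimics passing every argument through a tagged cell. */
typedef enum { ARG_INT = 0, ARG_PTR = 1 } arg_tag_t;

typedef struct {
  arg_tag_t tag;
  union {
    int64_t v_int;
    void *v_ptr;
  } u;
} packed_arg_t;

/* Packed-style signature: one uniform entry point shape for every operator. */
typedef int (*packed_fn_t)(const packed_arg_t *args, int num_args);

/* Unpacked-style kernel: plain C arguments, no boxing, no tag checks. */
static void add_bias_unpacked(int8_t *data, int32_t n, int8_t bias) {
  for (int32_t i = 0; i < n; ++i) {
    data[i] = (int8_t)(data[i] + bias);
  }
}

/* Packed wrapper around the same kernel: every call validates the argument
 * count and tags and unboxes each value before doing any real work. */
static int add_bias_packed(const packed_arg_t *args, int num_args) {
  if (num_args != 3) return -1;
  if (args[0].tag != ARG_PTR || args[1].tag != ARG_INT || args[2].tag != ARG_INT) {
    return -2;
  }
  add_bias_unpacked((int8_t *)args[0].u.v_ptr,
                    (int32_t)args[1].u.v_int,
                    (int8_t)args[2].u.v_int);
  return 0;
}
```

On a Cortex-M0-class device, repeating this boxing and checking for every operator in a small network is the kind of overhead the unpacked API avoids.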

Just a note on a possible technical direction here:

Technically, I think what is likely needed is to aggressively inline things via the partial graph AOT flow noted in the post Unified Python First Compilation Flow through tvm.compile. That means each individual TIR function gets inlined and no longer needs to deal with the unpacked API issue. The interface would still use the packed API, but that is likely a single cost paid once, and with a smart enough compiler and LTO it might be eliminated. Another option is to take the AOT-fused Relax graph into a downstream flow that has a well-defined runtime.
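A rough C sketch of the inlining direction, assuming the compiler emits operator kernels as `static inline` functions in the same translation unit as the fused graph body. All names (`op_add_bias`, `op_relu`, `model_body`, `model_run_packed`, `any_t`) are hypothetical, and the argument layout is not the real TVM FFI ABI. The operators are called directly from the fused body, so the packed-style checks and unboxing happen once at the model boundary rather than once per operator.

```c
#include <stdint.h>

/* Hypothetical "TIR-like" kernels marked static inline so the C compiler
 * (or LTO) is free to fold them into the caller. */
static inline void op_add_bias(int8_t *x, int32_t n, int8_t b) {
  for (int32_t i = 0; i < n; ++i) x[i] = (int8_t)(x[i] + b);
}

static inline void op_relu(int8_t *x, int32_t n) {
  for (int32_t i = 0; i < n; ++i) {
    if (x[i] < 0) x[i] = 0;
  }
}

/* Fused graph body: direct calls only, no per-op argument boxing. */
static inline void model_body(int8_t *x, int32_t n) {
  op_add_bias(x, n, -3);
  op_relu(x, n);
}

/* Hypothetical type-erased argument slot (NOT the real TVM FFI layout). */
enum { TAG_INT = 0, TAG_PTR = 1 };
typedef struct {
  int tag;
  union {
    int64_t v_int;
    void *v_ptr;
  } u;
} any_t;

/* Single packed-style entry point: argument checks and unboxing happen once
 * per model invocation, not once per operator. */
int model_run_packed(const any_t *args, int num_args) {
  if (num_args != 2 || args[0].tag != TAG_PTR || args[1].tag != TAG_INT) {
    return -1;
  }
  model_body((int8_t *)args[0].u.v_ptr, (int32_t)args[1].u.v_int);
  return 0;
}
```

Whether the toolchain removes the remaining boundary cost depends on the compiler, optimization level, and LTO settings; the sketch only shows where that single cost sits.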

The key philosophy of Relax is to enable out-of-tree deployment and customization, so such pipelines can be built on top of it and need not be in-tree initially. Taking what the Relax pipeline outputs and bringing it into such a downstream flow could be a good starting point.