Phasing out Legacy Components

Here are some considerations we can weigh together regarding the frontends.

Frontends of interest evolve over time. For example, the latest PyTorch frontend is migrating to the FX graph (torch.fx and Inductor), and the respective TVM frontend needs to be updated accordingly. For new frontend needs, bringing them to Relax would give us a clear focus here; it also unblocks dynamic shape support, so we can concentrate our effort in one place. That is why this conversation is important: it lets us enable that focus.

We could possibly keep certain importer modules and data structures a bit longer if there is community volunteer effort to maintain them. We need to address the testing issue by moving from execution tests to structural tests, and by moving execution tests to nightly, where the model gets imported and then translated to Relax for structural testing. We encourage such efforts to start working on frontend translation directly into Relax when possible.
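As a rough illustration of what a structural test could look like (a sketch assuming current Relax/TVMScript APIs; in a real frontend test, imported_mod would come from an importer such as the ONNX or PyTorch frontend rather than being reused from the expected module):

import tvm
from tvm.script import relax as R

# Expected Relax module, written directly in TVMScript.
@tvm.script.ir_module
class Expected:
    @R.function
    def main(x: R.Tensor((2, 4), "float32")) -> R.Tensor((2, 4), "float32"):
        gv = R.nn.relu(x)
        return gv

# In a real test, imported_mod would be produced by a frontend importer;
# here we reuse Expected as a stand-in so the sketch runs on its own.
imported_mod = Expected

# Structural testing: compare IR structure instead of executing the model.
tvm.ir.assert_structural_equal(imported_mod, Expected)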

Coming back to the broader context, this is indeed a hard tradeoff we need to make. The real impact falls on the volunteer developers, and we face a real risk of being burdened by lack of maintenance and slow development, and of the project not surviving in a fast, competitive landscape. That is why it is important to have this conversation and move in this direction. It would also enable a clear call to focus on some of the latest frontend needs through Relax development. I would love to see ideas around them and to work together on some of these directions!

To keep things supported, we should cut release branches that can continue to take maintenance patches on the related components. We can also account for them in community development and contributions.

1 Like

Here at Infineon one of the key “deciders” for TVM was the availability of the mature (relay-based) backends for ARM embedded HW (and other COTS targets). Reading between the lines (“release branch, maintenance patch”) it seems that these will effectively be orphaned. No access to new frontends, little or no scope for active enhancement/extension PRs, loss of connection to the TVM community mainstream for anyone still working with them…

Is there any likelihood of these being ported to Relax (ideally by their contributors)?

Without these, TVM would become something of a “non-starter” for our productive use. Dependable and properly maintainable backends for the mainstream ARM compute IP are the “must have”. For our own in-house HW we’d just have to grit our teeth, write off our TVM investment, and suffer through hacking up TFLM/PyTorch edge with in-house performance hacks.

Let us look into some of the frontend needs. One thing we can do is align most of the Relax and Relay ops, so we can try to use GenAI tools to port some of the Relay frontends to Relax.

I’m currently working on refactoring our project along the methodology we discussed in this thread: using the TVM core infrastructure by including TVM headers and linking against TVM as a shared library.

Example of a CMakeLists.txt that works with TVM:

cmake_minimum_required(VERSION 3.21)
project(TileLang C CXX)

set(CMAKE_CXX_STANDARD 17)

# Define TVM root directory
set(TVM_ROOT ${PROJECT_SOURCE_DIR}/3rdparty/tvm)

# Include directories: project headers plus TVM and its bundled dependencies
include_directories(
    ${PROJECT_SOURCE_DIR}/include
    ${TVM_ROOT}/include
    ${TVM_ROOT}/3rdparty/dlpack/include
    ${TVM_ROOT}/3rdparty/dmlc-core/include
)

# Source files for the project
file(GLOB_RECURSE TileLang_SOURCES
    ${PROJECT_SOURCE_DIR}/src/transform/*.cpp
    ${PROJECT_SOURCE_DIR}/src/op/*.cc
    ${PROJECT_SOURCE_DIR}/src/codegen/*.cc
)

# Create shared library and link it against the prebuilt TVM shared library
# (adjust the path to wherever libtvm.so was built)
add_library(TileLang SHARED ${TileLang_SOURCES})
target_link_libraries(TileLang PRIVATE ${TVM_ROOT}/build/libtvm.so)

I think the key part of this pipeline is ensuring that the TVM-based implementation still allows developers to write their own passes (from the C++ side). I’m not sure how we can bind our own C++ transformations and op definitions to Python with the TVM FFI. Do we have any example projects or guidelines for this? I’ll continue exploring to achieve a cleaner design.

I think it is possible; mlc_llm should serve as an example.

Here are some examples of binding a global function: https://github.com/mlc-ai/mlc-llm/blob/main/cpp/serve/radix_tree.cc#L822
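On the Python side, a minimal sketch (assuming the C++ code registers a global function via TVM_REGISTER_GLOBAL, as in the file linked above, and that the project builds a shared library; the library path and the name tilelang.my_pass below are hypothetical placeholders):

import ctypes
import tvm

# Loading with RTLD_GLOBAL runs the library's static registrations,
# making its global functions visible in the TVM registry.
ctypes.CDLL("build/libTileLang.so", ctypes.RTLD_GLOBAL)

# Retrieve the registered function by its global name.
my_pass = tvm.get_global_func("tilelang.my_pass")
print(my_pass)  # a PackedFunc that can be called from Python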

1 Like

Update: the Relax ONNX frontend already supports all operators that Relay supports :slight_smile:

4 Likes

Thanks @Hzfengsy for the great effort. It certainly helps us pave the way forward.

As one of the first steps, we plan to phase out the legacy VTA flow. This particular component has stabilized; it will remain available in past releases but is not actively maintained as of now. We also hope to make it simple to bring back similar examples in the future through an out-of-tree development experience, so we can easily customize new compilation flows and enable new applications like this.

1 Like

As a next step, let us plan to phase out the micro flow, which is mostly based on the legacy stack. This particular component will remain available in 0.18.0 and previous releases but is not actively maintained as of now. We also hope to empower bringing back similar examples in the future through the unity flow if there are community members interested in that direction.

2 Likes

I have to leave a comment to express my feelings about this.

I just saw the PR for this, and I have to say, this is a truly sad day for me. The thing that first brought me to TVM was microTVM and the ability to target embedded devices with such a reduced runtime. I have been using it a lot during the last few years and will, of course, continue working with it.

My feeling is that, without it, TVM is not going to be used anymore in papers targeting custom accelerators, which was a very interesting niche that was previously mostly filled by TVM. Some of the features that could be used with it, like USMP or the AoT Executor, were truly amazing, and it is sad that I will not be able to take advantage of them through microTVM in the future. The phase-out of the VTA flow takes TVM in the same direction.

I hope we can pick the development of microTVM back up in the future, maybe by building some bridge to/from Relax.

2 Likes

Thanks @fPecc, we would love to see a Relax-based approach for targeting accelerators in the future; hopefully the modularized flow makes it even easier to do so, both in-tree and out-of-tree. There is indeed a tradeoff here; however, at this point I also think focusing on the modern approach is critical for us to regain momentum and be sustainable for future development. In the meantime, I would love to provide more input to support discussions on how Relax can help in some of these directions.

@fPecc, folks,

I add here my humble experience with this topic, purely as a personal point of view.

I used TVM in the past for custom micro stuff (including experiments with custom FPGA flows) and never relied on the current micro part. I believe one can achieve these goals given the modularity of TVM: it is very easy to insert your own passes or hook into any part of TVM's internal flow without even touching upstream code (forking), or to declare a highly custom target with a weird runtime. For micro stuff I always ended up using the native C codegen backend and passing the results over to my own needs; this way it is possible to target even super-micro things like 8-bit microcontrollers.

As another concrete example of custom HW acceleration, I always enjoyed that one can insert verilated (from pure Verilog land) blobs of blocks/micro-kernels and tensorize any ML operator with them without even touching upstream code, just by declaring a tensorizer in MetaSchedule for the tuning process. This is probably one of the neatest user-side features of MetaSchedule (auto-tensorization, with its very intuitive template declaration that auto-magically fits itself into operators).

As for the VTA part (again a personal opinion), I saw it as a super inflexible and rigid thing; the mentioned [verilog-hw-blocks]->[autotensorizer]->[metaschedule] approach yielded far more flexibility and performance for me, and the generated C code also handled both the HW-acceleration parts directly on any custom soft-core CPU (having HW acceleration as pure ISA extensions).

I also think that the micro flow dragged in a lot (way too much) of non-ML-compiler things: specific micro-runtime-related headers and libraries that are quite diverse and numerous.

TVM really pioneered, and keeps pioneering, lots of things, starting early with elegant IRs (where was MLIR at that time?) up to the very neat end-to-end flow of autotuning/metascheduling. I hope TVM continues to keep the focus and raise the bar on these very things.

2 Likes

Thanks @cbalint13 for sharing your experience. Such a modular experience is indeed something we hope to enable in the new Relax flow. I would love to continue working together and to leverage the Relax pipeline to further modularize and enable more use cases like the ones you mentioned; perhaps they can also serve as good community tutorials for the general flow :slight_smile:

Thanks @cbalint13 for this insight! Indeed, I have been interested in doing something like what you are describing for a long time. Do you have some paper or more information on what you have been working on? I would love to know more about it.

@fPecc, Cc @tqchen

Do you have some paper or more information on what you have been working on? I would love to know more about it.

I don’t think it would be worth a paper, but a small and clear tutorial might do it.

I am thinking of publishing a small tutorial on this, within TVM, with the main goal of highlighting the MetaSchedule autotensorization feature and how to use it to further tune kernels and nets in custom ways (e.g. it could showcase simple declarative older SSE2/SSE3 constructs as a sample). The highlights can be:

  • how to declare a TIR search template for the autotensorizer
  • how to declare the template’s call/implementation to tie it to the fast ISA/intrinsics
  • how to tune neural-net operators (an imported graph) with MetaSchedule’s autotensorizer enabled
  • how to inspect the IR within this MetaSchedule tuning process (in a human-readable form)
  • how to check/select/filter the autotensorized variants (regardless of performance) of the tuned net

The autotensorizer can be used to insert more complex one-shot HW-supported things too, not only classical fast ISA instructions.

As a consolation for VTA and micro being gone, the tutorial’s last part/goal can include a small showcase of how to construct a small custom “vector instruction/block” (e.g. an instantaneous HW dot product) as a hypothetical ISA extension (e.g. a futuristic RISC-V extension/block), and how to declare the TIR search template for it together with its real or virtual implementation (in our case, to run on a local PC for simulation, a C equivalent or a verilated call/implementation function).
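As a rough sketch of how such a declaration might look (assuming recent TVMScript/MetaSchedule APIs; the extern my_hw_dot4 is a hypothetical stand-in for the real, C-equivalent or verilated implementation):

import tvm
from tvm.script import tir as T
from tvm.tir import TensorIntrin

# "Search template": the computation pattern the autotensorizer matches.
@T.prim_func
def dot4_desc(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (4,), "int8", offset_factor=1)
    B = T.match_buffer(b, (4,), "int8", offset_factor=1)
    C = T.match_buffer(c, (1,), "int32", offset_factor=1)
    with T.block("root"):
        T.reads(C[0:1], A[0:4], B[0:4])
        T.writes(C[0:1])
        for k in T.serial(0, 4):
            with T.block("update"):
                vk = T.axis.reduce(4, k)
                C[0] = C[0] + T.cast(A[vk], "int32") * T.cast(B[vk], "int32")

# Implementation: tie the matched pattern to the fast ISA / HW block.
@T.prim_func
def dot4_impl(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (4,), "int8", offset_factor=1)
    B = T.match_buffer(b, (4,), "int8", offset_factor=1)
    C = T.match_buffer(c, (1,), "int32", offset_factor=1)
    with T.block("root"):
        T.reads(C[0:1], A[0:4], B[0:4])
        T.writes(C[0:1])
        # my_hw_dot4 is a hypothetical extern symbol (a C equivalent or a
        # verilated function) linked into the runtime for simulation.
        T.evaluate(
            T.call_extern(
                "int32", "my_hw_dot4",
                A.access_ptr("r"), B.access_ptr("r"), C.access_ptr("rw"),
            )
        )

# Register so MetaSchedule's auto-tensorization rules can pick it up by name.
TensorIntrin.register("my_hw_dot4_intrin", dot4_desc, dot4_impl)

The registered name can then be listed among the tensor intrinsics considered during tuning, and the resulting schedules can be filtered for the autotensorized variants as mentioned above.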

If you think this is a good idea and don’t mind, I will Cc you on the draft of the PR.

My apologies if I derailed the subject a bit, but I tried to offer an alternative for the missing VTA/micro stuff here.

4 Likes

Happy new year! We just landed the v0.19.0 branch, thanks to the community. This year is indeed more exciting and rapidly evolving than ever. Given the current landscape and the state of the project, I think it is the right time to phase out the legacy Relay flows.

To continue supporting community members who depend on the legacy flows, the v0.19.0 branch will continue to contain these components.

This would allow us to focus a lot more on the new architecture and build up momentum, as @Hzfengsy mentioned:

    1. Cleanup the codebase: By removing outdated or redundant elements, we can significantly reduce complexity and improve maintainability.
    2. Unify our focus: Concentrating our efforts on the new unity flow will allow for more efficient development and innovation.
3 Likes

Some suggestions for phasing out Python dependencies:

  • remove the dependency attrs, as it is only used in 3rdparty/tvm/python/tvm/relay/transform/memory_plan.py to wrap a class Region, yet it pulls in an extra Python dependency. Instead, since Python 3.7 the built-in dataclasses module provides equivalent functionality (see the sketch after this list)
  • remove the dependency decorator: maybe we can replace it with functools.wraps or vendor decorator.py directly, as SciPy has done: decorator/src/decorator.py at master · micheles/decorator. Since decorator consists of a single Python file, maintaining it locally may be a viable option.
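As a rough illustration of the first suggestion (the field names below are illustrative placeholders, not the exact fields of memory_plan.py's Region):

from dataclasses import dataclass, field
from typing import Any, Dict

@dataclass
class Region:
    var: Any                  # the relay Var representing the storage region (placeholder)
    size: Any                 # expression for the region size in bytes (placeholder)
    alignment: int = 64       # byte alignment of the region (placeholder default)
    offsets: Dict[Any, Any] = field(default_factory=dict)  # tensor -> offset map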

These sound good. @LeiWang1999 do you mind sending PRs for that?

1 Like

Another discussion point is the LLVM dependency. I think we currently enable LLVM by default because we typically generate LLVM host functions for different devices (such as CUDA). But generating C host code also seems to be a good option. Relying on LLVM introduces many system dependency issues, making it difficult for users to build a project from scratch (for example, LLVM depends on some system libraries like libxml2, which users must install from source or via apt).
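For instance (just a sketch of the idea, not a tested recipe), the host side of a device target can already be specified explicitly, so one could ask for the C source host backend instead of LLVM:

import tvm

# Current common setup: LLVM host codegen for the CPU side of a CUDA module.
cuda_with_llvm_host = tvm.target.Target("cuda", host="llvm")

# Option being discussed: C source host codegen, avoiding the LLVM dependency.
cuda_with_c_host = tvm.target.Target("cuda", host="c")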

I have thought a bit about the LLVM dependency. While it is possible to some extent to get rid of it (we even had a StackVM version for the host earlier that was not very commonly used), I think the benefit of having the LLVM dependency outweighs its negatives. Conda usually provides a good LLVM dependency installation; perhaps we can have clear guides in the docs on how to do so.