[DISCUSS] TVM Unity Transition Docs Refactor

tqchen · January 30, 2024, 12:00am

We are in a new age of genAI, with new workloads arriving and with learnings from our many years of building machine learning compilation pipelines. As a community, our core strategy is also evolving from a build-centric view towards a Unity core abstraction-centric strategy. As the unity branch gets merged and we start to push towards the new strategy, we also need to keep the documents up to date and aligned with this core strategy. Motivated by these needs, this post proposes a documentation refactor.

First, we would like to have a few focused documents that help users understand the core abstraction-centric view and what they can do to build and customize ML compilation pipelines for their own model and hardware environment needs. Additionally, we would like to have a small collection of focused how-to guides that connect closely to that view. Finally, we need to provide mechanisms for existing and future sub-components to organize their respective documentation under specific topic guides.

Theme and Rationale

We would like to start by touching on the motivation and key goals of the project. As machine learning becomes increasingly important in the age of genAI, we would like to collectively bring optimized AI applications everywhere, for everyone. ML compilers provide tools that help ML scientists and engineers with the following goals:

G0: Optimize performance of ML workloads.
G1: Deploy ML models to a diverse set of hardware and runtime environments.
G2: Integrate the compiled modules into current ecosystems, as optimized operators, as accelerated modules of subgraph computations, and as backends for key genAI APIs.

Importantly, there is also a strong need for G3: continuously improve and customize the compilation approaches as we focus on specific model verticals, particular hardware backends, or distributed environments. This last goal means we need to empower the community to build their own TVM-based compilation pipelines by bringing in new compilation passes and integrating with performant libraries and novel code generation approaches.

These key goals serve as the anchor for conveying the overall abstraction-centric approach through a quick start guide.

Quick Start Strawman

What is ML compilation
What can ML compilation do (list the G0-G2)
Overall flow of ML compilation pipeline
- Import/construct a model
- Perform composable optimization transformations
  - Leverage “pipelines” to transform the program
  - The pipeline encapsulates a collection of transformations in two categories:
    - High-level optimizations such as operator fusion, layout rewrites
    - Tensor program optimization: map the operators to low-level kernels via libraries or code generation
- Build and deployment
  - Each function becomes a runnable function in the runtime
  - Integrates with runtime environments (C++ and Python)
Customizing the ML compilation pipeline
- State G3
- Everything centered around IRModule transformations
- Composable with existing pipelines
- Example: dispatch to library:
  - We would like to quickly try out a variant of library optimization for certain platforms
  - Write a dispatch pass
  - How to compose it with existing pipelines
- Example: fusion
- Standard pipelines
  - We provide pipelines as standard libraries
  - Allow quick specialization for specific domains, e.g. LLM
Integration with ecosystem
- Bring compiled function as Torch module
What to read next
- Read some of the topic guide sections
- Read the how-to section for more specific guides on extending the system in different directions.
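The flow above can be illustrated with a deliberately tiny, self-contained model. This is not TVM's API: the dict-based "module", the op names, and the helpers `fuse_elementwise`, `dispatch_sum_to_library`, `sequential`, and `build` are hypothetical stand-ins. The sketch only shows the shape of the idea: every optimization is a function from module to module, a pipeline is a composition of such functions, a custom dispatch pass can be prepended to a default pipeline, and build turns each function into something runnable.

```python
"""Toy model of a composable module-to-module pipeline (hypothetical, not TVM's API)."""
from typing import Callable, Dict, List

# Hypothetical toy IR: a "module" maps function names to a list of op names.
ToyModule = Dict[str, List[str]]
Pass = Callable[[ToyModule], ToyModule]
Kernel = Callable[[List[float]], List[float]]

# Elementwise ops, applied to each element of a list of floats.
ELEMENTWISE: Dict[str, Callable[[float], float]] = {
    "double": lambda x: 2 * x,
    "add_one": lambda x: x + 1,
    "relu": lambda x: max(x, 0.0),
}


def fuse_elementwise(mod: ToyModule) -> ToyModule:
    """High-level optimization: merge each run of elementwise ops into one fused op."""
    out: ToyModule = {}
    for name, ops in mod.items():
        fused: List[str] = []
        for op in ops:
            if op in ELEMENTWISE and fused and fused[-1].startswith("fused:"):
                fused[-1] += "+" + op          # extend the current fused group
            elif op in ELEMENTWISE:
                fused.append("fused:" + op)    # start a new fused group
            else:
                fused.append(op)               # non-elementwise ops break fusion
        out[name] = fused
    return out


def dispatch_sum_to_library(mod: ToyModule) -> ToyModule:
    """Custom pass: route the 'sum' reduction to a pretend library kernel."""
    return {name: ["lib.sum" if op == "sum" else op for op in ops]
            for name, ops in mod.items()}


def sequential(*passes: Pass) -> Pass:
    """Compose passes left to right; the composition is itself a pass."""
    def run(mod: ToyModule) -> ToyModule:
        for p in passes:
            mod = p(mod)
        return mod
    return run


def build(mod: ToyModule) -> Dict[str, Kernel]:
    """Turn every function in the module into a runnable Python callable."""
    def make(ops: List[str]) -> Kernel:
        def run(xs: List[float]) -> List[float]:
            for op in ops:
                if op.startswith("fused:"):
                    fns = [ELEMENTWISE[k] for k in op[len("fused:"):].split("+")]
                    new = []
                    for x in xs:               # one pass over the data for the whole group
                        for f in fns:
                            x = f(x)
                        new.append(x)
                    xs = new
                elif op in ELEMENTWISE:
                    xs = [ELEMENTWISE[op](x) for x in xs]
                elif op in ("sum", "lib.sum"):
                    xs = [float(sum(xs))]
                else:
                    raise ValueError(f"unsupported op: {op}")
            return xs
        return run
    return {name: make(ops) for name, ops in mod.items()}


default_pipeline = sequential(fuse_elementwise)
# A user customization: dispatch first, then reuse the default pipeline unchanged.
custom_pipeline = sequential(dispatch_sum_to_library, default_pipeline)

mod = {"main": ["double", "add_one", "relu", "sum"]}
optimized = custom_pipeline(mod)
print(optimized)                                    # {'main': ['fused:double+add_one+relu', 'lib.sum']}
print(build(optimized)["main"]([1.0, -3.0, 2.0]))   # [8.0]
```

Because the custom pass only touches what it recognizes, the default pipeline does not need to know it exists; that is the composability property the quick start is meant to convey.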

Common Pattern for How-To Guides

As we start to build new how-to guides, we explicitly encourage the following pattern to emphasize the connection of each guide to the overall approach. Taking TensorIR schedules as an example, we would use the following outline.

S0: Introduce elements of TensorIR
S1: Show examples of transformations and how they impact performance
S2: How does TensorIR fit into the overall flow?
- Give examples of a TensorIR match-and-dispatch pass
- Show how to compose it with a default build
- Discuss scenarios where it can be used: e.g. provide customized optimized kernel for a specific backend.

Specifically, every guide should include S2: how does it fit into the overall flow? This brings things back into the bigger picture and lets users see how each guide can be composed with the others, which keeps each guide focused and connected to the central theme.
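To make S2 concrete without tying it to a particular TVM version, here is a plain-Python sketch of the "match and dispatch" idea. Everything in it is hypothetical: `TensorFunc` stands in for a tensor program in the module, `vendor_gemm_f16` is an illustrative kernel name rather than a real library symbol, and `default_build` stands in for the default pipeline. The pass rewrites only the functions whose pattern (op, dtype, shape divisibility) matches and leaves everything else to the default flow.

```python
"""Sketch of an S2-style 'match and dispatch' pass (hypothetical names, not TVM's API)."""
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TensorFunc:
    """Hypothetical summary of one tensor program in a module."""
    name: str
    op: str                    # what it computes, e.g. "matmul"
    dtype: str                 # element type, e.g. "float16"
    shape: Tuple[int, ...]     # e.g. (M, N, K) for a matmul
    impl: str = "codegen"      # default: let the standard pipeline generate code


@dataclass(frozen=True)
class LibraryKernel:
    symbol: str                               # illustrative symbol, not a real library
    matches: Callable[[TensorFunc], bool]     # pattern: when is this kernel applicable?


KERNELS: List[LibraryKernel] = [
    LibraryKernel(
        "vendor_gemm_f16",
        lambda f: f.op == "matmul" and f.dtype == "float16"
        and all(d % 8 == 0 for d in f.shape),
    ),
]


def match_and_dispatch(funcs: List[TensorFunc]) -> List[TensorFunc]:
    """Rewrite functions that match a kernel pattern; leave the rest untouched."""
    out = []
    for f in funcs:
        kernel: Optional[LibraryKernel] = next((k for k in KERNELS if k.matches(f)), None)
        out.append(replace(f, impl=kernel.symbol) if kernel else f)
    return out


def default_build(funcs: List[TensorFunc]) -> Dict[str, str]:
    """Stand-in for the default pipeline: code-generate whatever was not dispatched."""
    return {f.name: f.impl if f.impl != "codegen" else f"generated::{f.name}" for f in funcs}


funcs = [
    TensorFunc("mm0", "matmul", "float16", (64, 64, 64)),
    TensorFunc("mm1", "matmul", "float16", (63, 64, 64)),   # odd shape: pattern fails
    TensorFunc("add0", "add", "float16", (64,)),
]
print(default_build(match_and_dispatch(funcs)))
# {'mm0': 'vendor_gemm_f16', 'mm1': 'generated::mm1', 'add0': 'generated::add0'}
```

The fallback for `mm1` is the point of the pattern: a guide can show a customized kernel for one backend while the default build still covers every case the pattern does not.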

Overall Structure

The previous sections discuss the key goals and how they shape the get-started and how-to guides. This section outlines the overall structure of the documentation. The quick start and the how-to guides (each with a clear S2 section) aim to provide a cohesive set of documents that present the project through the core abstraction-centric approach in genAI. The TOPIC GUIDES section will contain tutorials and guides about specific modules, organized by topic. We can also use these sections to provide deeper dives into the corresponding modules, and to preserve docs about existing modules that have their own use cases.

GET STARTED
- Install
- Quick start
HOW TO
- Apply Default Tensor Program Optimization
- Dispatch to Library
- Optimize GEMM on CPU
- Support Customized Operator
  - Use call_tir when possible
  - Example: Rotary embedding operator
  - Use call_dps_packed
  - If necessary, add a new op to the system
- Bring your own customized codegen
- Construct model with nn.Module API
- Import from PyTorch
- Import from ONNX
- Optimize Object Detection
- Remote Profile using TVM RPC
- Contribute to this document
- more…
TOPIC GUIDES
- Relax
- TensorIR
- Runtime and Library Integration
- MetaSchedule
- nn.Module
- micro
- Relay
- AutoTVM
- more…
REFERENCE
- APIs

One motivation is to avoid deep nesting: have fewer key docs that explain concepts and connect them to the overall approach, while leaving enough room for independent content on each specific topic.
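The "Support Customized Operator" items in the outline (call_tir, call_dps_packed) rely on destination-passing style (DPS): the caller allocates the output buffers and passes them in, and the callee only writes into them. A plain-Python sketch of that convention, using the rotary embedding example from the outline, might look like this. The helper `call_dps` is a hypothetical stand-in for what those operators do, and the kernel assumes the interleaved-pair rotary formulation with base 10000; real models and TVM's own implementations may use a different layout.

```python
"""Destination-passing style (DPS) sketch in plain Python (not TVM code)."""
import math
from typing import Callable, List, Sequence


def rotary_embedding_dps(x: Sequence[float], position: int, out: List[float],
                         base: float = 10000.0) -> None:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle position * base**(-2i/d).

    DPS: nothing is returned; the result is written into the caller-provided `out`.
    """
    d = len(x)
    if d % 2 != 0 or len(out) != d:
        raise ValueError("expected an even-length input and an output of the same length")
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out[2 * i] = x0 * c - x1 * s
        out[2 * i + 1] = x0 * s + x1 * c


def call_dps(func: Callable[..., None], out_len: int, *args) -> List[float]:
    """Hypothetical helper mirroring the call_tir idea: allocate, call, return the output."""
    out = [0.0] * out_len
    func(*args, out)
    return out


print(call_dps(rotary_embedding_dps, 4, [1.0, 2.0, 3.0, 4.0], 0))  # [1.0, 2.0, 3.0, 4.0]
```

Keeping allocation on the caller side is what lets a compiler plan and reuse memory for the outputs, which is one reason the outline suggests call_tir first when possible.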

Discussions

The main goal of this post is to kickstart the effort. Hopefully we can arrive at a good structure that enables us to continue innovating through the core strategy as new needs emerge. We would also love to see more tutorials, guides, and discussions about how we can bring fun use cases and optimizations; see some of the examples below.

gfvvz · January 30, 2024, 3:12am

I hope for more guides on how to add one's own LLVM codegen to TVM, with a detailed introduction. I also hope we can have a more flexible way to integrate a custom LLVM backend codegen into TVM, since most self-designed cores use LLVM as a base. For example, if someone wants to generate device-only LLVM assembly code without host-side code generation, it seems a lot of code needs to change for now.

Hzfengsy · January 30, 2024, 4:42am

Thanks @gfvvz for the suggestion, could you please show actionable items about this? Then I’d like to work on it. To be specific:

Which PUBLIC target and device would be good to demonstrate?
Which workloads would be great? GEMM?
How do we run the tutorials in CI? Using a simulator?

bitfield · January 30, 2024, 12:56pm

As someone that is fairly new, I think a general overview of the different representations and tools would be great. I am thinking about something like a compatibility matrix, maybe, to show which different tools go together. E.g., for microTVM you should use Relay, as Relax is currently not supported for this (at least this is my understanding). Just to get people like me started up faster.

Similarly, documentation about “non-default” interfaces would be nice to have. Specifically, I am thinking about things like relay.backend.use_meta_schedule in the lowering config. From my understanding, it integrates TIR-based scheduling with the Relay operator strategy. I figured this out by looking at the test cases and then trying different things until it worked, but I don’t think this behavior is documented anywhere. I don’t know whether similar cases exist elsewhere, but it would be good to document the interfaces between the different tools and representations.

gfvvz · January 30, 2024, 1:13pm

Thanks @Hzfengsy, very glad to know you would work on it, nice! I think we can take the NVPTX backend as a reference: generate only the kernel code, as a simple .ptx file like a hand-written one. We may not need an end-to-end solution for evaluation; we just want to use TVM’s operator-level DSL to write kernels efficiently.

ysh329 · January 30, 2024, 2:23pm

The purpose of our documentation is to help beginners get started as quickly as possible and meet their basic needs, and to help users who already understand basic usage learn the API parameters and use them proficiently and efficiently. The former is reflected in the HOW TO / TUTORIAL documents for users, the latter in HOW TO / TUTORIAL for developers.

In either case, I think looking up APIs during practice is necessary. I would like to share my thoughts on API lookup.

If users want to understand the arguments of a certain API, they usually query the Reference document: for example Python API — tvm 0.16.dev0 documentation , or Index — tvm 0.16.dev0 documentation.

This is a very ordinary thing. If the documentation of an API could be linked to that API’s unit tests, I think users would have a better experience, for two reasons:

The tests of an API show its usage most comprehensively;
Many people learn through tests.

Therefore, I think it’s necessary to create a mapping between unit tests and APIs when organizing the API reference, because the file name of the tests for a certain interface sometimes does not correspond to the name of that API, which makes the tests hard to find.
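One low-tech way to bootstrap such a mapping is static analysis of the test files. The sketch below uses only the standard-library `ast` module and is an assumption about how a tool could work, not an existing TVM script: for each top-level `test_*` function it records the calls made, spelled as written in the source (so `relax.build` and `tvm.relax.build` are separate keys and import aliases are not resolved). The sample test source is made up for illustration.

```python
"""Build a reverse index from called API names to the tests that call them."""
import ast
from collections import defaultdict
from typing import Dict, List, Optional


def dotted_name(node: ast.AST) -> Optional[str]:
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def index_tests(sources: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each call target, as spelled in the source, to 'file::test' entries."""
    index: Dict[str, List[str]] = defaultdict(list)
    for filename, src in sources.items():
        tree = ast.parse(src, filename=filename)
        for node in tree.body:  # top-level test functions only (test classes are skipped)
            if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
                called = {dotted_name(c.func) for c in ast.walk(node)
                          if isinstance(c, ast.Call)}
                for api in sorted(name for name in called if name):
                    index[api].append(f"{filename}::{node.name}")
    return dict(index)


SAMPLE = '''
from tvm import relax

def test_build_and_run():
    mod = make_module()
    relax.build(mod, target="llvm")

def helper():
    relax.build(None)
'''
for api, tests in sorted(index_tests({"test_vm.py": SAMPLE}).items()):
    print(api, "->", tests)
```

A docs build could then render the index as a "Used in tests" list under each API entry; resolving aliases would need an extra pass over the import statements.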

sanirudh · January 30, 2024, 4:23pm

I’ve had this thought before and I like this idea a lot. I don’t really know if we can use any tool to generate and link APIs to their corresponding unit tests (I’ll try to find a way to do this if folks are interested).

I’m used to searching the unit tests to figure out how to use any new API, but that might not be obvious to new users, so linking to that directly might be really useful.

sanirudh · January 30, 2024, 4:32pm

One other small suggestion that I feel could make a nicer experience for new users: write all our tutorials as Jupyter notebooks and provide a way to download each entire tutorial as a notebook file that can be tried out locally.

I’ve seen this style a while back when I used fast.ai docs. For example, if we view this tutorial for vision, then a corresponding ipynb file is available (They haven’t linked to it in the docs website, but that should be easy to do).

tqchen · January 30, 2024, 4:42pm

Great points. As a middle ground, having an Examples section with code examples usually helps, without having to link to all the tests.

slyubomirsky · January 31, 2024, 3:10am

Having executable tutorials that give an error when they break will be good for ensuring docs remain up-to-date, so notebooks would be especially useful if we can check them during CI (for example).

tqchen · February 1, 2024, 11:15pm

In the past we used sphinx-gallery to generate the notebooks. One trade-off is that many of the notebooks are integrations that take a long time to run. Ideally we should move some of them to a separate pipeline that triggers independently, or run them nightly, with errors covered by debuggable unit test cases.
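For reference, sphinx-gallery is configured through `sphinx_gallery_conf` in the Sphinx `conf.py`, and it emits a downloadable `.py` and `.ipynb` for each example. As we understand the sphinx-gallery documentation, `filename_pattern` is a regex selecting which example scripts are executed during the docs build (the others are rendered without running), which is one knob for keeping long-running integration tutorials out of a per-PR build. The directory names and the `/fast_` naming convention below are hypothetical.

```python
# Sketch of a Sphinx conf.py fragment; paths and naming convention are hypothetical.
import re

extensions = ["sphinx_gallery.gen_gallery"]

sphinx_gallery_conf = {
    "examples_dirs": ["../gallery/how_to"],   # .py tutorial sources
    "gallery_dirs": ["how_to"],               # generated pages and downloadable notebooks
    # Only tutorials whose path matches this regex are executed in this build.
    "filename_pattern": r"/fast_",
}

# Quick local check of which sources the pattern would select.
for path in ["../gallery/how_to/fast_quick_start.py", "../gallery/how_to/import_onnx.py"]:
    print(path, bool(re.search(sphinx_gallery_conf["filename_pattern"], path)))
```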

slyubomirsky · February 2, 2024, 12:16am

Nightly or less-frequent runs make sense for docs, since nothing will be “on fire” if a tutorial breaks (though they should still be fixed).
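An independently triggered tutorial check (quick tutorials per PR, long-running ones nightly) could be as simple as executing each tutorial script in a fresh interpreter and reporting failures. Below is a minimal standard-library sketch under two assumptions: tutorials are standalone `.py` scripts (as sphinx-gallery sources are), and a hypothetical `# docs: nightly` marker comment tags the slow ones. It is not an existing TVM CI script.

```python
"""Minimal tutorial runner: per-PR runs skip nightly-only scripts (marker is hypothetical)."""
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable, List

NIGHTLY_MARKER = "# docs: nightly"  # hypothetical convention, not an existing TVM marker


def select(paths: Iterable[Path], nightly: bool) -> List[Path]:
    """Nightly runs take everything; per-PR runs drop scripts carrying the marker."""
    return [p for p in paths if nightly or NIGHTLY_MARKER not in p.read_text()]


def run_tutorials(paths: Iterable[Path], timeout_s: float = 600.0) -> Dict[str, str]:
    """Run each script in a fresh interpreter and classify the result."""
    results: Dict[str, str] = {}
    for path in paths:
        try:
            proc = subprocess.run([sys.executable, str(path)], capture_output=True,
                                  text=True, timeout=timeout_s)
            results[path.name] = "ok" if proc.returncode == 0 else "failed"
        except subprocess.TimeoutExpired:
            results[path.name] = "timeout"
    return results


def main(argv: List[str]) -> int:
    """Usage: main([tutorial_dir]) or main([tutorial_dir, "--nightly"]); returns an exit code."""
    tutorials = sorted(Path(argv[0]).glob("*.py"))
    report = run_tutorials(select(tutorials, nightly="--nightly" in argv[1:]))
    for name, status in report.items():
        print(f"{status:8} {name}")
    return 0 if all(s == "ok" for s in report.values()) else 1
```

Wired into CI as `sys.exit(main(sys.argv[1:]))`, a non-zero exit marks the job as failed, so a broken tutorial surfaces without blocking unrelated PRs when the nightly set runs on its own schedule.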

LeiWang1999 · February 2, 2024, 3:11am

Just a thought: do we have an opportunity to fine-tune a large language model into a TVM copilot bot that helps answer questions about docs and development?

tqchen · September 6, 2024, 12:59pm

Thanks to @Hzfengsy, we finished one iteration of the docs transition!