[RFC][BYOC] Vitis-AI integration

Motivation

Vitis-AI is Xilinx’s development stack for AI inference on Xilinx’s FPGA hardware platforms, for both edge and data center applications. This RFC looks at how subgraphs can be accelerated on FPGAs in TVM using the BYOC flow. The idea is that by offloading subgraphs from a Relay graph to an FPGA supported by Vitis-AI, we can achieve faster inference times.

Proposal

We have been working on integrating Vitis-AI using the BYOC infrastructure and have verified the ability to use BYOC in combination with Vitis-AI. Our current implementation uses PyXIR, an open source IR translator from Xilinx, as the intermediate representation for subgraphs. The following is an overview of the flow from compilation to runtime that we have built:

Compilation to runtime

  • Start with a frontend graph
  • Convert it to a Relay graph
  • Annotate the graph for the given Vitis-AI DPU (Deep Learning Processing Unit) target using PyXIR
  • Run the MergeCompilerRegions & PartitionGraph passes
  • Use the codegen stage to convert the Relay subexpressions into a PyXIR XGraph
  • Serialize the PyXIR model path and generate lib.so (a sketch of this flow follows below)
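
As a rough illustration, the steps above map onto TVM’s Python API roughly as follows. This is a minimal sketch using the generic BYOC passes; in our implementation the annotation step is driven by PyXIR’s knowledge of the operations the chosen DPU supports, so the "vitis_ai" compiler name here is illustrative.

```python
import tvm
from tvm import relay
from tvm.relay import transform

# `mod`/`params` come from a frontend, e.g.
# mod, params = relay.frontend.from_mxnet(block, shape_dict)

# Annotate supported operators, merge the annotated regions and
# partition them into external functions for the Vitis-AI codegen.
# (In our flow the annotation is PyXIR-driven; "vitis_ai" is an
# illustrative compiler name.)
mod = transform.AnnotateTarget("vitis_ai")(mod)
mod = transform.MergeCompilerRegions()(mod)
mod = transform.PartitionGraph()(mod)

# Build: partitioned functions go through the Vitis-AI codegen, the
# rest through the regular TVM flow. Exporting serializes everything,
# including the PyXIR model path, into lib.so.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)
lib.export_library("lib.so")
```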

Running the generated module

  • Load lib.so, deserialize the model path and load the model into a PyXIR XGraph
  • Create a PyXIR runtime module from the XGraph representation
  • The PyXIR runtime module is exposed to the graph runtime through the packed GetFunction
  • Run PyXIR as required by the graph runtime, supplying input and output buffers (see the sketch below)
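
As a usage sketch (the input name "data" and the shapes are placeholders), running the generated module looks like any other TVM graph runtime session:

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Loading lib.so also deserializes the stored PyXIR model path and
# restores the XGraph behind the Vitis-AI runtime module.
lib = tvm.runtime.load_module("lib.so")
module = graph_runtime.GraphModule(lib["default"](tvm.cpu()))

# The graph runtime supplies the input/output buffers; PyXIR executes
# the offloaded subgraph when its node is reached.
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
out = module.get_output(0).asnumpy()
```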

Building TVM with Vitis-AI Support

Currently, we support the BYOC flow by building TVM within a Docker image based on the public Vitis-AI Docker image. This provides the required Xilinx tools for FPGA compilation, quantization and runtimes. The following CMake option has been added to enable TVM to compile the additional files that define the Vitis-AI BYOC code.

  • USE_VITIS_AI - Enabling this flag (i.e. set(USE_VITIS_AI ON) in config.cmake) adds support for the Vitis-AI compile and runtime modules.

Vitis-AI DPU Graph Representation

The Vitis-AI DPU has a graph translator called PyXIR. We convert each partitioned subgraph function into a PyXIR XGraph and create a runtime module during the Relay build phase. PyXIR is also used in other framework integrations by Xilinx, including the Vitis-AI integration available in ONNXRuntime.

CodeGen and Compilation

The process of offloading partitioned subgraphs to the Vitis-AI DPU starts at compilation. The aim here is to convert the Relay subgraph into PyXIR so that the Vitis-AI runtime can analyze the graph for supported/unsupported nodes. The output of the codegen stage is a VitisAIModule, which contains the PyXIR representation of the subgraph and its runtime module. A schematic sketch of the codegen hook follows below.
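
For readers unfamiliar with the BYOC hook: during relay.build, TVM invokes a packed function named relay.ext.&lt;compiler&gt; for every partitioned function carrying the matching Compiler attribute. A sketch of what our codegen entry point looks like conceptually, with hypothetical helper names standing in for the actual implementation in the PR:

```python
import tvm

@tvm.register_func("relay.ext.vitis_ai")
def vitis_ai_codegen(func):
    # `func` is a partitioned Relay function annotated with
    # Compiler="vitis_ai". Translate it to a PyXIR XGraph and wrap
    # the result in a VitisAIModule. Both helpers below are
    # hypothetical stand-ins, not the real API.
    xgraph = relay_to_xgraph(func)
    return make_vitis_ai_module(xgraph)
```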

Runtime Support

At runtime we use the PyXIR runtime module to execute the compiled Vitis-AI model on the FPGA. To accelerate inference of neural network models with Vitis-AI DPU accelerators, the models need to be quantized to integers. In this RFC we make use of on-the-fly calibration, which removes the additional preprocessing step and avoids an explicit quantization call on Relay. With this method one doesn’t need to quantize the model upfront and can use the typical inference execution calls (module.run) to calibrate the model on the fly using the first N inputs that are provided, as sketched below.
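
A minimal sketch of what on-the-fly calibration looks like from the user’s perspective (N, the input name "data" and the random data are placeholders; `module` is the graph runtime module from the sketch above):

```python
import numpy as np

N = 8  # number of calibration inputs; a placeholder value
calibration_inputs = [
    np.random.rand(1, 3, 224, 224).astype("float32") for _ in range(N)
]

# The first N inference calls double as calibration runs: PyXIR
# collects quantization statistics and compiles the model for the DPU.
for data in calibration_inputs:
    module.set_input("data", data)
    module.run()

# Subsequent calls execute the quantized model on the accelerator.
module.set_input("data", np.random.rand(1, 3, 224, 224).astype("float32"))
module.run()
out = module.get_output(0).asnumpy()
```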

Testing

We currently have two different types of tests; both reside in the python/contrib/test_vitis_ai directory.

  • test_vitis_ai_codegen.py annotates a Relay graph with the given Vitis-AI DPU annotations, and the resulting graph is compared against a reference Relay graph

  • test_vitis_ai_runtime.py includes a complete network test with a resnet18 model partly offloaded to PyXIR for DPU acceleration. However, the offloaded subgraph is just executed on the CPU, so this isn’t a full end-to-end test.

Both test_vitis_ai_codegen.py and test_vitis_ai_runtime.py depend on PyXIR but do not require an FPGA to be available. They create and verify the complete BYOC process and the interface with PyXIR. For full end-to-end tests we would have to be in the Vitis-AI Docker environment with access to an FPGA.

Future improvements

This RFC is only intended to be an initial description. Below is a list of items we hope to add or improve upon in the near future.

  • Support for more than one subgraph for a given Vitis-AI DPU
  • Support for future Vitis-AI DPUs

Any comments are most welcome.


Thank you @mak for the RFC. Seeing a hardware vendor such as Xilinx add their FPGA-centric code-gen to TVM is great news.

Questions:

  1. Is the plan to add the tests in test_vitis_ai to TVM CI to ensure that Vitis BYOC remains stable? If so, will that require running the tests in a new Docker image, or will you extend an existing Dockerfile with the necessary dependencies to run the tests that depend on PyXIR?
  2. It would be great to have the test_vitis_ai_runtime.py as a TVM tutorial example that one could run on an FPGA by switching a flag in the tutorial (when building the Sphinx gallery, it will follow a default CPU-based emulation behavior).

Overall it’s great to see a new hardware backend codegen added to TVM, thanks for the great work!

@zhiics @comaniac @mbaret @mbrookhart @tqchen please let @mak know if you have any suggestions on adding Vitis as a new code-gen to TVM via BYOC! thanks

Thanks @mak for the contribution. The integration flow sounds reasonable to me. I am curious about your graph representation, as it looks like you save it in your own format in the module. Will you rely on module load/save to serialize your IR?

For testing, can we have some simulation for the parts on the accelerator? Otherwise, it sounds like the runtime is not really tested, right? In the long run, do you have a plan to add an FPGA instance to host the Docker image and enable e2e tests?

Thanks for the RFC! I have the same questions as @zhiics raised:

For the graph representation, you mentioned the XGraph representation in the proposal. Is the serialized XGraph in JSON format or another self-defined format?

In addition, how to test the runtime would definitely be an issue, especially since Vitis-AI doesn’t support the FPGA on the F1 instance, which is the only FPGA instance type on AWS.

As described in ug1414-vitis-ai.pdf

Vitis AI offers a series of different DPUs for both embedded devices such as Xilinx Zynq®-7000, Zynq® UltraScale+™ MPSoC, and Alveo™ cards such as U50, U200, U250 and U280, enabling unique differentiation and flexibility in terms of throughput, latency, scalability, and power.

Also, I didn’t find any existing documentation on running a Vitis-AI DPU on an F1 instance.

Note that the open-sourced DPU hardware design is currently encrypted, so it will be difficult to evaluate the hardware on targets that are not supported.

Broadly this looks to follow the same strategy as our Ethos-N integration, aside from the quantization approach, which it would be interesting to hear more about. The other thing I wonder is whether there’s a specific framework in mind for your integration (is it ONNX?).

Thanks @thierry for your questions.

  1. Is the plan to add the tests in test_vitis_ai to TVM CI to ensure that Vitis BYOC remains stable? If so, will that require running the tests in a new Docker image, or will you extend an existing Dockerfile with the necessary dependencies to run the tests that depend on PyXIR?
  • Yes, we are adding test cases and a Dockerfile. For running the end-to-end flow we would need an FPGA and the corresponding setup.
  2. It would be great to have test_vitis_ai_runtime.py as a TVM tutorial example that one could run on an FPGA by switching a flag in the tutorial (when building the Sphinx gallery, it will follow a default CPU-based emulation behavior).
  • At the moment, in test_vitis_ai_runtime.py we are planning to do CPU-based emulation with PyXIR. The flow is the same when the user switches to an FPGA.

Yes, we save the XGraph in our own format and serialize it in the Module by keeping track of the path to the files.

For testing, the PR will include pure CPU tests verifying the interface with PyXIR. The verification of the PyXIR flow down to the runtime will happen through internal tests for the moment. These internal tests use the TVM flow, so they are e2e.


Yes, the XGraph is serialized in JSON format.

Regarding the question of how to test the runtime: as mentioned in my answer to @zhiics above, we will rely on internal e2e tests for the moment to verify the PyXIR-to-runtime flow. By adding codegen and (CPU) runtime tests to the PR, we intend to verify the TVM-to-PyXIR interface. We think that these two parts together should keep the TVM Vitis-AI codegen stable.


I am not sure if TVM can be run on AWS F1 with Vitis AI (see Vitis-AI Integration — tvm 0.8.dev0 documentation). Indeed, TVM and Xilinx mention the Alveo U200 and U250, but nothing about the Xilinx UltraScale+ VU9P. This is very confusing.