Motivation
Vitis-AI is Xilinx’s development stack for AI inference on Xilinx FPGA hardware platforms, for both edge and data center applications. This RFC looks at how to accelerate subgraphs on FPGAs in TVM using the BYOC flow. The idea is that by offloading subgraphs from a Relay graph to an FPGA supported by Vitis-AI, we can achieve faster inference times.
Proposal
We have been working on integrating Vitis-AI using the BYOC infrastructure and have verified the ability to use BYOC in combination with Vitis-AI. Our current implementation uses PyXIR, an open-source IR translator from Xilinx, as the intermediate representation for subgraphs. The following is an overview of the flow from compilation to runtime that we have built:
Compilation to runtime
- Start with a frontend graph
- Convert it to a Relay graph
- Annotate the graph for the given Vitis-AI DPU (Deep Learning Processing Unit) target using PyXIR
- Run the MergeCompilerRegions & PartitionGraph passes (see the sketch after this list)
- Use the codegen stage to convert the Relay subexpression into a PyXIR XGraph
- Serialize the PyXIR model path and generate `lib.so`
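For illustration, a minimal sketch of the annotation and partitioning steps above, assuming the annotation helper is exposed as `tvm.relay.op.contrib.vitis_ai.annotation` (names are illustrative and may change during review; `mod` and `params` come from a regular Relay frontend):

```python
from tvm import relay
from tvm.relay.op.contrib.vitis_ai import annotation  # assumed import path

# Annotate operations supported by the chosen DPU target via PyXIR;
# "DPUCADX8G" is an example cloud DPU target
mod = annotation(mod, params, target="DPUCADX8G")

# Merge adjacent annotated regions and split them into separate
# functions to be handled by the external Vitis-AI codegen
mod = relay.transform.MergeCompilerRegions()(mod)
mod = relay.transform.PartitionGraph()(mod)
```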
Running the generated module
- Load `lib.so`, deserialize the model path and load the model into a PyXIR XGraph
- Create a PyXIR runtime module from the XGraph representation
- The PyXIR runtime module is exposed to the graph runtime through the packed GetFunction.
- Run PyXIR as required by the graph runtime, supplying input and output buffers (a usage sketch follows this list)
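A minimal usage sketch, assuming the standard graph runtime module factory interface; the input name "data" and the input shape are hypothetical and depend on the model:

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Loading lib.so also deserializes the PyXIR model path and rebuilds
# the XGraph-backed Vitis-AI runtime module
lib = tvm.runtime.load_module("lib.so")
module = graph_runtime.GraphModule(lib["default"](tvm.cpu()))

# Hypothetical input; name and shape depend on the model
input_data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
module.set_input("data", input_data)
module.run()
output = module.get_output(0)
```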
Building TVM with Vitis-AI Support
Currently, we support the BYOC flow by building TVM within a Docker image based on the public Vitis-AI Docker image. This provides the required Xilinx tools for FPGA compilation, quantization and runtimes. The following CMake option has been added to enable TVM to compile the additional files that define the Vitis-AI BYOC code.
- `USE_VITIS_AI` - Enabling this flag will add support for the Vitis-AI compile and runtime modules.
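For reference, a minimal sketch of enabling this in TVM’s standard `config.cmake` build configuration:

```cmake
# In build/config.cmake: enable the Vitis-AI codegen and runtime modules
set(USE_VITIS_AI ON)
```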
Vitis-AI DPU Graph Representation
The Vitis-AI DPU has a graph translator called PyXIR. We convert the subgraph function into a PyXIR graph and create a runtime module during the Relay build phase. PyXIR is used in other framework integrations by Xilinx, including the Vitis-AI integration available in ONNX Runtime.
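As an illustration, a minimal sketch of what this conversion could look like, assuming PyXIR’s TVM frontend exposes a `from_relay` helper and a `partition` utility (exact entry points may differ between PyXIR versions):

```python
import pyxir
import pyxir.frontend.tvm

# Translate a Relay module (mod, params from a Relay frontend)
# into a PyXIR XGraph
xgraph = pyxir.frontend.tvm.from_relay(mod, params)

# Let PyXIR mark which operations the chosen DPU target can execute;
# "DPUCADX8G" is an example target
xgraph = pyxir.partition(xgraph, targets=["DPUCADX8G"])
```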
CodeGen and Compilation
The process of offloading partitioned subgraphs to the Vitis-AI DPU starts at compilation. The aim here is to convert the Relay subgraph into PyXIR so that the Vitis-AI runtime can analyze the graph for supported/unsupported nodes. The output of the codegen module is a VitisAIModule, which contains the PyXIR representation and the runtime module of the subgraph.
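As a sketch, building the partitioned module then looks like a regular Relay build (`mod` and `params` as above; the exact PassContext options for selecting the DPU target are still an implementation detail):

```python
import tvm
from tvm import relay

# The partitioned functions are routed to the Vitis-AI codegen during
# relay.build; the resulting VitisAIModule is embedded in the library
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Serialize the whole artifact, including the PyXIR model path, to lib.so
lib.export_library("lib.so")
```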
Runtime Support
At runtime we use the PyXIR runtime module to execute the compiled Vitis-AI model on the FPGA. To accelerate inference of neural network models with Vitis-AI DPU accelerators, models need to be quantized to integer precision.
In this RFC we make use of on-the-fly calibration to remove the additional preprocessing step, so there is no need to explicitly invoke quantization on the Relay graph. Using this method, one does not need to quantize the model upfront and can use the typical inference execution calls (module.run) to calibrate the model on the fly using the first N inputs that are provided (see more info here).
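A sketch of the on-the-fly calibration flow under these assumptions; `calibration_images`, `N` and the input name "data" are placeholders, and real calibration should use representative inputs:

```python
import tvm
from tvm.contrib import graph_runtime

lib = tvm.runtime.load_module("lib.so")
module = graph_runtime.GraphModule(lib["default"](tvm.cpu()))

# The first N inference calls calibrate the quantizer; afterwards
# execution switches to the quantized model on the DPU
N = 128  # placeholder calibration count
for image in calibration_images[:N]:  # placeholder dataset
    module.set_input("data", image)
    module.run()
```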
Testing
We currently have two different types of tests, both of which reside in the `python/contrib/test_vitis_ai` directory.
- `test_vitis_ai_codegen.py` annotates a Relay graph with the given Vitis-AI DPU annotations and compares the resulting graph against a reference Relay graph
- `test_vitis_ai_runtime.py` includes a complete network test with a ResNet-18 model partly offloaded to PyXIR for DPU acceleration. However, the offloaded subgraph is just executed on CPU, so this is not a full end-to-end test.
Both `test_vitis_ai_codegen.py` and `test_vitis_ai_runtime.py` depend on PyXIR but do not require an FPGA to be available. They exercise and verify the complete BYOC flow and the interface with PyXIR. Full end-to-end tests would require the Vitis-AI Docker environment with access to an FPGA.
Future improvements
This RFC is only intended to be an “initial” description. Below is a list of items we hope to add or improve upon in the near future.
- Support for more than one subgraph for a given Vitis-AI DPU
- Support for future Vitis-AI DPUs
Any comments are most welcome.