@giuseros @tqchen
cc @stoa @mjs @ramana-arm @tgall_foo @gromero @aca88 @MJKlaiber
This is definitely a tricky topic, because the firmware-facing API implies some part of the implementation. And the implementation is necessarily going to be different between micro-land and traditional OS-land due to a fundamental difference in the pattern by which accelerators are programmed:
- On traditional OS, accelerators are lazily programmed at the time GetFunction is called. This allows for a Python interface that is both interactive and quite flexible.
- In micro land, accelerators are programmed at some time before calling run(), and full control of that time must be given to the application/SDK.
While it may seem a bit premature to jump all the way to the accelerator use case here, I do so only because the closure architecture implied by GetFunction is particularly useful on traditional OS for accelerator Module implementations. GetFunction has effectively become the "load" function for GPU programming on traditional OS, in part because performing complex processes such as JIT compilation while instantiating a model executor is a common pattern.
By contrast, on a microcontroller, GetFunction is problematic from a memory perspective and in its role as the function typically used to program accelerators. PackedFunc in micro-land are just C functions that run on target_host, even if they do launch compute on an accelerator. If we were to consider the analogous use case in the C++ runtime, GetFunction itself does nothing here: LibraryModule merely implements GetFunction as dlsym. So, in considering the API for a setting where no JIT programming is done and all functions are implemented on the target_host CPU, it's not clear that the indirection provided by the runtime.Module interface is a good fit.
The question is then what is the right interface. Here are some thoughts on properties of the “right” interface:
- Approachable by C firmware engineers. At the end of the day, the interface needs to be usable. It should be clear what each function call implies, and each function call should imply the “expected” thing to a firmware engineer.
- Designed to the memory constraints of embedded systems. All non-stack-allocated memory should be passed-in rather than dynamically allocated. The application should have full control of non-stack-allocated memory. The API should not imply excessive use of the stack.
- Compatible with the standard TVM runtime API, where the design allows. While there are differences, e.g. the one I outlined above, we should strive in particular to maintain an RPC-compatible API layer. Doing so enables autotuning and performance measurement without the need to write custom firmware. There is evidence of such a system in a couple of other embedded inference APIs, and given that autotuning can result in e.g. a 2x speedup over a random schedule, we can't ignore the need to support it.
The last point makes it difficult to do an entirely clean-slate design for microTVM. I don't think option W0 from TQ's post can be implemented with the properties above, so I'll propose a couple of options here and identify how they fall within TQ's classifications:
- W1a or W2. Implement two entirely disjoint APIs, one for standalone production inference and one for RPC-based inference.
- W1c. Build a single API with two parts:
  - a subset meant for standalone inference, implemented with plain C APIs
  - a superset meant for RPC-driven inference, implementing the Module API
This is like W1b in that the C APIs implemented in the standalone subset will match those from the Module-based interface, but we invert the wrapping scheme (e.g. define an object-oriented C interface, where the objects are wrapped in Module and the functions are wrapped in PackedFunc only when the RPC server is in use). A rough sketch of this inversion follows below.
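To make the inversion concrete, here is a minimal sketch, assuming a hypothetical object-oriented C API. None of these names exist today; TVMMicroExecutor and its functions are placeholders for whatever the standalone API ends up being:

```c
#include <stdint.h>
#include <stddef.h>
#include <dlpack/dlpack.h>  // DLTensor

// Hypothetical standalone-inference API: plain C, object-oriented, with
// application-provided memory. All names here are illustrative only.
typedef struct TVMMicroExecutor TVMMicroExecutor;

int32_t TVMMicroExecutor_Create(void* workspace, size_t workspace_bytes,
                                TVMMicroExecutor** out_exec);
int32_t TVMMicroExecutor_SetInput(TVMMicroExecutor* exec, int index, DLTensor* tensor);
int32_t TVMMicroExecutor_Run(TVMMicroExecutor* exec);

// When the RPC server is compiled in, each of these functions would additionally
// be wrapped in a PackedFunc and exposed through a Module, inverting the C++
// runtime approach where Module/PackedFunc is the primary interface.
```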
Given the maintenance burden involved, I prefer to try to make some form of W1 work. So in the rest of this post, I’ll work through the existing API and identify the parts I think we need to re-examine on microTVM.
Inventorying the C++ module-load process
Towards that last point, let’s examine the various parts of the TVM model inference on traditional OS so we can understand which pieces are RPC-dependent:
1. tvm.runtime.load_module: Copies the model runtime from disk to RAM, and performs a "load" procedure for each module.
   - For target_host code (e.g. code produced by the llvm and c backends), this amounts to dlopen plus instantiating a LibraryModule to wrap it.
   - For other code, it invokes a "loader" function to instantiate a Module from a BLOB.
2. TVMModGetFunction("model_name"): Returns a PackedFunc that creates a GraphExecutor for "model_name".
3. model_name_pf(): i.e. calling the previously-returned function. This instantiates a GraphExecutor for "model_name," implying:
   - Loading of the executor configuration (e.g. graph_json)
   - Allocating memory for input, intermediate, and output tensors
   - Invoking GetFunction() for each implemented operator, which performs accelerator-specific load procedures as discussed above
   - Looking up parameters linked into the shared library
4. GraphExecutor#SetInput: Copies tensor data from a CPU-bound tensor to a tensor possibly located in accelerator memory.
5. GraphExecutor#Run: Launches inference and waits for completion.
6. GraphExecutor#GetOutput: Returns a TVMArray (i.e. a DLTensor) pointing to output activation n, possibly located in accelerator memory.
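For concreteness, here is a minimal sketch of what steps 1-2 look like when driven through the TVM C runtime API on a traditional OS; error handling is omitted and "deploy.so"/"model_name" are placeholders:

```c
#include <tvm/runtime/c_runtime_api.h>

void load_model_on_traditional_os(void) {
  // Step 1: load the compiled model library (dlopen + LibraryModule underneath).
  TVMModuleHandle mod;
  TVMModLoadFromFile("deploy.so", "so", &mod);

  // Step 2: look up the factory PackedFunc for "model_name".
  TVMFunctionHandle model_name_pf;
  TVMModGetFunction(mod, "model_name", /*query_imports=*/0, &model_name_pf);

  // Step 3 would call model_name_pf (passing DLDevice arguments), which
  // instantiates the GraphExecutor: it loads the executor configuration,
  // allocates tensor memory, and invokes GetFunction() for each operator.
  (void)model_name_pf;
}
```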
Let's now see which steps impact usage over RPC, and whether those APIs are friendly to micro constraints (e.g. can be kept in a microTVM standalone inference application) or not. The RPC-dependent pieces are steps 3-6 here (step 2 is handled by the PackedFunc runtime.SystemLib() over RPC).
I think that, from an RPC perspective, steps 4-6 are fairly uncontroversial, because the RPC layer is involved with memory management and outside of that, steps 4-6 are merely function calls. On the memory management point, the RPC layer requires either a way to get a DLTensor handle or that the client allow the RPC server to create one through some form of memory dynamism. The former can be implemented under the memory constraints mentioned before, and the latter can be accommodated by the microTVM RPC server without impacting standalone inference.
So let’s now consider step 3, which actually does have some impact on standalone inference. Piece by piece:
- Loading of the executor configuration: bad for the GraphExecutor (JSON parsing implies dynamic memory allocation). Not an issue with AOT.
- Allocating memory for input, intermediate, and output tensors: the API must be expanded to allow the application to do this. New functionality will need to be introduced to microTVM RPC server to provide for this (likely, the microTVM RPC server needs to accept the same parameters as the Executor API, and forward those along when the API is invoked).
- Invoking GetFunction() for each operator library: requires excessive dynamic memory (the returned closure implies refcounting), and doesn't buy us much because most operators are implemented by jumping the target_host CPU to the implemented PackedFunc. In the current TVM API, this piece allows for accelerator programming, so some replacement provision needs to be made here.
From this, I think we can see that the Executor initialization API needs to be reworked on microTVM. I would broaden this to include runtime initialization, because:
- It’s all too easy to bring in hardware considerations at any point in this process:
- RAM banks may need to be turned on or brought out of retention a) at system startup or b) between inferences.
- Accelerator programming will be part of initialization on some systems.
- Often, to hide e.g. startup latency, applications will want to handle hardware initialization very early in the boot phase, so defining an API that requires e.g. said RAM banks to be available before other initialization starts could preclude some application- or SoC-specific init patterns.
Function Calling Convention
A key barrier to adopting W1b/c is that RPC requires the use of the PackedFunc calling convention, while a firmware-facing C API that uses the standard C calling convention is both more efficient and friendlier to developers. Here are some thoughts towards unifying the two:
- To start with, we have an invariant: we need to be able to call into operator implementations over RPC to implement autotuning and RPC-driven execution. So, when used with the RPC server, there must be at least some PackedFunc wrapper for each operator implementation.
- The primary benefits of PackedFunc in the C++ runtime are:
  - it's compatible with the RPC layer
  - it provides a standard calling convention, allowing the implementation to use any programming language. Since the C++ runtime directly invokes PackedFunc to offload operators to accelerators, the standard calling convention is particularly helpful.
  - functions can be "monkey-patched" at runtime if needed.
In a standalone micro inference, none of these concerns apply. I would say that the PackedFunc calling convention doesn't offer much benefit to implemented operator functions.
- Given this, a natural next question is: is it possible to split PackedFunc into two pieces:
  - an internal piece which uses standard C datatypes and the standard C calling convention
  - a PackedFunc wrapper for said internal piece, which could be included only when compiling with the RPC server
There are some examples of C++ PackedFunc API styles that may be hard to translate. The most impactful example I can think of is the way that DLDevice are unpacked from GraphExecutor() PackedFunc args in a variadic fashion.
Aside from this, it seems fairly straightforward to do, and may improve optimization in the downstream compiler.
It seems, then, that it should be possible to implement some type of "unpacked" calling convention when targeting the C runtime. To do so:
- define a name-mangling scheme to translate PackedFunc names into C function names
- update codegen to produce the inner "unpacked" func
- add a flag to control generation of the PackedFunc wrappers
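As an illustration only (the operator name, mangling scheme, and wrapper below are hypothetical, not the exact generated code), the two pieces for a single operator could look roughly like this:

```c
#include <stdint.h>
#include <tvm/runtime/c_runtime_api.h>  // TVMValue, DLTensor

// Inner "unpacked" function: plain C types and the C calling convention,
// callable directly from firmware with no PackedFunc machinery.
int32_t tvmgen_default_fused_conv2d(float* input, float* weight, float* output);

// PackedFunc wrapper, generated only when the RPC server / Module infrastructure
// is compiled in. It unpacks the TVMValue args and forwards to the inner function.
int32_t tvmgen_default_fused_conv2d_packed(TVMValue* args, int* type_codes,
                                           int num_args, TVMValue* out_ret_value,
                                           int* out_ret_tcode, void* resource_handle) {
  DLTensor* input = (DLTensor*)args[0].v_handle;
  DLTensor* weight = (DLTensor*)args[1].v_handle;
  DLTensor* output = (DLTensor*)args[2].v_handle;
  return tvmgen_default_fused_conv2d((float*)input->data, (float*)weight->data,
                                     (float*)output->data);
}
```

The `_packed` suffix and the `tvmgen_default_` prefix are placeholders; the point is only that the wrapper is a thin, optional shim over an ordinary C function.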
Reworking the Initialization APIs
There are three core areas of concern in reworking the initialization APIs:
- C0. The existing runtime contains some pieces which are undesirable in a standalone inference application:
- PackedFunc lookup tables (bloated, complex; in standalone inference, function call is a solved problem in micro-land)
- Pieces of the runtime intended to support the RPC server (e.g. TVMFuncGetGlobal, TVMAPIGetLastError, RPCTimeEvaluator, etc)
- Some NDArray functions (e.g. NDArray_Load, etc).
- C1. How should we supply backing memory for tensors (input, intermediate, output) to executor instances?
- C2. How, if at all, should the executor be involved with initialization (e.g. either initializing hardware, or providing software hooks, both at runtime startup and just before inference)?
C0 can be addressed by splitting common into two pieces:
- crt_backend_api.c and the things it requires (except TVMBackendGetFuncFromEnv, see below). TVMBackend functions may be called from generated code, so of all the API pieces, this one should absolutely belong with the standalone deployment subset.
- the rest, which can go with the RPC superset
C1: In a W1b unified API world, concern C1 is more closely tied to GraphPlanMemory. However, at present, only GraphExecutor consumes the output of GraphPlanMemory. In a micro world, the application must consume that output. The core thing we need to do to bridge the gap between an internally-consumed format which requires dynamic memory and a micro-friendly API is to make the output of GraphPlanMemory a data structure that makes sense for the application to consume. This would give the application control over the intermediate and output tensors, and require future changes to the memory planner to be cognizant of application requirements via unit tests.
Additionally towards C1, we should implement SetInputZeroCopy from the C++ GraphExecutor, and probably just replace SetInput with it as the standard way to set an input tensor. This gives the application control over the input tensor.
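To make the C1 direction concrete, here is one hypothetical shape the memory-planner output could take (struct and symbol names are purely illustrative):

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical, code-generated description of the memory a model needs. The
// memory planner emits one entry per pool rather than allocating anything itself.
typedef struct {
  size_t size_bytes;       // required pool size
  size_t alignment_bytes;  // required alignment
} TVMModelMemoryPool;

extern const TVMModelMemoryPool tvmgen_default_memory_pools[];
extern const size_t tvmgen_default_num_memory_pools;

// The application then owns the backing memory, e.g. by statically allocating it
// (sizes would come from generated constants in practice) and passing pointers
// to the executor at init time.
static uint8_t g_pool0[16 * 1024] __attribute__((aligned(16)));
```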
C2. This one needs some input from the community. Here are some possible ways I could envision the executor interacting with the SoC during “initialization,” “pre-inference,” and “post-inference:”
- powering or bringing RAM in/out of retention for parameter/input loading.
- providing some signal to any hardware involved before starting a computation and after it's finished.
- providing hardware vendors a designated place to put code that brings accelerators between e.g. reset → active → sleeping → active states.
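As one possible shape for this (the function names are hypothetical, just to make the discussion concrete), the runtime could rely on a small set of application-implemented hooks:

```c
#include <stdint.h>

// Hypothetical hooks implemented by the application/BSP. The executor would call
// them at well-defined points so the SoC can manage power, RAM retention, and
// accelerator state without the runtime hard-coding any of it.
int32_t TVMPlatformRuntimeInit(void);      // once, at (or shortly after) boot
int32_t TVMPlatformBeforeInference(void);  // e.g. bring an accelerator reset -> active
int32_t TVMPlatformAfterInference(void);   // e.g. return the accelerator to sleep
```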
Summary
- I prefer W1c: implementing a small standalone inference-focused API and wrapping that in Module to allow AOT to be driven over RPC when needed.
- As part of this: splitting the existing src/runtime/crt/common into a standalone piece (which includes the TVMBackend APIs plus anything else needed to support this standalone piece) and an rpc piece (which includes the Module infrastructure).
- The initialization APIs need to be reworked to allow for application-defined management of the Tensor memory, and some consideration for e.g. init hooks for deeper hardware integration should be provided.
- Ultimately, this should result in a compact C-style API for standalone inference as proposed both here and in the STM32 port.
Would love to get everyone’s thoughts on this assessment and the suggested path forward! It’s possible this should split into its own RFC, so we can do that if people feel that would be more appropriate.