I’m new to TVM, does anyone know how to execute the C code generated by “relay.build”?
relay.build doesn’t generate C code for execution. It generates LLVM IR (in the case of CPU) and compiles it directly to an executable binary.
Thank you for your reply. I’m still confused about whether TVM can compile a TensorFlow model to a C program.
Why do you need the model as a C program? In general it’s not effective for compilers to generate human-readable C code unless that is the only format the corresponding vendor toolchain accepts. For example, TVM does have a C-like codegen, but it generates OpenCL code for non-NVIDIA GPUs.
Because I want to add a new backend in TVM, but there is no LLVM support for our backend. So I plan to compile the model to C code (our toolchain only supports C).
Then you could consider the BYOC flow. You could refer to a recent effort that integrates NVIDIA CUTLASS with the BYOC flow and the C codegen:
Ok, thank you for your reply.
Does that mean this PR will generate the C code for CUTLASS?
The C code generation is also used in microTVM. Our demo application for the microNPU is a good example of this: it uses the host toolchain to compile the C output from TVM, which makes it easier to align the compilation flags, and some features (such as the Ahead-of-Time Executor) only exist for the C output right now.
Also, if you want to add a new backend to TVM, I’d suggest looking at Target Hooks which can use more of the existing TVM code generation infrastructure - there’s an example Target you can use for reference.
Hello, in fact I’m quite confused by the terminology “C code generation”. Do you mean that TVM can generate C code which is then compiled by GCC? Looking forward to your reply.
In the example application I linked, TVM generates C code as the final output format; this includes C sources generated via the BYOC integration for the microNPU. The C code from TVM is extracted from a generated archive (in Model Library Format) and added to the sources list of the application, so you end up with something similar to (codegen is the output from TVM):
gcc -o demo_app \
    src/demo.c \
    codegen/host/src/default_lib0.c \
    codegen/host/src/default_lib1.c
This allows you to specify your own GCC flags easily and integrate the model as part of an existing C application.
What @comaniac is suggesting is to instead use BYOC to generate C code, and then use TVM to generate a binary from it (which invokes GCC behind the scenes, if I remember correctly). A good example of this can be seen in the external code generation tests, which use the export_library method to convert the model into a binary shared object. This lets you use more of the Python interface to TVM and is better suited to richer environments.
Ummm, it seems too opaque for me. I have to admit that I am a beginner not only to TVM but also to programming. I will think about your reply for a while. Thank you for your clarification @Mousius
Sorry for my ignorance; I’m gradually getting to know the BYOC flow, so I want to confirm a couple of things. First: if I use the BYOC flow, will TVM’s graph optimization passes still apply to the Relay function? Second: does BYOC skip TIR and transform the Relay function directly to C code? Looking forward to your reply!
Q1: Yes. This is one important purpose of introducing BYOC. All existing graph optimizations can be directly leveraged for custom codegens.
Q2: It’s up to you. In general, most BYOC developers wish to generate compilable code directly from the graph-level IR, because that makes it easier to integrate their toolchains. On the other hand, if your codegen accepts TIR, you could still lower the partitioned Relay function to TIR and feed it to your codegen.
If you want to offload the entire model and you want to use your own C compiler, the c backend will indeed do what you want. We built a specialized export function for microTVM, tvm.micro.export_model_library_format, which provides more verbose compiler output, but right now it only works for the case where you’re running the model on a single CPU. As @comaniac said, this often doesn’t produce the best performance.
If instead you just want to offload a section of the model to an accelerator (or the whole model to some accelerator, but use libtvm_runtime.so to control it from a CPU), then the approach suggested by @comaniac is probably the way to go.
Thanks for your answer, that makes a lot of sense. After reading the BYOC blog post, I suddenly have a question (I guess it’s the last one): BYOC can be applied to specific operators, but if I convert the frontend model to a Relay function that contains more than one operator or function, can BYOC still take effect?
Please forgive my ignorance. What is the relationship between libtvm_runtime.so and BYOC?
Sorry, I should clarify.
libtvm_runtime.so means the TVM C++ runtime library. Compiled TVM models need to be run using a TVM runtime (there are two: the TVM C++ runtime and the TVM C runtime). The TVM runtime handles details such as calling the compiled operator functions in graph order and memory allocation. The runtime can either be placed on the same device as the operators, or it can be placed on a separate device and drive operator execution on other devices using the Device API. Further, if the device is POSIX-ish and supports dynamic memory, you should choose the C++ runtime; if not, you can try the C runtime (but it does not yet support executing models on other devices).
If you want to place everything on a single device, you might consider using the c backend, and then just compile the TVM C or C++ runtime and the model using your custom compiler. See apps/bundle_deploy for an example using the GraphExecutor. If the TVM runtime needs to live on a separate device from the one driven by your compiler, then consider the BYOC flow Cody is describing above.
Thank you very much for your reply.