CSourceMetaData Module: A CSourceModule to Hold the Function Registry for uTVM

There has been a need for a common c-source module to hold the metadata for uTVM, which mostly deals with “c” or “llvm” modules. One of the main needs is a model-wide function registry for the C runtime, in the absence of dlsym in bare-metal environments (discussed here: https://github.com/apache/tvm/pull/6950 ). There is a “metadata” module in the codebase, but it currently gets compiled and re-created in the runtime via packing and unpacking of imports.

Moreover, when using the BYOC compilation flow, all the external modules are unconditionally wrapped in a metadata module. This approach separates metadata (currently only constants are separated out) from code to ease the compilation of external modules. However, since the metadata module is not a DSOExportable module, it uses the SaveToBinary() interface to pack itself as an import and uses an init() process to construct itself back on the stack/heap at runtime. Thus, in the uTVM world this process ends up creating the constants (params or otherwise) in volatile memory, which may not be practical for memory-constrained devices.

While that works reasonably well for devices that are not memory constrained, we think it would be beneficial to have another layer of metadata module, in the form of a CSourceModule, for bare-metal environments or for compilation flows that prefer not to unpack metadata/constants into the stack/heap.

Thus, this RFC provides a c-source layer in the multi-module runtime-module hierarchy to present the metadata that needs to be compiled at global scope. This could be useful to the TVM stack in general, though the initial use is mostly for bare-metal uTVM compilation to hold the function registry.

We would very much like to hear your thoughts on the proposal.

Function Registry

The main requirement for the c-source metadata module comes from the need for a function registry in bare-metal environments. This was one of the features introduced in this PR: https://github.com/apache/tvm/pull/6145. However, the function registry needs to include not just the functions present in the TIR-generated runtime module, but the functions of all the modules generated from the relay IRModule, including external modules. Thus, this RFC proposes that every runtime module implement a PackedFunc, “get_func_names”, that returns the names of the functions it contains. With this, we would no longer need to create the function registry as part of the TIR-based codegen of “c” and “llvm” modules. Moreover, this enables the function names of all external modules to be included in the function registry that gets generated as c-source in the CSourceMetaData module.
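To make the intended compile-time flow concrete, here is a rough Python sketch (not the code in the PR; the helper name collect_func_names is made up) of how the names could be gathered from every runtime module that exposes the proposed PackedFunc:

    # Rough sketch only: gather function names from every runtime.Module that
    # implements the proposed "get_func_names" PackedFunc. Modules that do not
    # implement it simply contribute nothing to the registry.
    def collect_func_names(modules):
        names = []
        for mod in modules:
            try:
                get_func_names = mod.get_function("get_func_names")
            except AttributeError:
                continue  # module opted out of the registry
            names.extend(str(n) for n in get_func_names())
        return names

The collected names would then be emitted as the static function registry in the generated c-source.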

cc: @areusch @zhiics @mbaret @ramana-arm @tqchen

PR : https://github.com/apache/tvm/pull/7002


Requiring get_func_names for every module subclass might bring additional implementation overhead. It might also create limitations in linking behavior (e.g. the dlopen approach does not have a good way to list all the functions, as they are queried by name from the caller).

That being said, I think it makes sense to optionally embed such metadata names for certain modules.

@tqchen our intention here was for the calling of get_func_names to be a compile-time activity only, similar to “get_const_vars” being only a compile-time activity. In export_library it would be called on all the modules to collect the function names they implement, if the target uses the c runtime with system-lib enabled. Therefore, I do not follow how this affects dlopen.

Moreover, it is also optional in the sense that this will only happen if the runtime module implements the packed func “get_func_names”; otherwise the function registry will not get populated with the names of such runtime modules. (Though I would imagine that if a runtime module needs to work with the c runtime in a bare-metal environment, its names should go into the function registry.)

I agree it is good to make it a compile-time activity.

In that case, perhaps we should not make it part of runtime.Module, but instead get the information from the IRModule. This is how the current c runtime generates the function registry table.

It is understandable that there could be some gap in the custom code generators (due to the fact that they return runtime.Module). One idea is to make sure we already have a list of global symbols (in the IRModule) before sending it to codegen, so we know what functions to expect.
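As an illustration of that alternative (a sketch only, assuming the common convention that the exported symbol matches the global var name after lowering), the expected names could be read straight off the IRModule before it is handed to codegen:

    # Sketch: enumerate the global symbols of an IRModule before codegen, so the
    # expected function names are known regardless of what a custom code
    # generator returns as a runtime.Module.
    def global_symbols(ir_mod):
        return [gvar.name_hint for gvar in ir_mod.get_global_vars()]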

I agree broadly.

The only reason we’ve implemented get_func_names() in the runtime.Module is to keep uniformity between internal (not sure if that’s the right thing to call them :slight_smile:) and external modules.

This is also based on the precedent of “get_const_vars” also living inside the runtime.Module, and by the same argument I feel that could also have been inside the IRModule.

So I think we would need to end up in a place where the LoweredOutput has both IRModules and their respective runtime.Modules, probably as a pair. I guess that would be a bigger refactor. Is that your direction of thinking as well?

Agree. The main question, though, is whether we need the codegen to produce a new IRModule, or whether we should lock down the function names (that can be accessed by other modules) before calling into codegen. If we lock down the names beforehand, the codegen does not have to produce the IRModule itself (since the caller will have it).

@manupa-arm this looks like a good idea. a couple questions after reading over this RFC and the PR:

  1. do you intend to also include constants (i.e. linked params) in the .text-only metadata module? the reason I ask is that we discussed this limitation when adding linked params, but decided to defer creating some type of model-level metadata container until we had another example of that. it would be nice to create that container with this RFC, if possible. currently, I think such a container would hold these elements:
    • Array<String> function_names;
    • Map<String,LinkedParam> linked_params
  2. do you also propose to add an LLVM implementation in your PR? I think right now it just generates c, correct?

on the question of locking down the function names prior to calling external codegen: it seems like it depends on whether we generate this module before or after GraphRuntimeCodegen (i.e. model-level codegen). Given we are writing a model-level data structure, i’d propose we do codegen for the CSourceMetadataModule after GraphRuntimeCodegen, which would then allow model-level codegen to contribute extra function names (i.e. those returned from the external compiler) to the FuncRegistry. thoughts? @tqchen

@areusch

do you intend to also include constants (i.e. linked params) in the .text-only metadata module? the reason I ask is that we discussed this limitation when adding linked params, but decided to defer creating some type of model-level metadata container until we had another example of that. it would be nice to create that container with this RFC, if possible. currently, I think such a container would hold these elements:

Not in the initial PR, but it sounds plausible, and a PR to move it here would be more than welcome. However, the linked params need proximity to a specific runtime.Module, as they represent the constants used there. E.g., some runtime.Module may want to place the linked params in a different section or with a particular byte alignment (say, 16-byte aligned). However, the idea is that as long as it can be queried from the runtime module (at compile time), the c-source metadata module is able to extract it.

do you also propose to add an LLVM implementation in your PR? I think right now it just generates c, correct?

I don’t think we need an LLVM implementation for metadata. The reason for having a module LLVM-codegened is to use the LLVM optimization pass pipeline, which should only be needed for operators, IMO. Therefore, we don’t see how that would benefit the compilation of metadata. Thus, a c-source module should suffice for both c and llvm targets, as they would be linked together via export_library.

on the question of locking down the function names prior to calling external codegen: it seems like it depends on whether we generate this module before or after GraphRuntimeCodegen (i.e. model-level codegen). Given we are writing a model-level data structure, i’d propose we do codegen for the CSourceMetadataModule after GraphRuntimeCodegen, which would then allow model-level codegen to contribute extra function names (i.e. those returned from the external compiler) to the FuncRegistry. thoughts?

Yes, that’s how it is implemented in the PR. It collects the function names from the runtime.Modules, which happens after both GraphRuntimeCodegen and external codegen. As I said, that is mainly to maintain uniformity across internal and external modules. However, I think we can refactor the LoweredOutput to hold/return IRModules as well as runtime.Modules. Then we could iterate over the IRModules post-codegen to get the function names. However, currently the external codegens do not have their respective IRModules in either compile_engine or graph_runtime_codegen / vmcodegen.

Therefore, the question is more about how we implement the lockdown of the IRModules. I think they could simply be treated as read-only and preserved in the LoweredOutput.

@manupa-arm thanks for your reply, some further comments inlined.

do you intend to also include constants (i.e. linked params) in the .text-only metadata module? the reason I ask is that we discussed this limitation when adding linked params, but decided to defer creating some type of model-level metadata container until we had another example of that. it would be nice to create that container with this RFC, if possible. currently, I think such a container would hold these elements:

Not in the initial PR, but it sounds plausible, and a PR to move it here would be more than welcome. However, the linked params need proximity to a specific runtime.Module, as they represent the constants used there. E.g., some runtime.Module may want to place the linked params in a different section or with a particular byte alignment (say, 16-byte aligned). However, the idea is that as long as it can be queried from the runtime module (at compile time), the c-source metadata module is able to extract it.

linked params right now are exported from the graph runtime codegen rather than a specific TVM backend, so they don’t actually need to be referenced directly by the generated code. I think if a BYOC codegen wants to embed parameters, we can solve the question of how to represent them in the metadata module (if that’s necessary) for bare metal later on. for now, the main thing i’d like to do is move the graph-level linked params into this new metadata module you’re creating. I can do that as a follow-up if you like.

do you also propose to add an LLVM implementation in your PR? I think right now it just generates c, correct?

I don’t think we need an LLVM implementation for metadata. The reason for having a module LLVM-codegened is to use the LLVM optimization pass pipeline, which should only be needed for operators, IMO. Therefore, we don’t see how that would benefit the compilation of metadata. Thus, a c-source module should suffice for both c and llvm targets, as they would be linked together via export_library.

I think if a user is using llvm backend, they aren’t required to specify a C compiler to TVM. They may have a C compiler available for external use, but it only needs to be configured for an external build system. Since we already have the code to generate FuncRegistry in LLVM directly, I think we should keep that code in and support generating an LLVM metadata module.

linked params right now are exported from the graph runtime codegen rather than a specific TVM backend, so they don’t actually need to be referenced directly by the generated code. I think if a BYOC codegen wants to embed parameters, we can solve the question of how to represent them in the metadata module (if that’s necessary) for bare metal later on. for now, the main thing i’d like to do is move the graph-level linked params into this new metadata module you’re creating. I can do that as a follow-up if you like.

Yes, that makes sense. Thanks :slight_smile:

I think if a user is using llvm backend, they aren’t required to specify a C compiler to TVM. They may have a C compiler available for external use, but it only needs to be configured for an external build system. Since we already have the code to generate FuncRegistry in LLVM directly, I think we should keep that code in and support generating an LLVM metadata module.

So we use a C compiler for the linking as it stands today. Do we have a use case / value where we would just use a linker (e.g., ld) rather than a C compiler (e.g., gcc) for linking? The reason I’m saying this is that the default 'fcompile's for export_library are all C compilers.

So we use a C compiler for the linking as it stands today. Do we have a use case / value where we would just use a linker (e.g., ld) rather than a C compiler (e.g., gcc) for linking? The reason I’m saying this is that the default 'fcompile's for export_library are all C compilers.

true, though right now that’s only required if you are using BYOC on µTVM. I do think that when cross-compiling, providing a compiler is a larger burden than compiling a module for the native host. I think that a C frontend also requires a bit more configuration than linking object files together into a library. i would like to get to a position where if you are using topi schedules with LLVM-supported targets, TVM contains all of the dependencies you need to build object libraries. so long as you’re not generating code that deals with hardware specifics outside the ISA, you shouldn’t need to tell TVM about your target compiler where LLVM supports it.

Hi @areusch, thanks for the explanation. Let’s discuss this a little further. I think we need a good reason if we are to maintain the same piece of functionality in two places – namely as LLVMMetadataModule and CSourceMetadataModule.

true, though right now that’s only required if you are using BYOC on µTVM.

Not exactly: when we go down the metadata module approach, every IRModule that has target “llvm” or “c” (with system-lib and runtime=c) will produce at least two runtime modules. Thus, we would need to use export_library to link the artifacts produced by saving such runtime modules. I think this is what we agreed on when making uTVM support multi-module builds (though the discussion was about external modules): External modules in uTVM. I think module.save should be used for unit testing and debugging.
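As a concrete illustration (the cross-compiler name, output name, and variable names here are assumptions for the example, not taken from the PR), the flow described above looks roughly like this from the Python side:

    # lib: the runtime.Module produced by relay.build / tvm.build with a "c" or
    # "llvm" target using system-lib and the c runtime. It carries the metadata
    # module and the operator module(s) as imports; export_library saves each
    # of them and invokes fcompile (a C compiler, or a cross compiler as below)
    # to link everything into a single artifact.
    from tvm.contrib import cc

    lib.export_library(
        "model.so",
        fcompile=cc.cross_compiler("arm-linux-gnueabihf-gcc"),
    )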

I think that a C frontend also requires a bit more configuration than linking object files together into a library.

Maybe; can you give an example of the additional configuration we would need to provide in the two scenarios we would end up with when the target is “llvm”?

  • S1: metadata.c, lib.o
    (this is what we would end up with if we have (only) a CSourceMetadataModule)

  • S2: metadata.o, lib.o
    (this is what we would end up with if we have an LLVMMetadataModule)

I would like to get to a position where if you are using topi schedules with LLVM-supported targets, TVM contains all of the dependencies you need to build object libraries.

So this is regarding the creation of lib.o (the artifact created by lowering the main IRModule), which will still be the same irrespective of the presence of the metadata module.

so long as you’re not generating code that deals with hardware specifics outside the ISA, you shouldn’t need to tell TVM about your target compiler where LLVM supports it.

Don’t we need that for linking the two artifacts? Moreover, TVM invokes the LLVM pass manager to convert the built LLVM IR (.ll) to an object file. Would it be too much to assume the presence of at least clang (if not gcc or a gcc variant for the target architecture) on the machine?

hi @manupa-arm, sure.

Not exactly: when we go down the metadata module approach, every IRModule that has target “llvm” or “c” (with system-lib and runtime=c) will produce at least two runtime modules. Thus, we would need to use export_library to link the artifacts produced by saving such runtime modules.

I think this is the part i’m questioning, though I realize it is outside the scope of this RFC. however, I guess i’m thinking we should keep it in at least until we resolve that. happy to be persuaded differently though. here’s my argument:

I agree that with these changes we’re moving to a world where we are generating > 1 module. I think given the compiler architecture (i.e. independence of each codegen), which we’d like to maintain, generating a separate TVM module in each codegen makes sense. the thing we need to think about is how these modules will be consumed and the interoperability between TVM and the consumer.

When using the c++ runtime on a traditional OS, it makes sense for export_library to produce an artifact that can be loaded back again with load_module(). With the c runtime, this isn’t necessary, and lately i’ve been wondering if it makes sense to require the user to compile generated C code just to export it from TVM. I’m leaning towards no–and apologies if this is a slightly different direction than my initial implementation–but here’s why.

Currently at main, µTVM requires a compiler configuration given by a tvm.micro.compiler.Compiler subclass. The primary motivation for this was autotvm, but if the user is merely deploying a model, they have no reason to provide this configuration. Moreover, it’s quite an imposition: the compiler configuration is likely spread across some other non-python language such as a Makefile, cmake, or some IDE’s make config, plus platform-specific config (e.g. SDK, RTOS, etc). It’s likely that the user has limited knowledge of this config, and it may be that the fastest way to get the set of cflags used to compile .c files in a project is just to invoke the compiler and copy them from a build log. It may be easier for the user to just bring those files into their own development flow and let the linker there do its job.

I think we should consider this output question in a separate RFC. However, at main right now, the user can choose essentially between the c and llvm backends, and I think that choice boils down to development workflow and, potentially later, optimizations we may introduce that are more robustly compiled with the LLVM backend. here’s how I would think about it as a user right now:

  • If using c, the user is expecting a set of source files and planning to compile them somehow with cflags, then pass them to the linker, which should use ldflags. the user may need to tweak these files (e.g. sed-replace intrinsics, add debug log lines, etc) if they are less familiar with TVM and the generated code does not quite compile with their particular compiler.
  • If using llvm, the user may be expecting only object files and expecting to link using ldflags. they may want to ensure TVM’s generated code is interpreted correctly by the compiler so any ISA-level optimizations are implemented correctly. In this case, they may not want to have to compile an additional C file.

So mostly I think that by always producing the metadata as C, and then compiling it from within TVM, we add an additional burden on the user to supply a correct, working C compiler. I think we could solve this either by not compiling in export_library, or by producing binary files. Since we already have support for producing a binary FuncRegistry, perhaps it makes sense to keep that?

Don’t we need that for linking the two artifacts? Moreover, TVM invokes the LLVM pass manager to convert the built LLVM IR (.ll) to an object file. Would it be too much to assume the presence of at least clang (if not gcc or a gcc variant for the target architecture) on the machine?

Definitely we do for AutoTVM, but if AutoTVM is not what the user’s doing, then they may want to link the artifacts within their own compilation flow. I’m not familiar with the LLVM pass manager, though–so perhaps I’m arguing for a solution that still requires a local linker without realizing it. I do agree that if the user’s using LLVM, it’s likely they are targeting a fairly modern, supported ISA–but, ISA extensions which may become more popular soon may complicate this.

Let me know your thoughts–there is a lot here and we don’t have to tackle all of this now. Just putting them out there as potential motivation to leave in a pure LLVM-based flow.

Andrew

hi @areusch, Thanks for the detailed explanation.

I quite agree that when using the “c” backend the user might want to save and tweak post-TVM; that’s a good point :slight_smile:. I think this implementation helps in that regard by producing a metadata c module; the micro build just needs to query the imports when saving them. I think we should discuss this separately, and I think we should enable both types of users:

U1 : users who want such customizations

U2 : users who just expect a .o for the correct ISA.

I still think export_library could cater to both. The difference is that we would need a different fcompile that exposes them (export_library is compiler agnostic). The point I’m trying to make is that export_library could use an fcompile that just saves the files to a different directory for the debug/customization flow, instead of compiling them to .o.
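A minimal sketch of such an fcompile (assuming export_library calls it with the output path plus the list of per-module source/object files; the exact contract may differ slightly, and the helper name is made up):

    import os
    import shutil

    # Instead of invoking a C compiler, just collect the generated files so the
    # user can pull them into their own build flow (debug/customization case).
    def save_sources(output, files, **kwargs):
        out_dir = os.path.dirname(os.path.abspath(output))
        os.makedirs(out_dir, exist_ok=True)
        for f in files:
            shutil.copy(f, out_dir)

    lib.export_library("generated/model.so", fcompile=save_sources)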

However, when using the “llvm” backend the user anyway expects only object files. Unless we have a good reason to have separate object files (metadata.o and lib.o), export_library could be used to produce a single object file (model.o). I’ll give this a little more thought and discuss it internally as well.

The issue with the current FuncRegistry is that it will block us from using external modules with the “llvm” target for the internal module, because it constructs the registry just by examining the PrimFuncs inside it, unless we have an LLVMMetadataModule. If we have a good reason to have two object files (metadata.o and lib.o), I think an LLVMMetadataModule is the way to go. At the minute, I’m trying to convince myself whether we really need that, because the llvm runtime module’s .save (with .o as the format) works using the LLVM infra, and 99% of the time the same system should have a clang that supports the same target triple as the llvm module.

@manupa-arm ah great, thanks for clarifying your thoughts as well. this explains why it may be complex to implement an LLVMMetadataModule. an example of a case I could think of is someone working with an LLVM branch that contains experimental ISA extensions; for example, the RISC-V P work was done on an LLVM branch, I believe. I imagine as we continue to push the boundaries of hardware acceleration we may see new instructions that require similar compiler pinning.

additionally, it’s possible we may need LLVMMetadataModule to support linked parameters, since previous work has shown that the parsing step in the compiler frontend is the slowest part of compiling those from C source.

perhaps @tqchen has additional thoughts on these points?

@areusch All right!

Let me suggest a way forward then.

  • First, I will re-purpose the current PR to only include the CSourceMetadata module when targeting “c”, and keep the func registry in the llvm internal module.
  • Second, we can then look at introducing an LLVMMetadata module as a next step.

Does that make sense?

@manupa-arm yeah that sounds reasonable. thanks for working through this with me!


Hi @manupa-arm @areusch

When using the Graph Executor together with CMSIS-NN and exporting to c, the generated code uses the function tvmgen_default_cmsis_nn_main_0, but it is missing from the function registry. Also, if I use AoT, the function is missing from the function registry but is called from within the tvmgen_default___tvm_main__ function. Any idea what may be the issue?

Hi @Mousius

Actually, I followed your commit “Initial Implementation of TIRToRuntime Target hook” and found this code:

  // We don't want library modules going back into host codegen
  // unless they're supposed to. Here if we overrode the target host
  // to allow lowering previously we check that it's meant to be placed
  // back into the host Module.
  bool overrides_host_target = target->kind->device_type == target_host->kind->device_type;
  bool non_host_target_kind = target->kind != target_host->kind;
  if (overrides_host_target && non_host_target_kind) {
    device_modules.push_back(codegen::Build(mhost, it.first));
  } else {
    mhost_all->Update(mhost);
  }

It seems to be somehow related to the issue I described above. I would appreciate your help with this matter.

Thanks, Ebraheem