Model Library Format
Background
TVM’s build process for imported models centers around `tvm.relay.build`, a function which produces a 3-tuple `(graph_json, lib, params)`. The inference workflow then diverges depending on how the user wants to use the compiled artifacts:
- If the build targets the c++ runtime and uses the `llvm` backend…
  - and the user wants to run in the same Python instance used to compile: the user can directly instantiate a GraphRuntime instance.
  - and the user wants to transfer the model to another Python runtime instance without cross-compiling: the user can call `lib.export_library()`, and store `graph_json` and `params` in some ad-hoc way. Then, `tvm.runtime.load_module()` can recreate `lib` in the new runtime instance.
  - and the user wants to transfer the model to another Python runtime instance with cross-compiling: the same procedure as above, but pass `fcompile` to `export_library` to specify the cross-compiler.
- If the build targets the c++ runtime and uses the `c` backend…
  - and the user wants to run the model with Python on a similar architecture: the user must compile the produced `c` files to produce an artifact similar to the one produced by `lib.export_library()`. Then, they can load and run the library following the procedure above. When saving and loading from the same instance (so `graph_json` and `params` are not a consideration), this process is handled invisibly by `loadfile_tar`.
  - and the user wants to run the model with Python on a different architecture: the same procedure as above, but with a cross-compiler.
  - and the user wants to run the model with a different frontend language: the same procedure as above, but the user must translate `graph_json` and `params` to a format suitable for the other language.
- If the build targets the c runtime…
  - and the user wants to run the model with TVM in Python: not supported; Python supports the C++ runtime only.
  - and the user wants to run standalone: compile with `--system-lib`, store the library in a `.tar` with `export_library()`, store `params` and `graph_json` to disk in an ad-hoc way, unpack the tar, and integrate all pieces into a standalone project. A small `main` is needed to launch the C runtime, load the model and parameters, and run inference. See `apps/bundle_deploy`.
In all cases except the first (compile and run in the same TVM instance), the user needs to serialize the `tvm.relay.build` 3-tuple before doing anything else. However, TVM provides no common function to handle this; it only directly handles serializing the compiled library. The user is left to store the parameters and runtime configuration (e.g. `graph_json`) in whatever way suits the task at hand. This discrepancy means that any automation which consumes TVM artifacts from disk must be hand-written and specific to the situation.
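For concreteness, here is a hedged sketch of that ad-hoc workflow for the c++ runtime with the `llvm` backend, following the 3-tuple description above. The file names are illustrative, and `mod` and `params` are placeholders assumed to come from a frontend importer.

```python
import tvm
from tvm import relay

# `mod` and `params` are assumed to come from a frontend importer
# (e.g. a Relay frontend); they are placeholders here.
graph_json, lib, params = relay.build(mod, target="llvm", params=params)

# The compiled library has a standard serialization path...
lib.export_library("model.so")  # pass fcompile=... to cross-compile

# ...but graph_json and params must be stored in some ad-hoc way.
with open("graph.json", "w") as f:
    f.write(graph_json)
with open("model.params", "wb") as f:
    f.write(relay.save_param_dict(params))

# In another Python instance, only the library is recreated automatically:
loaded_lib = tvm.runtime.load_module("model.so")
```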
On microTVM, we are preparing to introduce a Project-level API, implementations of which a) live in separate codebases from `tvm` and b) build firmware images from the `tvm.relay.build` artifacts. Because those implementations live outside the TVM codebase, the API needs to specify how all artifacts from `tvm.relay.build` are placed on disk.
To prepare for this API, we propose Model Library Format, a standard on-disk format for microTVM artifacts. microTVM primarily expects users to use the `c` or `llvm` backends with a cross-compiler, and build results may contain BYOC artifacts as well. As a secondary goal of this RFC, we make some considerations such that Model Library Format could be re-used as the standard on-disk format produced by `tvmc`.
Goals
- Describe a standard way to serialize microTVM artifacts for use in downstream automation that compiles them into firmware.
- Describe how to implement a load API such as `tvm.runtime.load_module() -> GraphRuntimeFactory`.
- Make considerations to accommodate other runtimes such as AOT and VM.
Non-Goals
- Immediately change the `tvmc` output format to Model Library Format for non-µTVM uses. The initial implementation is focused on microTVM only.
- Decide how to serialize compilation flows unrelated to microTVM.
Model Library Format
Model Library Format is a tar-archived directory tree. A sketch is as follows:
/
  README.md          - A short standardized README for new users plus human-readable metadata.json
  metadata-<n>.json  - Overall metadata describing this artifact; version <n>
  crt/               - The content of standalone_crt from the TVM build/ directory
    Makefile
    include/
      ...
    src/
      ...
  codegen/           - Stores generated libraries in source or binary form
    host/            - Generated code for target_host
      lib/           - Generated binary object files
        aot.o        - Future home of AOT runtime generated code
        devc.o       - C++ MetadataModule artifact, unused in µTVM. Should get deleted.
        lib0.o       - LLVM module
        lib1.o       - LLVM CRT Metadata Module
      src/           - Generated C source
        devc.c       - C++ MetadataModule artifact, unused in µTVM. Should get deleted.
        lib0.c       - C module
        lib1.c       - C CRT Metadata Module
    <target_key>/    - Additional directories for code which should get compiled for use on a target
  parameters/        - Stores simplified parameters
    <model_name>.bson   - BSON-serialized runtime parameters (optional)
    <model_name>.params - tvm.relay._save_params format (always present)
    <model_name>.json   - JSON-serialized parameters (optional)
  relay.txt          - Text representation of the compiled Relay model, if built from Relay
  runtime-config/    - Stores runtime configuration
    aot/             - AOT runtime config
      (tbd)
    graph/           - Graph runtime config
      graph.json     - Graph runtime JSON
metadata.json
The metadata file contains machine-parseable data describing the build. It also contains model-level information that is easier (right now) to parse as a single JSON document rather than split into many smaller purpose-specific files.
Following is a proposed schema:
{
  "version": 1,                                  // version of this document
  "model_name": "<model_name>",                  // model name (passed as mod_name= to tvm.relay.build)
  "export_datetime_utc": "%Y-%m-%d %H:%M:%SZ",   // time of export, in UTC
  "memory": {},                                  // configured memory map (see Memory Map)
  "target": "",                                  // TVM target string used to compile this artifact
  "runtimes": ["graph"]                          // the runtimes that can launch this model
}
Memory Map
In v1, the Memory Map will describe the buffers allocated by the GraphRuntime. As the memory planner is improved, this data structure will be expanded. Following is the schema for the “memory” key in v1:
[
  {
    "storage_id": <n>,     // storage_id of the buffer, allocated by GraphRuntime
    "size_bytes": <n>,     // size of this buffer, in bytes
    "input_binding": ""    // when bound to a model input, the name of that input
  },
  // Additional entries
]
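As an illustration of how downstream firmware automation might consume this map, here is a hedged Python sketch. The archive name, extraction path, and metadata file name are assumptions based on the layout proposed above, and the memory map is read in the list form described in this section.

```python
import json
import tarfile

# Unpack a Model Library Format archive (name is illustrative).
with tarfile.open("model.model-lib") as tf:
    tf.extractall("model_lib")

# Read the v1 metadata document described above.
with open("model_lib/metadata-1.json") as f:
    metadata = json.load(f)

# Sum the GraphRuntime buffer sizes and list buffers bound to model inputs,
# e.g. to size a static memory pool in generated firmware.
entries = metadata["memory"]
total_bytes = sum(e["size_bytes"] for e in entries)
input_bindings = [e["input_binding"] for e in entries if e.get("input_binding")]

print("GraphRuntime buffers:", len(entries))
print("Total buffer bytes:", total_bytes)
print("Input-bound buffers:", input_bindings)
```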
Building a Model Library Format
Here is the process by which TVM creates a Model Library Format tree from the `tvm.relay.build` artifacts. Here, `graph_json`, `lib`, and `params` are the returned 3-tuple and `target` is the TVM target. Creation of intermediate directories (`mkdir`) is assumed.
- If `target` contains `--runtime=crt`, copy `$tvm_root/build/standalone_crt` to `./crt`.
- Populate `./codegen` by calling `lib.export_library()`, which should:
  - Collect all Modules that execute on the host and pass them to `fcompile`. At present, these are those with `type_key()` of `c` or `llvm`. When the `c` target is used, `fcompile` should copy the generated files into `./codegen/host/src` instead of generating a `.tar`.
  - (TODO, but not as a result of this RFC) Group the non-host modules by target_type (except that ext_dev target_types should be expanded to a unique key per BYOC). Save each generated module into a file underneath `./codegen/<target_type>`.
- Populate `./parameters`:
  - Produce `<model_name>.params` with `tvm.relay._save_params`.
  - Produce `<model_name>.json` with TBD (there doesn’t seem to be a standard in TVM, so I guess we’ll have to propose one).
- Produce `relay.txt` with `IRModule.get_source`.
- Produce `./runtime-config` as follows:
  - for GraphRuntime: save `graph.json` to `./runtime-config/graph/graph.json`
  - for VM: TBD
  - for AOT: TBD
- Produce `metadata-<n>.json` by building the required data structure and serializing it to JSON.
Finally, the entire directory tree should be packaged into a TAR file with a `.model-lib` extension for easy transmission.
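The following Python sketch condenses the procedure above for a GraphRuntime build using the `c` backend. It is illustrative only: the helper name, the omission of the CRT copy and BYOC grouping, and the way `lib.export_library()` would populate `codegen/` are assumptions, not the final implementation.

```python
import datetime
import json
import os
import tarfile

from tvm import relay


def export_model_library_format_sketch(graph_json, lib, params, target, model_name, out_dir):
    """Illustrative only: lay out a Model Library Format tree and tar it."""
    # codegen/: for the c backend, lib.export_library() is proposed to copy
    # generated sources into ./codegen/host/src rather than emitting a .tar.
    os.makedirs(os.path.join(out_dir, "codegen", "host", "src"), exist_ok=True)

    # parameters/: the tvm.relay._save_params format is always present.
    os.makedirs(os.path.join(out_dir, "parameters"), exist_ok=True)
    with open(os.path.join(out_dir, "parameters", f"{model_name}.params"), "wb") as f:
        f.write(relay.save_param_dict(params))

    # runtime-config/graph/graph.json for GraphRuntime.
    os.makedirs(os.path.join(out_dir, "runtime-config", "graph"), exist_ok=True)
    with open(os.path.join(out_dir, "runtime-config", "graph", "graph.json"), "w") as f:
        f.write(graph_json)

    # metadata-<n>.json, following the v1 schema above.
    metadata = {
        "version": 1,
        "model_name": model_name,
        "export_datetime_utc": datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ"),
        "memory": [],  # to be filled from the GraphRuntime memory plan
        "target": str(target),
        "runtimes": ["graph"],
    }
    with open(os.path.join(out_dir, "metadata-1.json"), "w") as f:
        json.dump(metadata, f, indent=2)

    # Finally, package the tree as a .model-lib tar archive.
    with tarfile.open(out_dir + ".model-lib", "w") as tf:
        tf.add(out_dir, arcname=".")
```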
Implementation in TVM
The implementation of this RFC will initially consist of the following:
- Adding a new function, `tvm.runtime.Module#export_model_library_format`. This function implements the above procedure for runtimes which use the `c` backend.
- Placing the state necessary to implement `export_model_library_format` into GraphRuntimeCodegenModule, and making it accessible from Python.
- Adding `loadfile_model_lib`, which allows loading a `tvm.runtime.GraphRuntimeFactoryModule` from the file produced by `export_model_library_format`.
- Adding unit tests and changing `apps/bundle_deploy` to use this format as an example.
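A hedged sketch of how a user might exercise the proposed API is shown below; the method spelling and file name are assumptions, since the exact signatures are only being proposed here.

```python
import tvm

# `lib` is assumed to be the module returned by tvm.relay.build (see Background).
lib.export_model_library_format("model.model-lib")  # proposed method on tvm.runtime.Module

# Proposed: loadfile_model_lib lets tvm.runtime.load_module recreate a
# GraphRuntimeFactoryModule from the exported file.
factory = tvm.runtime.load_module("model.model-lib")
```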
Following implementation of this RFC, another RFC (Project-level API for µTVM projects) will be submitted explaining how we intend to refactor the current interaction between TVM and µTVM runtime projects to allow for better portability. Also, `tvmc` will begin creating Model Library Format for `--runtime=c` targets.
µTVM Use Cases
Here I briefly walk through some µTVM use cases of Model Library Format to consider whether it’s a net improvement.
Building Host-Driven Firmware (µTVM)
At present, µTVM builds host-driven firmware (GraphRuntime instantiated on the host) as follows:
- The user instantiates an implementation of `tvm.micro.Compiler`.
- TVM invokes `tvm.micro.Compiler#library` to compile each CRT sub-library and the code in `./codegen/host`.
- TVM invokes `tvm.micro.Compiler#binary` to build a binary firmware image including each library.
Following implementation of this change, the compilation flow will remain the same, but the CRT sources used will be taken from the Model Library Format tree.
Host-Driven Inference
At present, this is done from within the same Python script that called `tvm.relay.build`, since it’s easier to keep all of the state in memory. It can be done from a separate `python` invocation, but there is no standard function to load all of the necessary state, so the process is ad-hoc. Following this change, the GraphRuntimeFactoryModule can be loaded using `tvm.runtime.load_module`, so it will be much easier to reconstruct the state needed for host-driven inference.
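As an illustration, a host-driven inference session might then look like the hedged sketch below. The archive name, input name, and input shape are placeholders, and the `factory["default"]` pattern assumes the loader returns a GraphRuntimeFactoryModule usable like one produced for the C++ runtime.

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Load the exported Model Library Format artifact via the proposed
# loadfile_model_lib hook (file name is illustrative).
factory = tvm.runtime.load_module("model.model-lib")

# Instantiate a GraphRuntime from the factory and run one inference.
module = graph_runtime.GraphModule(factory["default"](tvm.cpu(0)))
module.set_input("input", np.zeros((1, 3, 224, 224), dtype="float32"))
module.run()
output = module.get_output(0).asnumpy()
```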
Building Standalone Firmware (e.g. apps/bundle_deploy)
Currently, `apps/bundle_deploy` invokes a custom Python script which produces artifacts in `apps/bundle_deploy/build`. After this RFC, `apps/bundle_deploy/build_model.py` will produce Model Library Format artifacts for the C-runtime-compatible builds.
For `apps/bundle_deploy`, the Makefile will be updated to reference the artifacts in standard locations. In the future, it will be possible to write a standard script to ingest generated code as a library into project build systems.
Future Work
We expect to make changes as future considerations are made in Model Library Format. Each time a change is made, the version number will be incremented. Here are some sketches of future topics that could be tackled.
Contexts
In heterogeneous execution, this key will describe the various DLContexts that TVM expects to be configured on the device. This RFC doesn’t seek to fully describe this key; heterogeneous execution is a future goal of µTVM, and until something more concrete is proposed there, this key will just contain an entry for `DLContext(kDLCPU, 0)`.
Here is a strawman:
"contexts": [
{
"device_type": "cpu",
"device_id": 0,
},
{
"device_type": "ext_dev",
"device_id": 0,
"compiler": "accel_compiler_key",
"config": {
// device-specific config, populated by BYOC
},
},
], // configured DLContext (see DLContext configuration)
Models Targeted to the C++ Runtime
Models targeted to the C++ runtime have a very similar structure to those targeted at the C runtime. The main difference is in how non-`c` and non-`llvm` (“non-DSO-Exportable”) modules are packaged.
The C++ runtime places all modules in a single shared library like a “fat binary.” At load time, it expects to find a constant `__tvm_dev_mblob`, which contains the concatenated `Module#save` output from all of these modules. It then invokes a `runtime.module.loadbinary_<type_key>` function for each Module in `__tvm_dev_mblob`.
In the C runtime, non-DSO-Exportable modules are typically created from BYOC flows and are meant to be executed by accelerators. Because RAM is typically quite precious on microcontrollers, the C runtime intends to make such generated BYOC code available to the downstream firmware build at compile time. Modules are grouped by `target_type`, and one file is generated per Module containing that Module’s `Module#save` output.
It’s possible that both approaches could be taken for the C++ runtime to allow pre-compilation of Modules. However, the simplest and most likely way forward would be to create `./codegen/<model_name>.so` and avoid creating subdirectories. When the `c` backend is used with the C++ runtime, `./codegen/host/src` could still be created, or the `.tar` could be placed in `./codegen/<model_name>.tar`.
@tqchen @gromero @leandron @manupa-arm @mdw-octoml @jroesch @mjs @liangfu