External modules in uTVM

manupa-arm · September 25, 2020, 3:47pm

I’ve seen that the current uTVM flow uses the module.save(…) interface to save the “c” module before compiling it into the micro binary that is fed to the runtime. I was wondering: if we use BYOC in this flow and create our own external c-source module/function in addition to the single dso_module (c-source) created today, how would that fit in?

I assume it could be another C source that gets linked in the same way? I would highly appreciate any ideas on this.

Also, if so, what would be a better way to store the constant binary artifacts that are produced while lowering the external function and are needed at runtime? (Some way better than having them in source.)

cc: @areusch @weberlo @tqchen @grant-arm

areusch · September 25, 2020, 11:40pm

hi @manupa-arm,

seems like we have two options:

O1. build a library version of this StaticRuntime fcompile function and make export_library call this function to create something like a DSO (instead, it would likely be a static archive .a, but similar idea). the advantage is that the micro flow would retain the same API as the usual export_library flow, although even now the implementation is quite different when exporting a source module. the disadvantage is that we still need to pass a custom fcompile function to allow the user to specify a cross compiler, which is maybe confusing.

O2. build a micro-specific export_library implementation that knows how to traverse imported modules and compile as necessary. the advantage is that we don’t need to go far out of our way to accommodate micro-specific customizations in the library flow, so the user API may be more clear and easier to grow around any future changes we need for BYOC/micro. The disadvantage is that we may duplicate some logic.
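To make the traversal in O2 a bit more concrete, here is a minimal Python sketch. ToyModule and plan_build are hypothetical stand-ins, not TVM APIs; they only borrow the type_key and imported_modules attribute names from TVM runtime modules. Treating "c" as "needs compiling" and "llvm" as "already object code, link only" is an assumption for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for a runtime module in an import tree."""
    name: str
    type_key: str
    imported_modules: list = field(default_factory=list)


def plan_build(root):
    """Walk the import tree depth-first and sort modules by build action."""
    plan = {"compile": [], "link": [], "other": []}
    seen = set()
    stack = [root]
    while stack:
        mod = stack.pop()
        if id(mod) in seen:  # a module may be imported more than once
            continue
        seen.add(id(mod))
        if mod.type_key == "c":
            plan["compile"].append(mod.name)  # C source: cross-compile it
        elif mod.type_key == "llvm":
            plan["link"].append(mod.name)     # object code: link as-is
        else:
            plan["other"].append(mod.name)    # needs some other handling
        # Reverse so children are visited in their declared order.
        stack.extend(reversed(mod.imported_modules))
    return plan


root = ToyModule("host", "c", [ToyModule("ext", "c"), ToyModule("blobs", "llvm")])
print(plan_build(root))
```

A micro-specific exporter would then hand the "compile" entries to the cross compiler and pass the "link" entries straight to the archiver or linker.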

it probably also depends somewhat whether or not your BYOC will generate C source or object code (it should be possible now to generate µTVM operators using the llvm backend, but we haven’t tried it yet).

for the constant artifacts, you should be able to produce a binary .o that’s linked the same way as any BYOC-generated code. it seems like we should modify build_static_runtime to learn how to build these companion artifacts, or otherwise move this into an e.g. build_micro_library function.

thoughts? @tqchen

manupa-arm · September 28, 2020, 3:31pm

hi @areusch,

Thanks for the reply!

We don’t have a strict requirement per se to generate a c-source module. In fact, even if we did, it would be the same source, with only the binary artifacts differing. That sounds to me like we could go with O1?

Anyhow, micro-specific or not, why do you think passing the fcompile function to export_library is not a good idea? (I’m assuming you are thinking of the non-uTVM use case.) I think it’s better to provide this capability to the user either way, while keeping the current default in the export_library interface.

I think we should be able to serialize the binary artifacts using the SaveToBinary interface. The archive creation would then encapsulate everything. Please let me know if I’m missing something that blocks this kind of behavior in uTVM.

cc : @comaniac

tqchen · September 28, 2020, 3:38pm

O1 is the ideal option, since we want a single export_library function for all backends (with linking behavior customized via fcompile).

areusch · September 28, 2020, 7:31pm

@manupa-arm ah I don’t necessarily think use of fcompile is a bad idea, but for µTVM then that does mean that you must pass fcompile, so we just need to make sure the API is easy/obvious enough to use (or build another API on top of this).

re: the SaveToBinary: I agree that would be a convenient way of bundling stuff. my concern there is that currently the µTVM flow produces artifacts that can just be passed directly to a compiler (if .c file) or linker (if .o). this then allows users to easily implement a custom tvm.micro.Compiler instance. if we are producing a custom binary format, it may be a bit trickier to do that.

if producing a library containing C source + accompanying .o , we may need to see how best to export that as two files from export_library so that ordinary build toolchains can consume them. we might be able to leverage the tvm.micro.MicroLibrary class.

manupa-arm · September 29, 2020, 2:13pm

@areusch looking at the design of export_library, it seems to be designed to generate a shared object. Thus, the difference in uTVM (w.r.t. TVM) would be that we want to statically link it with the runtime at compile time. What are your thoughts on re-using export_library to produce the static archive (.a) for uTVM builds, by providing the fcompile function to export_library?
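As a rough illustration of that idea, here is a minimal sketch of an fcompile-style callback that produces a static archive instead of a shared object. It assumes the callback is invoked as fcompile(output, files, **kwargs); the function names, the default arm-none-eabi-gcc / arm-none-eabi-ar tool names, and the options keyword are assumptions for this example, not something confirmed in this thread.

```python
import os
import subprocess


def static_archive_commands(output, files, cc="arm-none-eabi-gcc",
                            ar="arm-none-eabi-ar", options=None):
    """Return the command lines that turn .c/.o inputs into a static archive."""
    options = list(options or [])
    commands, objects = [], []
    for path in files:
        stem, ext = os.path.splitext(path)
        if ext == ".o":
            objects.append(path)  # already object code: archive as-is
        elif ext in (".c", ".cc"):
            obj = stem + ".o"
            commands.append([cc, *options, "-c", path, "-o", obj])
            objects.append(obj)
        else:
            raise ValueError(f"unsupported input file: {path}")
    # "rcs": replace members, create the archive if needed, write an index.
    commands.append([ar, "rcs", output, *objects])
    return commands


def micro_fcompile(output, files, **kwargs):
    """Hypothetical fcompile callback: run the commands with the cross toolchain."""
    for cmd in static_archive_commands(output, files, **kwargs):
        subprocess.run(cmd, check=True)
```

Usage would then look something like lib.export_library("model.a", fcompile=micro_fcompile), assuming export_library forwards the generated source and object files to the callback in that form. Splitting command construction from execution keeps the toolchain choice easy to inspect and override.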

Moreover, my understanding is that tvm.micro.Compiler is an abstract class that compiles a library (using the library method) and then uses it (along with external libs) to compile and link against the runtime (using the binary method). Therefore, I was wondering whether we can re-use export_library to create the archive for the library part? It would just need the fcompile function, which could be provided by the user and could be an attribute of the tvm.micro.Compiler class. This way, export_library already knows how to deal with import trees of modules.

We are inclined to go with the approach of producing a c-source module for the external module, but for a different reason. If we use export_library (with a micro-friendly fcompile function), it will create an additional source file through PackImportsToC, which deals with the serialization format. Essentially, the additional source (devc.cc) will contain the binary blobs that are serialized via the SaveToBinary interface, and I believe the runtime module will also know how to reconstruct itself if it is linked into the runtime.

Having said that, the reason we might want to go with a c-source module is that it would be easier for the fcompile function to place the constants in the c-source module into flash. Otherwise, if we re-construct the runtime module via the LoadFromBinary interface, it will do so on the stack/heap, forcing a mandatory copy from flash to volatile memory, I suppose.

tqchen · September 29, 2020, 3:06pm

I agree that putting weights in as constants is an important question. It is probably orthogonal to the C source module, as we might be able to create a similar util via LLVM (like what we did in PackImports).

manupa-arm · September 29, 2020, 4:35pm

@tqchen, I think we should handle the weights more generally. Here I was referring to the binary artifacts that are produced as part of lowering the external function and are required at runtime (unlike weights, they are not present in the relay graph initially).

areusch · September 29, 2020, 4:42pm

@manupa-arm yeah exactly–the main difference is that µTVM wants a static library by default. i’m okay with O1 (reusing export_library) so long as we don’t need to change export_library too much to accommodate µTVM (i don’t believe any changes are needed, after reviewing it here).

for my autotvm prototype using µTVM RPC server, I made such an fcompile that calls tvm.micro.Compiler.library(): https://github.com/areusch/incubator-tvm/blob/utvm-runtime/python/tvm/micro/build.py#L217

i’ll merge that next after the QEMU regression is finished or potentially concurrently. feel free to hack on it if it can be made useful for this situation.

re: PackImportsToC: I’m not sure this is exactly the intended use of export_library, but it seems like if you had a CSourceModule that imported a single LLVM module (containing the binary blobs as const global symbols), you could write an fcompile that accepted both .c and .o and produced a static library .a just as the autotvm does now. that seems fine from a µTVM perspective. the advantage would be that it would get around the slow compiler parse times that people have reported. we can also create an e.g. PackConstantsToLLVM to specialize for binary blobs with individual global symbol names.

I think so long as the binary blobs are const primitive-typed symbols, most linker scripts should place them in flash, correct? we should definitely make sure we are generating blobs that stay in flash, so we don’t blow up the pre-main() flash-to-ram loop.
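As a small illustration of the "const primitive-typed symbol" idea, here is a sketch of a Python helper that renders a binary blob as a C const array with its own global symbol name. The helper name emit_const_blob and the symbol name in the example are hypothetical, and whether the array actually ends up in flash depends on the toolchain and linker script (typically via .rodata), which is an assumption here.

```python
def emit_const_blob(symbol, data, per_line=12):
    """Render bytes as a C definition of a const byte array plus its length."""
    if not symbol.isidentifier():
        raise ValueError(f"not a valid C identifier: {symbol!r}")
    if len(data) == 0:
        raise ValueError("zero-length arrays are not valid standard C")
    rows = []
    for start in range(0, len(data), per_line):
        chunk = data[start:start + per_line]
        rows.append("    " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    body = "\n".join(rows)
    return (
        "#include <stddef.h>\n\n"
        # const-qualified globals are usually emitted into .rodata
        f"const unsigned char {symbol}[{len(data)}] = {{\n{body}\n}};\n"
        f"const size_t {symbol}_len = {len(data)};\n"
    )


print(emit_const_blob("ext_func_blob", bytes([0x01, 0x02, 0xFF])))
```

Generating a .o directly (for example through LLVM, as suggested above) would avoid having the C compiler parse large array literals; this sketch only shows the shape of the symbols the linker would see.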

manupa-arm · September 29, 2020, 4:57pm

Sounds good! I’ll take a look at your fork for now and see what we can do.

Regarding PackConstantsToLLVM, I think this is the intention behind the design of the metadata module (cc: @comaniac @zhiics). I believe the right solution is one that supports this generally, rather than catering to external modules only. However, we would need to re-think the LoadFromBinary interface, because as of today it reads the constant data and re-constructs the weights in volatile memory. If we can make it pass pointers to data located in flash and re-interpret them in place, that would solve this. I’m not too sure about a solution at the minute, so I welcome any ideas.
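The copy-versus-reinterpret distinction can be shown with a loose Python analogy (this is not the LoadFromBinary interface; the function names are made up for illustration). An immutable bytes object plays the role of read-only data in flash: one loader copies it into a new buffer, the other only creates a typed view over the same storage.

```python
import array
import struct


def load_by_copy(blob):
    """Decode floats into a freshly allocated buffer (analogous to copying to RAM)."""
    return array.array("f", blob)


def load_in_place(blob):
    """Reinterpret the existing bytes as floats without copying them."""
    return memoryview(blob).cast("f")


# Three floats in native byte order stand in for a constant blob.
blob = struct.pack("3f", 1.0, 2.0, 0.5)

copied = load_by_copy(blob)
view = load_in_place(blob)

print(copied.tolist(), view.tolist())
print(view.obj is blob)  # the view still refers to the original storage
```

On a microcontroller the equivalent of load_in_place would be handing the runtime a const pointer to the symbol in flash, so the data is never duplicated in volatile memory.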

areusch · September 29, 2020, 10:32pm

that sounds pretty reasonable to me. I need to read more about the metadata encoding, but it seems like we should avoid copying data out of flash.

tqchen · September 29, 2020, 10:36pm

Yes, we should build a solution that bakes the weights directly into the rodata section without having to decode them from metadata. I think we have a good path to making it work.

manupa-arm · November 23, 2020, 5:58pm

Hi All,

I had a go at implementing some of the discussed changes: https://github.com/apache/incubator-tvm/pull/6950.