[C/C++ runtime] multimodel support

A brief interruption in this discussion, thanks to the awesome TVM conf last week!

On F0: named output tensors

I made some progress in getting outputs named.

  • The issue starts with Relay IR: a function returns a single output; when there are multiple outputs, that output is a tuple.
  • Each returned tuple element is a tensor (DLTensor) and has no name or id.
  • However, when a model is imported from TF, PyTorch, and so on, the outputs do have names, and those names are discarded.

So I modified the parsers to keep the right mapping from name to output tuple element, and each from_<framework>() front-end method now returns mod, params, output_names. If you find this useful, I can make a PR; let me know.
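To make the proposed convention concrete, here is a minimal pure-Python sketch of how a caller could use the returned output_names list to resolve a framework-level output name to its position in the Relay output tuple. The names and the commented-out TVM calls are illustrative assumptions, not the actual patched API:

```python
def lookup_output(output_names, name):
    """Map a framework output name to its index in the Relay output tuple."""
    try:
        return output_names.index(name)
    except ValueError:
        raise KeyError(f"unknown output name: {name}")

# Hypothetical usage with the modified frontend:
#   mod, params, output_names = relay.frontend.from_tensorflow(graph_def, ...)
#   idx = lookup_output(output_names, "detection_scores")
#   tensor = graph_module.get_output(idx)

# Example with made-up output names, in tuple-element order:
output_names = ["detection_boxes", "detection_scores"]
assert lookup_output(output_names, "detection_scores") == 1
```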

This is sufficient for connecting a model’s outputs to another model’s inputs e.g. using a streaming framework.

But this keeps the metadata separate from the library generated by TVM. It would be “nicer” if it were embedded inside the library, e.g. so the name is available on the DLTensor returned by get_output(n).

A possible solution: named tensors (DLTensor, NDArray)
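As a sketch of what a named tensor could look like from the user's side, here is a hypothetical wrapper pairing a tensor with its framework-level name (in TVM this would mean extending NDArray or annotating DLTensor; the class and helper below are illustrative only):

```python
class NamedTensor:
    """Hypothetical name-carrying tensor wrapper."""
    def __init__(self, name, data):
        self.name = name  # framework-level output name
        self.data = data  # placeholder for the actual DLTensor/NDArray

def get_output_by_name(outputs, name):
    """Look up a model output by name instead of tuple index."""
    for t in outputs:
        if t.name == name:
            return t
    raise KeyError(name)

# Example: a model with two named outputs.
outputs = [NamedTensor("boxes", [0.0, 0.0, 1.0, 1.0]),
           NamedTensor("scores", [0.9])]
assert get_output_by_name(outputs, "scores").data == [0.9]
```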

On F0: real inputs

While the TVM workflow maintains a separation between function inputs and params, they are merged into inputs at compile time. get_inputs() should keep returning the function’s real inputs, and the runtime should gain a get_params() accessor.
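A minimal sketch of the proposed split, keeping user-fed inputs and compiled-in params behind separate accessors (the class and names are hypothetical, not the actual runtime API):

```python
class RuntimeSketch:
    """Hypothetical runtime keeping real inputs and baked-in params apart."""
    def __init__(self, input_names, param_dict):
        self._inputs = {n: None for n in input_names}  # user must feed these
        self._params = dict(param_dict)                # bound at compile time

    def get_inputs(self):
        # Only the function's real inputs, not the merged params.
        return list(self._inputs)

    def get_params(self):
        # Proposed new accessor exposing compiled-in parameters.
        return list(self._params)

rt = RuntimeSketch(["data"], {"conv1_weight": [0], "conv1_bias": [0]})
assert rt.get_inputs() == ["data"]
assert set(rt.get_params()) == {"conv1_weight", "conv1_bias"}
```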

On F2

The current code limits a compiled function’s name to 80 characters and replaces longer names with a hash string.

I don’t see a need for either. An integer id is sufficient and much smaller. For debugging purposes, a map file could be generated with much richer information about the transformations applied to produce each compiled function.
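A quick sketch of the integer-id idea with a side map file for debugging; the map file format here is a made-up tab-separated layout, just to show the principle:

```python
def assign_ids(function_names):
    """Replace long mangled function names with small integer ids."""
    return {name: i for i, name in enumerate(function_names)}

def write_map_file(id_map):
    """Emit debug lines mapping each id back to its original name."""
    rows = sorted(id_map.items(), key=lambda kv: kv[1])
    return "\n".join(f"{i}\t{name}" for name, i in rows)

# Example: the first name would exceed the 80-character limit today.
names = ["fused_nn_conv2d_add_nn_relu_" + "x" * 100,
         "fused_nn_dense_add"]
id_map = assign_ids(names)
assert id_map[names[0]] == 0
assert write_map_file(id_map).splitlines()[1].startswith("1\t")
```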

Thanks.