[DISCUSS] Module based Model Runtime Interface

tqchen · February 26, 2020, 2:44am

A GitHub RFC is good, given that we are finalizing it.

FrozenGene · February 26, 2020, 4:56am

tqchen:

First of all, if we are going to export the parameters with the DSO, should we offer a quick option to compress them in the binary (e.g. zlib the parameters before serialization), or at least offer that as a storage format?

We should have a simple way to let users choose not to include the parameters in the DSO. It would be interesting to ask what that API should look like.

For example, here is one possibility. It might also be worth asking everyone’s opinion about the API and the default option:
mod.export_library("xx.so", package_params=False)
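On the compression question: a minimal Python sketch of zlib-compressing a serialized parameter blob before it would be embedded in the DSO. The container layout and the function names (pack_params, compress_params, etc.) are hypothetical and are not TVM's actual params format; only the zlib and struct stdlib modules are real.

```python
import struct
import zlib


# Hypothetical, simplified container (NOT TVM's params format):
# [u64 count] then, per entry, [u64 name_len][name][u64 data_len][data].
def pack_params(params):
    out = bytearray(struct.pack("<Q", len(params)))
    for name, data in params.items():
        key = name.encode("utf-8")
        out += struct.pack("<Q", len(key)) + key
        out += struct.pack("<Q", len(data)) + data
    return bytes(out)


def unpack_params(blob):
    (count,) = struct.unpack_from("<Q", blob, 0)
    offset = 8
    params = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        name = blob[offset:offset + key_len].decode("utf-8")
        offset += key_len
        (data_len,) = struct.unpack_from("<Q", blob, offset)
        offset += 8
        params[name] = blob[offset:offset + data_len]
        offset += data_len
    return params


def compress_params(params, level=6):
    # zlib the serialized blob before it would be embedded in the DSO.
    return zlib.compress(pack_params(params), level)


def decompress_params(blob):
    # Inverse path at load time: inflate first, then deserialize.
    return unpack_params(zlib.decompress(blob))


if __name__ == "__main__":
    params = {"conv1_weight": bytes(4096), "conv1_bias": b"\x01\x02\x03\x04"}
    packed = compress_params(params)
    print(len(pack_params(params)), "->", len(packed))
    assert decompress_params(packed) == params
```

Whether compression pays off depends on the weights: real float tensors usually compress far less than the all-zero tensor used here, so offering it as an option rather than a default seems reasonable.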

FrozenGene:

I prefer creating a helper method in
class GraphRuntimeFactoryModule(Module)
named
def PackageParams(self, value: bool)
which would set is_package_params_ in the C++ class GraphRuntimeFactory. When we call SaveToBinary, we would check is_package_params_ to decide whether to pack the params.

Exporting stays the same, mod.export_library("xx.so"); to choose whether the params are packaged, you call PackageParams(True) or PackageParams(False) beforehand.

Of course, we can discuss the default behavior of PackageParams.

While reading back through this thread to summarize it, I found that this point is still undecided. As an earlier post said, we don’t like GraphRuntimeFactoryModule; however, we still need to decide the default value for packing parameters.
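A toy Python sketch of how the proposed flag could gate serialization. ToyRuntimeFactory, save_to_binary, and the JSON payload are hypothetical stand-ins for GraphRuntimeFactoryModule, SaveToBinary, and TVM's real binary format; the default of True in the constructor is only for illustration, since the default is exactly what is being debated here.

```python
import json


class ToyRuntimeFactory:
    """Hypothetical stand-in for the proposed GraphRuntimeFactoryModule."""

    def __init__(self, graph_json, params, package_params=True):
        self.graph_json = graph_json
        self.params = params  # parameter name -> raw bytes
        # Plays the role of is_package_params_ in the proposed C++ class.
        self.is_package_params = package_params

    def PackageParams(self, value: bool):
        # The proposed helper: it only flips the flag consulted at save time.
        self.is_package_params = bool(value)

    def save_to_binary(self) -> bytes:
        # Analogue of SaveToBinary: params are written only when the flag is set.
        payload = {"graph": self.graph_json, "has_params": self.is_package_params}
        if self.is_package_params:
            payload["params"] = {k: v.hex() for k, v in self.params.items()}
        return json.dumps(payload, sort_keys=True).encode("utf-8")


factory = ToyRuntimeFactory('{"nodes": []}', {"w": b"\x00\x01"})
with_params = json.loads(factory.save_to_binary())
factory.PackageParams(False)
without_params = json.loads(factory.save_to_binary())
print(sorted(with_params), sorted(without_params))
```

The method name PackageParams mirrors the proposal rather than PEP 8. Keeping the export call unchanged and putting the choice on the module object means export_library itself needs no new argument.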

tqchen · February 26, 2020, 5:00am

Hmm, I would love to get people’s thoughts about param packaging. I think it is OK to leave it unpackaged (since that is the previous behavior) and then decide later whether to turn it on by default.

FrozenGene · February 26, 2020, 6:13am

Let us keep this open for one day to hear thoughts on whether we should package the weights by default. @haichen @jwfromm

FrozenGene · February 26, 2020, 6:24am

BTW, if we don’t package the weights, deployment needs two artifacts: deploy.so and deploy.params. However, our API encourages users to write factorymod = relay.build(...), and unpacking it through __iter__ (the legacy graph, lib, params style) emits a warning, so leaving the params unpackaged seems to break our constraint here.

tqchen · February 26, 2020, 6:48am

You are right; let us package by default then. In most cases, packaging the params seems to be the right approach.

haichen · February 27, 2020, 12:02am

I agree that packaging params by default is good, since TVM is likely to transform the params during optimization.

liangfu · February 27, 2020, 8:58am

As a side note, for large models like VGG19, it might take a long time to compile the params into the shared library.

FrozenGene · February 27, 2020, 9:43am

This problem should be fixed by my PR: https://github.com/apache/incubator-tvm/pull/4657

FrozenGene · March 5, 2020, 7:53am

Sorry for the delay because of other things. I will file a formal RFC on GitHub this weekend (or early next week) based on our discussion.

FrozenGene · March 11, 2020, 3:43am

GitHub RFC: https://github.com/apache/incubator-tvm/issues/5038

I hope I haven’t missed any part of the significant discussions we have had.

windclarion · April 13, 2020, 2:24am

const unsigned char __tvm_dev_mblob[46788038] = {"TVM_BLOB_SIG"}; may not be enough, because a 46788038-byte array is too big for many embedded systems, so I have to place __tvm_dev_mblob in a special section, for example a rodata section. That is, I need to declare __tvm_dev_mblob as
const unsigned char __tvm_dev_mblob[46788038] __attribute__((section(".rodata")));
Both the declaration syntax and the choice of section are compiler-specific, so I think the RFC needs to consider this case.
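To make the compiler-specific part concrete: a hypothetical Python helper of the kind a "generate a C file, then call the compiler" path might use, rendering the blob as a C array with an optional section attribute. emit_c_blob and its parameters are illustrative only. The __attribute__((section(...))) form is GCC/Clang syntax; other toolchains use different constructs, which is the portability issue raised above.

```python
def emit_c_blob(symbol, blob, section=None, per_line=12):
    """Render `blob` as a C array definition (hypothetical helper).

    If `section` is given it is attached with the GCC/Clang-specific
    __attribute__((section(...))) syntax; other compilers need other syntax.
    """
    if not blob:
        raise ValueError("ISO C does not allow zero-length arrays")
    attr = f' __attribute__((section("{section}")))' if section else ""
    lines = [f"const unsigned char {symbol}[{len(blob)}]{attr} = {{"]
    # Emit the bytes as hex literals, `per_line` per source line.
    for i in range(0, len(blob), per_line):
        chunk = ", ".join(f"0x{b:02x}" for b in blob[i:i + per_line])
        lines.append(f"  {chunk},")
    lines.append("};")
    return "\n".join(lines) + "\n"


print(emit_c_blob("__tvm_dev_mblob", b"TVM_BLOB_SIG", section=".rodata"), end="")
```

For this 12-byte input the output is a single `const unsigned char __tvm_dev_mblob[12] __attribute__((section(".rodata"))) = {` line, one line of hex bytes, and `};`. A real blob of tens of megabytes produces a very large source file, which is also why compiling it can be slow.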

FrozenGene · April 13, 2020, 2:50am

Thanks for the response. In the end, we don’t use this special hack: we generate the blob directly as LLVM IR, and LLVM puts it into the rodata section correctly.

Like this test:

windclarion · April 13, 2020, 4:11am

Good solution! Thanks, FrozenGene! But since this relies on LLVM, LLVM-based targets can take advantage of it; I’m not sure whether other targets such as CUDA can use this solution.

FrozenGene · April 13, 2020, 5:42am

CUDA can also use this, because CUDA’s target host is LLVM. The example I showed is in fact a CUDA target, which is why you can see NVIDIA NNVM Compiler in the constant string.
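A rough sketch of what emitting the blob "directly as LLVM IR" amounts to, written here as textual IR generated from Python. emit_llvm_blob and llvm_escape are hypothetical names, and this is not TVM's codegen (which is not shown in this thread); it only illustrates the shape of a read-only byte-array global in the host module.

```python
def llvm_escape(blob: bytes) -> str:
    # LLVM c"..." literals keep printable ASCII as-is and write every other
    # byte, plus '"' and '\\', as a backslash followed by two hex digits.
    out = []
    for b in blob:
        if 0x20 <= b < 0x7F and b not in (0x22, 0x5C):
            out.append(chr(b))
        else:
            out.append(f"\\{b:02X}")
    return "".join(out)


def emit_llvm_blob(symbol: str, blob: bytes, align: int = 1) -> str:
    # `constant` marks the global as read-only, which is what lets the
    # backend place it in a read-only data section such as .rodata.
    return (f'@{symbol} = constant [{len(blob)} x i8] '
            f'c"{llvm_escape(blob)}", align {align}\n')


print(emit_llvm_blob("__tvm_dev_mblob", b"TVM_BLOB_SIG\x00"), end="")
```

This prints `@__tvm_dev_mblob = constant [13 x i8] c"TVM_BLOB_SIG\00", align 1`. Because the global lives in the host-side module, the same approach applies whatever the device target is, which is consistent with the point that CUDA's target host is LLVM.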

windclarion · April 13, 2020, 6:17am

I got it. Thanks FrozenGene.

ramana-arm · April 15, 2020, 1:00pm

This won’t work by default for the C backend, where we don’t necessarily rely on the presence of LLVM. Or are we saying that the backend always needs LLVM just to produce this constant data object? So we do need a general solution …

Ramana

FrozenGene · April 15, 2020, 1:10pm

When LLVM is not available, we fall back to our original approach (calling the compiler to generate it).

ramana-arm · April 15, 2020, 1:20pm

So the problem hasn’t been fixed: there is a “solution” that depends on the presence of an LLVM target.

Ramana

FrozenGene · April 15, 2020, 1:27pm

Let me clarify your question. Do you mean we should generate unsigned char __tvm_data_blob[] in the .rodata section?