When I read our post and try to summary, I find that we have this not to be decided. As previous post said, we don’t like GraphRuntimeFactoryModule
. However, we need to decide the default value of packing parameters
.
Hmm, would love to get people’s thoughts about param packaging. I think it is OK it leave it unpackaged(since it is the previous behavior) then we decide later if we want to turn it on by default
Let us keep it one day to hear thoughts of whether we should packing the weights by default. @haichen @jwfromm
BTW, if we don’t package weights, when we deploy , we will need two things : deploy.so / deploy.params. However, our API will encourage users : factorymod = relay.build(...)
and when we call __iter__
we will warn, seems it breaks our constraint here.
You are right, let us package by default then. In most cases seems package params is the right approach
I agree package params by default is good since it’s likely that TVM transforms the params during the optimization
As a side note, for large models like VGG19, it might require a long time to compile the params into shared library.
This problem should be fixed by my this pr: https://github.com/apache/incubator-tvm/pull/4657
Sorry for the delay because of other things. I will file a formal RFC on GitHub this weekend (or early next week) based on our discussion.
GitHub RFC: https://github.com/apache/incubator-tvm/issues/5038
Wish I don’t miss some part of so significant discussions we have completed.
const unsigned char __tvm_dev_mblob[46788038] = {“TVM_BLOB_SIG”}; maybe not enough. because 46788038 is too big for many embedded system, so I have to place __tvm_dev_mblob to special section, for example, a rodata section. so I mean I need declare __tvm_dev_mblob as const unsigned char __tvm_dev_mblob[46788038] attribute((section(".rodata"))); the declaration grammar is compiler specific, put to which section is compiler specific too. so I think the RFC need to consider the case.
Thanks for respond.Finally, we don’t use this special hack. We will generate this directly using LLVM IR. And LLVM will put this into rodata
section correctly.
Like this test:
Good solution! Thanks FrozenGene! but if we use LLVM, llvm series target can take advantage of this solution, I’m not sure if other targets such as cuda can use this solution.
CUDA also could use this. Because cuda’s target host is LLVM. As the example I show, it is in fact cuda target. So you could see NVIDIA NNVM Compiler
in the constant string.
I got it. Thanks FrozenGene.
This won’t work by default for the C backend where we don’t necessarily rely on the presence of llvm or are we saying that there needs to be an llvm solution for the backend just to produce this constant data object always, so we do need a general solution …
Ramana
When we don’t have LLVM, we will fallback to our original way (call compiler to generate)
So, the problem hasn’t been fixed : there is a “solution” depending on the presence of an llvm target.
Ramana
I think I should clarify your question. Do you mean we should generate .rodata section of unsighed char __tvm_data_blob[]
?
I wasn’t proposing that as a solution, that is one of the options. I’m merely stating that this is still a problem that will hit others most notably anyone using the C backend .
Ramana