@jonso, @tqchen should have answered your question. For IoT devices, I want to emphasize the key question: if we cannot have these runtimes, what do we do? Cross-compiling these runtimes is not a good idea, and the memory limit makes the situation even worse. My concern is that once we open this Pandora’s box and get a taste of how convenient it is, we will lose the strong will to keep supporting operators, which will hurt the IoT device area. We should continue to support Relay + TOPI operators and keep the deployment experience lightweight. So when we design this RFC, we should have a section describing how we handle IoT devices that cannot support other runtimes, and it should be the first priority. I hope you understand my concern.
I agree. Native compilation support for the end-to-end pipeline should always be a first-class goal of the project.
The additional libraries are a flexible way to interface with existing frameworks, but they won’t help in the cases where we might need the TVM stack the most, such as IoT, new hardware, and accelerators. Even for server-class applications, compilation gives quite a few benefits, such as minimal runtime deployment (minimal containerization), protection, etc.
@FrozenGene I completely agree, native compilation is always the goal. Personally, I’m hoping that supporting native runtimes will help us prototype TVM for new model architectures faster, and help prioritize dev work in terms of operator support.
@tqchen thank you very much for the detailed explanation! I did a deep-dive into the docs and the code, and realized that I had a fundamental misunderstanding of how the final graph is run. It seems that the graph JSON determines the order that functions are run in. Is that correct?
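For anyone following along, here is a toy model of that idea, assuming (as the question suggests) that the runtime simply walks the node list in order and calls each op’s function by name. The layout is a simplified, hypothetical take on the graph JSON: a `nodes` list where `tvm_op` entries carry `attrs.func_name` and `inputs`, plus `heads` for the outputs. It leaves out storage planning, shapes, and the rest of the real schema, and the kernel names `fused_add` / `fused_relu` are made up.

```python
import json

# Hypothetical graph in a simplified, graph-JSON-like layout: "null" nodes are
# inputs, "tvm_op" nodes name the compiled function to call. Input entries are
# [node_id, output_index, version] triples, loosely following the real format.
GRAPH_JSON = """
{
  "nodes": [
    {"op": "null", "name": "x", "inputs": []},
    {"op": "null", "name": "y", "inputs": []},
    {"op": "tvm_op", "name": "add0",
     "attrs": {"func_name": "fused_add"}, "inputs": [[0, 0, 0], [1, 0, 0]]},
    {"op": "tvm_op", "name": "relu0",
     "attrs": {"func_name": "fused_relu"}, "inputs": [[2, 0, 0]]}
  ],
  "heads": [[3, 0, 0]]
}
"""

# Stand-ins for compiled kernels, looked up by func_name (hypothetical names).
FUNCS = {
    "fused_add": lambda a, b: [u + v for u, v in zip(a, b)],
    "fused_relu": lambda a: [max(0.0, v) for v in a],
}

def run_graph(graph_json, inputs):
    graph = json.loads(graph_json)
    values = {}  # node id -> output value (one output per node in this toy)
    # Nodes are visited strictly in list order, so the JSON fixes the call order.
    for nid, node in enumerate(graph["nodes"]):
        if node["op"] == "null":
            values[nid] = inputs[node["name"]]
        else:
            args = [values[src] for src, _idx, _ver in node["inputs"]]
            values[nid] = FUNCS[node["attrs"]["func_name"]](*args)
    return [values[src] for src, _idx, _ver in graph["heads"]]

print(run_graph(GRAPH_JSON, {"x": [1.0, -5.0], "y": [2.0, 1.0]}))  # [[3.0, 0.0]]
```

Because each node only refers to earlier node ids, a list in topological order is enough to drive execution with a plain loop.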
I’ll start working on a prototype and post here when I have something reasonable. We can discuss in more detail from there!
Btw, I am planning to start by supporting the graph runtime. Is there an ETA for the Relay VM becoming mainstream? If it’s soon, maybe I should focus on supporting that instead of the graph runtime.
@comaniac and I have been working on 3rd-party integration and graph partitioning for a while, and we have a POC implementation. I have just posted an RFC for what we have been doing.
@jonso The Relay VM can now execute all models that the graph runtime supports, with OK performance (<10% overhead due to the lack of static memory planning, which is WIP by @jroesch and @haichen). I’d recommend trying the graph runtime first, since it’s stable and won’t change much in the future. I expect the VM implementation to go through a few major iterations next.
I think the proposal about additional runtime support has nothing to do with the choice between the Relay VM and the graph runtime, because both the VM and the graph runtime will invoke the additional runtime as opaque PackedFuncs.
Oops, thanks for correcting me. I definitely missed the context here. If we choose PackedFunc to be the interface between the various runtimes, the graph runtime and the VM should be able to share most of the infrastructure we set up.
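To make the “opaque PackedFunc” point concrete, here is a toy Python model, not TVM’s actual API: a name-to-callable registry loosely analogous to a global function registry, one hypothetical “external runtime” function registered under the made-up name `external.double`, and two tiny executors (a graph-style op list and a VM-style instruction loop) that both invoke it by name without knowing what is inside. `register`, `lookup`, and the instruction tuples are all hypothetical.

```python
from typing import Callable, Dict, List

# Toy stand-in for a registry of type-erased functions (hypothetical).
_REGISTRY: Dict[str, Callable[..., object]] = {}

def register(name: str):
    def deco(fn):
        _REGISTRY[name] = fn
        return fn
    return deco

def lookup(name: str) -> Callable[..., object]:
    return _REGISTRY[name]

# Hypothetical "external runtime" op; neither executor looks inside it.
@register("external.double")
def external_double(xs: List[float]) -> List[float]:
    return [2 * v for v in xs]

def graph_style_run(op_names: List[str], x):
    # Graph-runtime flavour: a fixed, pre-ordered list of ops.
    for name in op_names:
        x = lookup(name)(x)
    return x

def vm_style_run(program: List[tuple], x):
    # VM flavour: an instruction loop with a program counter; only the
    # ("call", name) instruction touches the registry.
    pc, reg = 0, x
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "call":
            reg = lookup(instr[1])(reg)
        elif instr[0] == "ret":
            return reg
        pc += 1
    return reg

print(graph_style_run(["external.double"], [1.0, 2.0]))                 # [2.0, 4.0]
print(vm_style_run([("call", "external.double"), ("ret",)], [1.0, 2.0]))  # [2.0, 4.0]
```

The only thing the two executors share is the lookup-by-name call, which is why the external-runtime work can be largely independent of the executor choice.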
One of the things to keep in mind is that this likely only works for “elementary ops”, but it is unclear what to do with ops like control-flow that have blocks / subgraphs, in particular when these blocks / subgraphs contain bits that TVM would like to optimize.
Also, frameworks like PyTorch, being supportive of interoperability, have rather excellent support for extracting subgraphs of “known” ops, so dismissing that route because TF lacks it (which is my reading of the last paragraph; I don’t have first-hand knowledge of it) is probably a decision that optimizes for handling TF graphs rather than the general case.
Personally, I’m probably not looking at TVM as a generic compiler for frontend frameworks, but I see most of the value in the fusion / optimization framework and would love to seamlessly get the benefits TVM can offer in the framework I’m already using (which happens to be PyTorch in my case). The notion of converting models from one framework to another might be attractive or familiar to users of TF, but from hanging out in the PyTorch forums, I get the feeling people are glad when they don’t have to convert models for deployment.
Best regards
Thomas
@jonso Is there any development/PR related to this RFC?
As @FrozenGene mentioned, graph partitioning + external codegen can be used to address unsupported operators. Maybe the recent BYOC feature + the ongoing work on [RFC] Op based annotation for external codegen could be a solution to this challenge?
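As a rough illustration of the partitioning idea, the sketch below splits a linear sequence of ops into maximal contiguous segments that TVM is assumed to compile natively and segments that would be handed to an external codegen/runtime. The supported-op set and the op `my_custom_nms` are made up, and real BYOC partitioning operates on the Relay graph (not a flat list) through annotation and partitioning passes, which this does not model.

```python
from itertools import groupby
from typing import List, Set, Tuple

# Hypothetical set of ops TVM can compile natively in this example.
TVM_SUPPORTED: Set[str] = {"conv2d", "relu", "add", "dense"}

def partition(ops: List[str], supported: Set[str]) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into ("tvm", [...]) or ("external", [...]) segments."""
    segments = []
    # groupby merges neighbouring ops that share the same "supported?" flag.
    for is_supported, group in groupby(ops, key=lambda op: op in supported):
        segments.append(("tvm" if is_supported else "external", list(group)))
    return segments

model = ["conv2d", "relu", "my_custom_nms", "add", "dense"]
for target, seg in partition(model, TVM_SUPPORTED):
    print(target, seg)
# tvm ['conv2d', 'relu']
# external ['my_custom_nms']
# tvm ['add', 'dense']
```

Keeping segments maximal means each boundary between TVM and the external side is crossed as few times as possible, which matters because every crossing is a data hand-off between runtimes.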
@tico This is definitely something we should be able to support now; I just haven’t had time to work on it. I’m hoping to get back to it in a few weeks. I’ll keep you updated on my progress, or if you want to work on it, feel free!
Would a reasonable first step towards this be to define a “CustomFunction” Relay IR node that parallels much of tvm.relay.Function (but subclasses BaseFunc) in terms of input and output types, except that instead of a body it holds a reference to the PackedFunc to be called?
One drawback I see compared to a “CustomOp” is that it affects visitors too much.
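A rough sketch of that trade-off, using a toy Python IR rather than TVM’s real classes: the hypothetical `CustomFunction` mirrors `Function`’s parameters and return type but replaces the body with a reference to an opaque packed function, and the hypothetical visitor `count_calls` shows where the drawback comes from, since every visitor needs a new branch for the node.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

# Hypothetical mini-IR; these classes are NOT TVM's relay classes.
@dataclass
class Var:
    name: str

@dataclass
class Call:
    fn: "Expr"
    args: List["Expr"]

@dataclass
class Function:            # ordinary function: params + body expression
    params: List[Var]
    body: "Expr"

@dataclass
class CustomFunction:      # proposed node: same "signature" as Function,
    params: List[Var]      # but no body, only a reference to an opaque
    ret_type: str          # packed function that is called at runtime
    packed_func: Callable[..., object]

Expr = Union[Var, Call, Function, CustomFunction]

def count_calls(e: Expr) -> int:
    """A visitor: every pass like this needs an explicit CustomFunction case."""
    if isinstance(e, Var):
        return 0
    if isinstance(e, Call):
        return 1 + count_calls(e.fn) + sum(count_calls(a) for a in e.args)
    if isinstance(e, Function):
        return count_calls(e.body)
    if isinstance(e, CustomFunction):
        return 0  # opaque: nothing inside to visit
    raise TypeError(f"unhandled node {type(e).__name__}")

x = Var("x")
custom = CustomFunction([x], "Tensor[(4,), float32]", packed_func=lambda t: t)
expr = Call(custom, [Call(Function([x], x), [Var("y")])])
print(count_calls(expr))  # 2
```

Routing the packed function through an ordinary call node instead would let existing visitors fall through their `Call` handling unchanged, at the cost of encoding the function reference and types somewhere else (e.g. in call attributes).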
Best regards
Thomas
We also ran into similar issues when compiling various models using TVM. Improving operator coverage is definitely an option, but finding a more general way to support more models is also worth exploring.
I’m working on this, but I’d still like feedback on whether to implement a dedicated IR node or to somehow squeeze this into a call node.