PyTorch/LibTorch/TorchScript as a backend

I have been working on support for calling TorchScript from TVM as a backend. This can be used as a fallback for when Torch operators are not yet implemented in TVM, or to incorporate bespoke PyTorch custom ops into TVM with ease.

My proposed implementation strategy is:

  • add a new Relay torchop that takes a variable number of inputs and executes a provided TorchScript (a.k.a. PyTorch JIT) function,
  • add a backend module class that calls into LibTorch (a.k.a. the C++ bindings underlying PyTorch) to execute a TorchScript function.

The key addition to the Relay infrastructure concerns type inference: although leaving num_inputs == -1 at operator registration is documented to indicate a variable number of inputs, the type inference pass is not prepared to deal with it and instead requires the number of arguments provided to match the number of arguments declared with add_argument at operator registration. The proposed change to type inference is to match the declared arguments as before, but to allow additional arguments when num_inputs is -1.
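To spell the rule out, here is a standalone illustration of the intended arity check (just the logic, not actual TVM type-inference code):

```python
# Standalone illustration of the proposed arity rule (not TVM code): declared
# arguments must always be present and are type-checked as before; extra trailing
# arguments are only accepted when the op was registered with num_inputs == -1.
def arity_ok(num_declared: int, num_provided: int, num_inputs: int) -> bool:
    if num_inputs == -1:                   # variadic op: allow extra args beyond the declared ones
        return num_provided >= num_declared
    return num_provided == num_declared    # fixed-arity op: exact match, as today

assert arity_ok(num_declared=1, num_provided=3, num_inputs=-1)      # e.g. torchop with 3 tensors
assert not arity_ok(num_declared=2, num_provided=3, num_inputs=2)   # normal ops stay strict
```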

The other detail is that the serialized TorchScript is carried in a string attribute on the call node. This is a bit fishy, as the serialized representation is binary. I used serialization to get TorchScript into TVM at the C++ level because it is tricky to interoperate between pybind11-wrapped objects (TorchScript in PyTorch) and the TVM FFI, but we might pass things around as handles later (though I am not sure that works well with attributes). I would be glad to have your advice on this.
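For reference, this is roughly how the serialized blob is produced on the Python side; note that the payload is binary (a zip archive), which is what makes the string attribute feel fishy. The torchop name at the end is just a placeholder for whatever the PR ends up exposing, not a confirmed API:

```python
# Sketch of producing the serialized TorchScript on the Python side. torch.jit.save
# writes the archive into a file-like object; the resulting bytes are not valid UTF-8.
# The `torchop` call at the end is a placeholder name for the proposed Relay op.
import io
import torch

@torch.jit.script
def torch_fn(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.sigmoid(x) * y

buf = io.BytesIO()
torch.jit.save(torch_fn, buf)
blob = buf.getvalue()           # raw bytes of the TorchScript archive (a zip file)
print(len(blob), blob[:2])      # typically starts with b'PK' -- binary, not text

# Hypothetical Relay-side usage (names and signature are placeholders):
# x = relay.var("x", shape=(4, 4)); y = relay.var("y", shape=(4, 4))
# out = relay.op.torchop(blob, x, y)
```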

Currently I only support a single FP32 output tensor, but that is straightforward to make flexible, and I would do this in parallel to the more fundamental discussion, e.g. around the points above.

Even though it is still in draft state, I have opened PR 7401 to make the discussion concrete in terms of code.

Thank you!


cc @masahi @jroesch @junrushao

AWS is also doing “partial compilation” with a similar motivation; see the AWS Machine Learning Blog post “Speeding up TensorFlow, MXNet, and PyTorch inference with Amazon SageMaker Neo” and http://www.youtube.com/watch?v=Tb-w6IVPRds&t=23m24s

One thing I wonder: could this help with integrating TVM into PyTorch for training? Some people might be interested in that.

The alternative for TVM integration into PyTorch might be to take the opposite approach and have the PyTorch JIT call TVM. This was prototyped by Facebook as torch-tvm, but between the PyTorch frontend we have now and the enhanced TorchScript dev experience I am working on (enabling modification of TorchScript from Python), I think it would be much easier to implement by using the dictionary of TVM-supported ops, carving out supported subgraphs, and sending those blobs into the PyTorch frontend.
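For reference, that direction would go through the existing frontend entry point, roughly like this (a minimal sketch; the model and input shapes are made up):

```python
# Minimal sketch of the existing PyTorch frontend path (model and shapes are made up):
# trace a module to TorchScript and hand the graph to relay.frontend.from_pytorch.
import torch
import tvm
from tvm import relay

model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
).eval()

example_input = torch.randn(1, 16)
scripted = torch.jit.trace(model, example_input)

mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 16))])
print(mod["main"])
```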

So one thing I’m still thinking about is whether TorchScript functions are the right level, or whether I should try harder to only offload individual ops.

Pro TorchScript:

  • Individual ops would go much deeper into the PyTorch JIT API than the very widely used TorchScript interface we use now.
  • We don’t have to figure out what to do with non-tensor arguments, as they’ll be created as constants in the TorchScript.
  • We faithfully use the PyTorch toolchain, e.g. its graph optimizations, on the graphs we feed it.
  • There might be applications where having programs is actually better.

Pro Ops:

  • The match to Relay in granularity is much better and wrapping each op in a TorchScript function is quite an overhead.

What do you think?

Thanks for the proposal. I care more about the overhead part. We can imagine that creating a TorchScript function for every individual op would generate lots of TorchScript functions, which may introduce non-negligible overhead. Is it possible to use a real model to evaluate the relationship between the number of TorchScript functions (i.e., offloaded ops) and the overhead, so that people can easily judge whether the effort is worthwhile? After all, it would be meaningless if the end-to-end performance were worse than PyTorch due to this overhead.
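A rough micro-benchmark along these lines could look as follows; it is only a sketch and only probes the PyTorch-side call overhead of many small TorchScript functions vs. one larger one, not the Relay/runtime dispatch cost on top of it:

```python
# Rough micro-benchmark sketch (illustration only): compare one TorchScript function
# covering a small chain of ops against calling a separately scripted function per op.
# Absolute numbers are not meaningful; the gap hints at the per-call wrapping overhead.
import time
import torch

@torch.jit.script
def fused(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x * 2.0 + 1.0)

@torch.jit.script
def mul2(x: torch.Tensor) -> torch.Tensor:
    return x * 2.0

@torch.jit.script
def add1(x: torch.Tensor) -> torch.Tensor:
    return x + 1.0

@torch.jit.script
def relu(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x)

x = torch.randn(64, 64)

def bench(fn, iters: int = 1000) -> float:
    fn()  # warm-up, also triggers JIT profiling/optimization
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

t_fused = bench(lambda: fused(x))
t_per_op = bench(lambda: relu(add1(mul2(x))))
print(f"one TorchScript function: {t_fused * 1e6:.1f} us/call")
print(f"per-op TorchScript calls: {t_per_op * 1e6:.1f} us/call")
```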

I want to see big subgraphs partitioned automatically by BYOC, sent to libtorch, and hopefully fused there. I think this is also what BYOC was originally designed for (as big a subgraph as possible). So I’m +1 for Pro TorchScript.

Why is this a pro for op-by-op? It sounds like more of a con to me (since we need to wrap more ops?).

If the goal is to be a fallback mechanism, I think it makes sense to take full sub-programs vs. single operations. One potentially better design could be lifting the offloaded code into a special top-level function which contains the TorchScript, vs. needing to embed it in call nodes directly. This might allow you to use the default call node with some attributes, vs. needing to match another piece of AST in the compiler.
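A minimal sketch of that lifting idea, using the “Compiler”/“global_symbol” function attributes that BYOC already relies on (the region body, symbol names, and the torchscript codegen name are made up for illustration):

```python
# Sketch of lifting an offloaded region into its own top-level Relay function marked
# with BYOC-style attributes, so the call site stays an ordinary call node. The body,
# the "torchscript" codegen name, and "torch_sub_0" are placeholders.
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 16), dtype="float32")
body = relay.nn.relu(x)  # stand-in for the region that would be handed to libtorch

lifted = relay.Function([x], body)
lifted = lifted.with_attr("Compiler", "torchscript")        # route to the external codegen
lifted = lifted.with_attr("global_symbol", "torch_sub_0")   # name of the generated runtime function
lifted = lifted.with_attr("Primitive", tvm.tir.IntImm("int32", 1))

mod = tvm.IRModule()
gv = relay.GlobalVar("torch_sub_0")
mod[gv] = lifted

# The caller is then just a normal call node to a global function:
y = relay.var("y", shape=(1, 16), dtype="float32")
mod["main"] = relay.Function([y], gv(y))
print(mod)
```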

Thank you for your comments!

So there are two parts:

  • what we represent in Relay,
  • what the function in the runtime will be.

I guess I’m reading agreement here that TorchScript functions are a reasonably good fit for the runtime.

If we determine this function during the (currently trivial) BYOC phase, this would mean that we want to change the representation in Relay to be at the op level (I guess). For this I need to figure out how best to represent an arbitrary PyTorch op, probably by symbol (= name) and signature, plus non-tensor inputs in attributes.
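For what it’s worth, the symbol and type information is readily available from the TorchScript graph itself; a small sketch:

```python
# Sketch: walk a TorchScript graph and print each node's op symbol (e.g. "aten::mul")
# together with its input and output types, i.e. the kind of information an op-level
# Relay representation would need to carry.
import torch

@torch.jit.script
def f(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return torch.relu(x * 2.0) + y

for node in f.graph.nodes():
    print(node.kind(),
          [str(v.type()) for v in node.inputs()],
          [str(v.type()) for v in node.outputs()])
```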

Does the handling of a variable number of arguments in type inference look reasonable to you?