[DISCUSS][torch.fx] Support pytorch's new frontend torch.fx

How does that work? Does it effectively “inline” the custom op?

Yes, most ops defined as ATen primitives can be correctly traced. I attach an example below:

import torch
from torch import autograd
from torch.fx import symbolic_trace

x = torch.randn(10, 15)
w = torch.randn(20, 15)

class SomeOPs(autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        # More calculations can be added here
        w = (w + 10) * 2
        return x.mm(w.t())

def fn2(x, w):
    return SomeOPs.apply(x, w)

ts = torch.jit.trace(fn2, (x, w))  # TorchScript trace, kept for comparison
symbolic_traced: torch.fx.GraphModule = symbolic_trace(fn2)
print(symbolic_traced.graph)

'''
graph():
    %x : [#users=1] = placeholder[target=x]
    %w : [#users=1] = placeholder[target=w]
    %add : [#users=1] = call_function[target=operator.add](args = (%w, 10), kwargs = {})
    %mul : [#users=1] = call_function[target=operator.mul](args = (%add, 2), kwargs = {})
    %t : [#users=1] = call_method[target=t](args = (%mul,), kwargs = {})
    %mm : [#users=1] = call_method[target=mm](args = (%x, %t), kwargs = {})
    return mm
'''

In TorchScript, such a customized op is simply named prim::PythonOp and the implementation details are lost. But with torch.fx, as long as the op implementation is at the Python level, the DAG can be properly traced.
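To see the contrast concretely, printing the TorchScript trace of the same function shows the custom autograd.Function as a single opaque node rather than its constituent ops (a sketch; the exact printed form of the graph varies across PyTorch versions):

```python
import torch
from torch import autograd

class SomeOPs(autograd.Function):
    @staticmethod
    def forward(ctx, x, w):
        w = (w + 10) * 2
        return x.mm(w.t())

def fn2(x, w):
    return SomeOPs.apply(x, w)

x = torch.randn(10, 15)
w = torch.randn(20, 15)

# TorchScript tracing records the custom op as one prim::PythonOp node;
# the add/mul/t/mm inside its forward are not visible in the graph.
ts = torch.jit.trace(fn2, (x, w))
print(ts.graph)
```

The traced module still computes the same result as eager mode; only the graph-level visibility of the op's internals differs.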

If it is possible to share the op conversion table with TorchScript, we can certainly add fx support. Otherwise, we would need to develop the PT frontend 2.0 from scratch, and I don’t think that’s worth it.

fx records primitive ATen operations, so it can surely share the op mapping table with TorchScript. I don’t think we need to develop & maintain a separate table for an fx backend.
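As a rough sketch of what sharing could look like (the converter and table here are hypothetical, not TVM’s actual frontend): an fx graph is just a list of nodes whose `target`s are the same primitive ops a converter already keys on, so a dispatcher can walk the graph and look each node up in one shared table:

```python
import operator
import torch
from torch.fx import symbolic_trace

# Hypothetical shared conversion table, keyed by the primitive op.
# call_function targets are Python callables; call_method targets are
# method names as strings. The callback names are made up for illustration.
CONVERT_MAP = {
    operator.add: lambda node: "convert_add",
    operator.mul: lambda node: "convert_mul",
    "t":          lambda node: "convert_transpose",
    "mm":         lambda node: "convert_matmul",
}

def fn(x, w):
    return x.mm(((w + 10) * 2).t())

gm = symbolic_trace(fn)
for node in gm.graph.nodes:
    if node.op in ("call_function", "call_method"):
        print(node.op, node.target, "->", CONVERT_MAP[node.target](node))
```

The point is that the lookup keys (the primitive ops) are the same ones a TorchScript-based converter dispatches on, so in principle one table can serve both frontends.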

I experimented with FX a bit in https://github.com/apache/tvm/pull/10091 , my impression was that its “symbolic tracing” is currently very limited, so it only works on simple and clean models.

One known limitation of symbolic tracing is the if-else statement (data-dependent control flow). But as for “complex models”, I find that torch.fx satisfies most models I have played with recently (e.g., ViT-based models, quantized transformers). I agree there may be some corner cases we need to resolve, but for most cases (i.e., models shipped with torchvision / huggingface), this should work as expected.
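A minimal illustration of the if-else limitation: under symbolic tracing the inputs are `Proxy` objects, so converting a tensor condition to a Python bool raises a `TraceError` (assuming a reasonably recent torch.fx):

```python
import torch
from torch.fx import symbolic_trace
from torch.fx.proxy import TraceError

def f(x):
    # Data-dependent branch: x is a Proxy during symbolic tracing, so
    # evaluating the condition as a Python bool is not allowed.
    if x.sum() > 0:
        return x + 1
    return x - 1

try:
    symbolic_trace(f)
except TraceError as e:
    print("tracing failed:", e)
```

Models whose control flow depends only on Python constants (not tensor values) are unaffected, which is why most torchvision / huggingface models trace fine.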

PS: sorry for the late update, I was busy with conference experiments.