Unsupported ops in StyleGAN2

Wheest · December 15, 2021, 4:38pm

I’m trying to import a pre-trained PyTorch StyleGAN2 model, however am having some issues.

I get the error:

NotImplementedError: The following operators are not implemented:
[
        "profiler::_record_function_exit",
        "profiler::_record_function_enter",
        "prim::PythonOp",
        "aten::square",
        "aten::randn",
]

For the aten::randn, in this thread someone implemented a custom op which keeps the value static:

def randn(inputs, input_types):
    return relay.expr.const(
        torch.randn(
            size=tuple(
                int(i.data.asnumpy()) if isinstance(i, relay.Constant) else int(i)
                for i in inputs[0]
            )
        ).numpy()
    )

I’ve also made a none function which I think should drop the profiling functions:

def none(inputs, input_types):
    return None

I imagine I could maybe make an equivalent for aten::square, but I wonder if there’s a recommended way of doing it?

compilation should look something like this I reckon:

mod, params = relay.frontend.from_pytorch(
        scripted_model,
        shape_list,
        {
            "aten::randn": randn,
            "profiler::_record_function_enter": none,
            "profiler::_record_function_exit": none,
            #"prim::PythonOp": ????,
            "aten::square": square,
        },
    )

I’m aware that prim::PythonOp is not supported, and am trying to identify what the ops are and if they can be removed.

I’ve got code and instructions available here.

masahi · December 15, 2021, 8:34pm

Did you do .eval() on your model? Without it dropout will be there in the inference graph. I don’t assume this model inherently depends on the randomness during inference.

profiler::_ things are weird… How exactly are you tracing your model? Maybe you are trying to run Torchscript’s optimization passes? You shouldn’t be.

Wheest · December 16, 2021, 12:56pm

I’m unsure if the model relies on the randomness during execution, though I’m keen to find out.

I did use .eval() on the PyTorch model, my steps in the gist were to:

Clone the stylegan2-ada-pytorch implementation https://github.com/NVlabs/stylegan2-ada-pytorch
Download a model file
Trace the model:

    model.eval()
    outs = model(*test_input_datas)
    scripted_model = torch.jit.trace(model, test_input_datas).eval()

The issue happens even without the eval() after the torchscript tracing.

Attempt to load into Relay with relay.frontend.from_pytorch

It looks like they implement some custom ops, which could partially be the source of the issue.

Running the model for the first time in a container gives the following message:

Setting up PyTorch plugin "bias_act_plugin"... Done.
Setting up PyTorch plugin "upfirdn2d_plugin"... Done.

I also can’t export the model via ONNX.

hans · December 16, 2021, 3:33pm

You can find native pytorch versions of the custom ops here: https://github.com/rosinality/stylegan2-pytorch/tree/master/op (in the .py files)

I was eventually able to get a version that worked in relay, although I ran into some errors on running the autotuner. You can find my script here: https://github.com/JCBrouwer/stylegan2-ada-pytorch/blob/efficiency/efficiency/tvm_autotune.py

I ended up manually removing the profiling ops and replacing other ops which weren’t translating correctly in the actual pytorch module definition.

Wheest · December 16, 2021, 5:02pm

Many thanks, you’ve got me going in the right direction!

I cloned your code at the efficiency branch, and tried exporting a new Generator model. Removing the profiler decorators (and profiling explicitly in the modules) helped. The only operation remaining now is prim::PythonOp.

Do you recall how you dealt with that?

Wheest · December 21, 2021, 12:03pm

In searching for the source of the source prim::PythonOp, I believe that at least one of the sources is the modulated_conv2d layer.

My methodology to identify the issue has been generating one layer networks of StyleGAN2-ADA and trying to export them to TVM.

ToRGBLayer and SynthesisNetwork layers both contain prim::PythonOp, and they have modulated_conv2d in common. FullyConnectedLayer exports with no issue.

There’s a lot going on in this layer, so I’m a bit unsure of how to tackle it.

EDIT I was just trying to export the traced PyTorch model. It gave the error:

RuntimeError: 
Could not export Python function call '_FusedMultiplyAdd'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:

This could be useful information on the source of the op.

Wheest · December 21, 2021, 5:26pm

I’ve attempted to merge the “pure PyTorch ops” from this repo shared by @hans earlier.

You can see my ongoing efforts on this branch, which also removes the profiling operations. hans’ fork also removed the profiling ops, but the versions I’m seeing still have the unsupported ops.

I’m still without a working version, as I am having an issue at the PyTorch level with getting the right intermediate style_dim for one of the layers integrating ModulatedConv2d.

Additionally, I have tried importing the vanilla stylegan2-pytorch repo. However I am getting however even that fails with unrecognised ops: ['aten::normal_', 'aten::new_empty', 'prim::PythonOp'].

Wheest · January 5, 2022, 1:55pm

Okay, I have a version of stylegan2-ada-pytorch that can be imported into TVM.

I include a script that can be used to convert existing models to to TVM. It can be found on the my fork of the stylegan2-ada repo.

I found that I can only export to TVM in a number of very specific circumstances, namely if I import via ONNX, using the freeze_params=True option in the Relay frontend.

I’ll include some notes on what went wrong, so in future people can search for their error messages.

ONNX importer

I only way I have been able to import the model to TVM is by exporting the model to ONNX and then importing from there.

However, I had an issue I had to figure out with this. I found that I got the following error with relay.frontend.from_onnx().

ValueError: Cannot use and / or / not operator to Expr, hint: use tvm.tir.all / tvm.tir.any instead

Within the ONNX frontend, the function which triggers the error is expand_shape() in the Expand converter.

Namely, the result of in_dims = infer_shape(in_shape)[0] is 3, whereas the result of new_dims = infer_shape(shape)[0] is ?, at least that’s what I get from printing them.

I found that setting the variable freeze_params=True in relay.frontend.from_onnx avoids the problem.

This post about dyn.strided_slice, appears to be related, perhaps this is a stop-gap solution to this problem generally?

Using the PyTorch importer

As discussed earlier in this thread, I was not able to get the PyTorch importer working.

Using the pure PyTorch implementation, we no longer have unsupported prim::PythonOps, however we still have ops for ['aten::square', 'aten::randn'].

I have initial implementation of each in my convert script, but square does not yet work, and I got a workable solution with ONNX.

masahi · January 5, 2022, 9:59pm

What is difficult for aten::square? What not just x * x?

Have you verified that this model indeed needs runtime randomness? I think randn using the converter below is not correct (it is a constant). You should use op.random.normal like our ONNX frontend (a PR welcome). But then I don’t know how you verify that the output is “correct” if it uses random.


def randn(inputs, input_types):
    return relay.expr.const(
        torch.randn(size=tuple(
            int(i.data.asnumpy()) if isinstance(i, relay.Constant) else int(i)
            for i in inputs[0])).numpy())

masahi · January 6, 2022, 12:33am

It’s probably better to use use_noise = False here, this will remove the need for runtime randomness.

github.com

NVlabs/stylegan2-ada-pytorch/blob/main/training/networks.py#L293-L294


if self.use_noise and noise_mode == 'random':
    noise = torch.randn([x.shape[0], 1, self.resolution, self.resolution], device=x.device) * self.noise_strength

Wheest · January 6, 2022, 11:03am

There is little inherently difficult about aten::square, it was more a matter of my getting ONNX working first, and being keen to continue with the project I was working on using this network.

Visually the output appears correct when imported via the ONNX frontend, given the same input as the PyTorch model; this implies that runtime randomness is not needed, or at least has little impact on the model I was looking at. You are correct that setting use_noise = False will side-step the issue entirely.