Running a CNN using sparsity convertor

Wheest · July 10, 2020, 9:21am

After the recent tutorial by @jwfromm on running a Transformer with sparsity, I was curious how to get this working for a CNN.

I adapted the example to use a small CIFAR-10 classifier. The ONNX model can be downloaded here, and the adapted notebook from this Gist.

However, converting the graph with simplify_fc_transpose fails with the error:

an internal invariant was violated while typechecking your program [10:04:26] ../src/relay/op/nn/pad.cc:126: Check failed: data->shape.size() == param->pad_width.size(): There should be as many pad width pairs as shape dimensions but the shape has 4 dimensions and there are 5 pad width pairs.

My guess is that, because the CNN uses packing for convolution, converting FC operations to sparse ones might not work. In that case, would I need to implement a GEMM convolution approach and make sure it is tagged correctly for the sparse operation replacement?

Or is this error connected to something else, especially since it concerns the padding layer receiving the wrong number of pad width pairs?
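The check in pad.cc requires exactly one (before, after) pair per tensor dimension. NumPy's np.pad enforces the same rule, so the mismatch is easy to reproduce outside TVM. A 4-D NCHW tensor takes four pairs. Five pairs would fit a 5-D packed layout such as NCHW[x]c, which is one plausible (unconfirmed) source of the extra pair here. A minimal sketch:

```python
import numpy as np

# A 4-D NCHW activation: batch=1, channels=3, height=32, width=32.
x = np.zeros((1, 3, 32, 32), dtype="float32")

# One (before, after) pair per dimension: pad only H and W by 1.
pad_4d = ((0, 0), (0, 0), (1, 1), (1, 1))
y = np.pad(x, pad_4d)

# Five pairs, as a 5-D packed layout would need, do not fit a 4-D tensor.
pad_5d = pad_4d + ((0, 0),)
try:
    np.pad(x, pad_5d)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(y.shape, mismatch_rejected)  # (1, 3, 34, 34) True
```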

jwfromm · July 14, 2020, 6:52pm

Hi @Wheest, I don’t think the error you’re encountering has anything to do with sparsity. It’s specifically complaining that a convolution is being constructed with an invalid padding attribute, though I’m not sure why that’s happening. You don’t need the simplify_fc_transpose pass for a convolutional network, so maybe this issue can be avoided.

Wheest · July 15, 2020, 5:09pm

Thanks for looking at this.

It works with the non-sparse version; the invalid padding only appears when I try to replace operations in the graph with sparse ones using simplify_fc_transpose.

My goal here is to run a sparse CNN, so I’m trying to scope out what changes I need to make for it to work. Since this approach makes the graph invalid, it seems some custom work will be needed to get sparse CNNs running in TVM. I haven’t seen any other work in the community on this beyond the lower-level support for sparsity and your Transformer tutorial.

jwfromm · July 15, 2020, 9:36pm

To be clear, simplify_fc_transpose is a pass that removes transpose operations between weights and nn.dense calls. This makes it easier to convert dense ops into sparse ones. Note that because Relay only has nn.dense rather than nn.matmul, we have to insert transpose ops all over the place. However, this isn’t true for convolutions. I don’t see any reason you’ll need to apply the pass on a CNN.
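To illustrate where those transposes come from: nn.dense multiplies its input by the transposed weight, with the weight stored as (units, in_features). A plain matrix product a @ b therefore has to be written as dense(a, transpose(b)). A NumPy sketch of the two semantics (the function names are illustrative, not Relay APIs):

```python
import numpy as np

def dense(data, weight):
    """Semantics of nn.dense: data is (batch, in), weight is (units, in)."""
    return data @ weight.T

def matmul_via_dense(a, b):
    """Express a @ b when only dense exists: transpose b into (out, in) first."""
    return dense(a, b.T)

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 8))
b = rng.standard_normal((8, 16))  # (in, out), the layout a matmul expects
print(np.allclose(matmul_via_dense(a, b), a @ b))  # True
```

A conv2d weight in OIHW layout is consumed as-is, with no such transpose in between, which is consistent with the pass not being needed for a CNN.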

Wheest · July 17, 2020, 12:27pm

Thanks. I’m still trying to map out how sparsity in TVM works; it makes sense that simplify_fc_transpose is not needed.

I’ve updated my example (it is now self-contained and closer to TVM tutorial quality). However, generating fake sparsity throws the error:

TVMError: Check failed: ObjectTypeChecker<TObjectRef>::Check(ptr): Expect RelayExpr but get IRModule

This doesn’t happen in the original version, because ddo.simplify_fc_transpose.convert translates mod from a tvm.ir.module.IRModule to a tvm.relay.function.Function using ddo.utils._run_opt_pass with SimplifyFCTranspose.

In my version, I’ve tried using a basic optimization pass with mod = ddo.utils._run_opt_pass(mod, relay.transform.SimplifyExpr()). However, this throws Expect RelayExpr but get IRModule.

So I think my question now is: how do I get a Function from an IRModule? Or at least, how do I generate fake sparsity without triggering this error?

This could be related to this issue, patched by @vinx13, where the module API was updated, but it could also be something else.

Msabih · August 3, 2020, 9:24pm

Hi, did you manage to run the sparse CNN example?

Wheest · August 6, 2020, 1:09pm

My issue with the example is generating fake sparsity to test with in random_sparse_params(). I haven’t had the chance to investigate or fix this further.
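One way to get fake sparsity without touching the IRModule at all is to sparsify the weight arrays in the params dictionary directly. A minimal sketch, assuming the parameters are available as NumPy arrays (for example via .asnumpy() on TVM NDArrays) and that the target is a 2-D dense weight of shape (units, in_features). make_block_sparse is a hypothetical helper, not the notebook's random_sparse_params(), and the (16, 1) block size is arbitrary:

```python
import numpy as np
import scipy.sparse as sp

def make_block_sparse(weight, block_size, sparsity, seed=0):
    """Hypothetical helper: zero out a random fraction of (bs_r, bs_c) blocks."""
    bs_r, bs_c = block_size
    rows, cols = weight.shape
    assert rows % bs_r == 0 and cols % bs_c == 0, "shape must be divisible by block size"
    n_blocks = (rows // bs_r) * (cols // bs_c)
    n_zero = int(round(sparsity * n_blocks))
    rng = np.random.default_rng(seed)
    # Block-level mask: True where a block is kept.
    keep = np.ones(n_blocks, dtype=bool)
    keep[rng.choice(n_blocks, size=n_zero, replace=False)] = False
    keep = keep.reshape(rows // bs_r, cols // bs_c)
    # Expand the block mask to element resolution and apply it.
    mask = np.repeat(np.repeat(keep, bs_r, axis=0), bs_c, axis=1)
    return np.where(mask, weight, 0).astype(weight.dtype)

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 128)).astype("float32")
w_sparse = make_block_sparse(w, (16, 1), sparsity=0.95)

# SciPy's BSR view stores only the surviving blocks.
bsr = sp.bsr_matrix(w_sparse, blocksize=(16, 1))
print(bsr.data.shape)  # (26, 16, 1)
```

The sparsified array could then be written back with something like params[name] = tvm.nd.array(w_sparse); which names hold dense-layer weights depends on the ONNX model.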

However, the generation of fake sparsity can be disabled by passing gen_weights=False to run_sparse().

The default model has little sparsity, I think, so the dense and sparse times are identical. I have yet to test what the difference in inference time is with an actually sparse model.
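Whether a model is actually sparse can be checked directly on its weights before timing anything. Block sparsity is what a BSR kernel can exploit, so it is worth reporting alongside element sparsity. A short sketch (the helper name and block size are illustrative):

```python
import numpy as np

def sparsity_report(weight, block_size):
    """Return (fraction of zero elements, fraction of all-zero blocks) for a 2-D weight."""
    bs_r, bs_c = block_size
    rows, cols = weight.shape
    elem = float(np.mean(weight == 0))
    # View the matrix as a (rows/bs_r) x (cols/bs_c) grid of blocks.
    blocks = weight.reshape(rows // bs_r, bs_r, cols // bs_c, bs_c)
    block_is_zero = np.all(blocks == 0, axis=(1, 3))
    return elem, float(np.mean(block_is_zero))

w = np.arange(16, dtype="float32").reshape(4, 4)
w[:2, :2] = 0  # zero the top-left 2x2 block
print(sparsity_report(w, (2, 2)))  # (0.25, 0.25)
```

A trained dense model typically has very few exact zeros, so both numbers would be close to 0 unless it was pruned.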

Wheest · August 16, 2020, 3:33pm

I’ve updated the self-contained example here. With sparsity set to 95%, the inference time of the dense CNN and the sparse CNN is identical: both take 24ms on a test CPU device.

I wonder whether BSR is unsuited to CNNs, or whether the key convolutional operations of the CNN are simply not being run with a sparse version. Does anyone have an idea of what might be a fruitful area to investigate?

anijain2305 · August 17, 2020, 8:02pm

Hi @Wheest

CNNs are primarily made up of conv2d layers. AFAIK, the sparse kernels work only with dense ops, so I am not sure you are using the sparse kernels throughout the network.
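A rough multiply-accumulate (MAC) count shows why converting only the dense layers barely changes run time. The network below is a hypothetical two-conv, one-dense CIFAR-10 model with made-up layer sizes, not the model from this thread. Assuming run time roughly tracks MACs, removing the dense layer's work entirely would save under 1%.

```python
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    # Each output element needs c_in * k * k multiply-accumulates.
    return c_out * h_out * w_out * c_in * k * k

def dense_macs(n_in, n_out):
    return n_in * n_out

# Hypothetical layer sizes, chosen only for illustration.
layers = {
    "conv1": conv2d_macs(3, 32, 3, 32, 32),    # 3x3 conv on 32x32 input
    "conv2": conv2d_macs(32, 64, 3, 16, 16),   # after a 2x2 pool
    "dense": dense_macs(64 * 8 * 8, 10),       # classifier after another pool
}
total = sum(layers.values())
for name, macs in layers.items():
    print(f"{name}: {macs:>9,d} MACs ({100 * macs / total:.1f}%)")
```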

Wheest · August 17, 2020, 8:57pm

Hi @anijain2305, thanks, that’s what I feared. I guess it was too much to hope that the sparsity component would magically make packed convolution sparse too.

I guess I could try to implement a prototype GEMM convolution that leverages the sparse GEMM. Do you have any resources in mind that could help, e.g. anyone else who has implemented im2col in Relay, or the relevant files for understanding how the sparse kernels are implemented?
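The im2col + sparse GEMM idea can be prototyped in NumPy/SciPy before porting it to Relay. The sketch below assumes NCHW input without the batch axis, OIHW weights, stride 1 and no padding. It uses scipy.sparse.bsr_matrix as a stand-in for TVM's sparse kernel; im2col, conv2d_im2col_sparse and conv2d_reference are hypothetical helpers. As far as I know, Relay's nn.sparse_dense multiplies a dense input by a transposed sparse weight (like nn.dense), so a Relay version would probably need the im2col matrix laid out as (H_out*W_out, C*kh*kw).

```python
import numpy as np
import scipy.sparse as sp

def im2col(x, kh, kw):
    """x: (C, H, W) -> (C*kh*kw, H_out*W_out) patch matrix, stride 1, no padding."""
    c, h, w = x.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    cols = np.empty((c, kh, kw, h_out, w_out), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            cols[:, i, j] = x[:, i:i + h_out, j:j + w_out]
    return cols.reshape(c * kh * kw, h_out * w_out), (h_out, w_out)

def conv2d_im2col_sparse(x, weight, blocksize=(1, 1)):
    """weight: (O, C, kh, kw). The flattened (O, C*kh*kw) weight is stored as BSR."""
    o, c, kh, kw = weight.shape
    cols, (h_out, w_out) = im2col(x, kh, kw)
    w_bsr = sp.bsr_matrix(weight.reshape(o, c * kh * kw), blocksize=blocksize)
    out = w_bsr @ cols  # sparse (O, K) times dense (K, N) -> dense (O, N)
    return np.asarray(out).reshape(o, h_out, w_out)

def conv2d_reference(x, weight):
    """Plain dense convolution (cross-correlation) used to check the result."""
    o, c, kh, kw = weight.shape
    _, h, w = x.shape
    out = np.zeros((o, h - kh + 1, w - kw + 1), dtype=x.dtype)
    for i in range(kh):
        for j in range(kw):
            patch = x[:, i:i + out.shape[1], j:j + out.shape[2]]
            out += np.einsum("oc,chw->ohw", weight[:, :, i, j], patch)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((8, 3, 3, 3))
w[np.abs(w) < 1.0] = 0.0  # crude magnitude pruning for the demo
print(np.allclose(conv2d_im2col_sparse(x, w, blocksize=(4, 1)), conv2d_reference(x, w)))  # True
```

The trade-off is memory: the patch matrix is roughly kh*kw times larger than the input.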

anijain2305 · August 17, 2020, 9:34pm

It might be worth trying to directly implement the conv2d with sparse weights, instead of converting conv2d to im2col + dense.
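A direct sparse convolution avoids the im2col buffer: for every nonzero weight w[o, c, i, j], it adds that weight times a shifted window of input channel c to output channel o. Below is a NumPy sketch of the loop structure, assuming stride 1, no padding and OIHW weights; direct_sparse_conv2d is a hypothetical name. It is not meant to be fast in Python, and a real TVM kernel would iterate over a compressed weight format rather than calling np.nonzero.

```python
import numpy as np

def direct_sparse_conv2d(x, weight):
    """x: (C, H, W); weight: (O, C, kh, kw), mostly zeros. Stride 1, no padding."""
    o_ch, _, kh, kw = weight.shape
    _, h, w = x.shape
    h_out, w_out = h - kh + 1, w - kw + 1
    out = np.zeros((o_ch, h_out, w_out), dtype=np.result_type(x, weight))
    # Work scales with the number of nonzero weights, not with O*C*kh*kw.
    for o, c, i, j in zip(*np.nonzero(weight)):
        out[o] += weight[o, c, i, j] * x[c, i:i + h_out, j:j + w_out]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6, 6))
wt = rng.standard_normal((3, 2, 3, 3))
wt[np.abs(wt) < 1.0] = 0.0  # on average about two thirds of the weights become zero
y = direct_sparse_conv2d(x, wt)
print(y.shape)  # (3, 4, 4)
```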