I’ve been following the progress on introducing precomputed Winograd weight transforms and wondered if there is a mechanism to precompute in a more general way. For example, I need to transpose the weights when scheduling a dense operator, and this takes about half of the execution time. Currently I need to define a new operator in NNVM along the lines of conv2d_dense_transpose_weights and then use alter_op_layout to split the existing dense operator. But this starts to become unmanageable when you have one-off weight transforms or weight padding steps, each requiring a new operator to allow them to be precomputed.
Is there a different pattern I should be following to achieve precomputation, or is there a plan to make this more flexible in the future?
Do you mean the transpose is not happening during precompute? I think it should be.
I don’t understand what the conv2d_dense_transpose_weights op is supposed to do, or what you mean by ‘split the existing dense operator’. Is conv2d fused with the dense op?
I have a custom schedule for dense on Mali and it performs significantly better with a transpose. This is not present in the existing schedule.
Currently Winograd filter transform precomputation happens by splitting the conv2d into a winograd_weight_transform operator and a winograd_without_weight_transform operator (e.g. in the arm_cpu schedules), allowing the former to be precomputed when the graph is compiled. I’m wondering whether the only way for me to achieve the same effect is by creating a dense_weights_transform and a dense_without_weights_transform, or is there some other way I can have steps of the computation precomputed?
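To make the split concrete, here is a minimal NumPy sketch of the idea (not the NNVM or TOPI implementation). The function names mirror the hypothetical dense_weights_transform and dense_without_weights_transform operators mentioned above, and the (out_dim, in_dim) weight layout is an assumption:

```python
import numpy as np

def dense_weights_transform(weight):
    """Hypothetical precomputable step: (out_dim, in_dim) -> (in_dim, out_dim)."""
    return np.ascontiguousarray(weight.T)

def dense_without_weights_transform(data, weight_t):
    """Hypothetical dense op that expects already-transposed weights."""
    return data @ weight_t

def dense(data, weight):
    """Reference dense: data is (batch, in_dim), weight is (out_dim, in_dim)."""
    return data @ weight.T

rng = np.random.default_rng(0)
data = rng.standard_normal((4, 8)).astype("float32")
weight = rng.standard_normal((16, 8)).astype("float32")

# The transform depends only on the weights, so it can run once at compile time.
weight_t = dense_weights_transform(weight)

# At run time only the cheaper operator remains.
out_split = dense_without_weights_transform(data, weight_t)
out_ref = dense(data, weight)
```

Because the first function takes only constant parameters as input, a compiler can evaluate it ahead of time and ship the transformed weights instead.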
I think if you use nnvm.symbol.transpose on your dense weight, it will be precomputed. But then you cannot use nnvm.symbol.dense anymore, so you need a special dense symbol to fit your needs.
I suggest doing something similar to what conv2d_alter_layout does to the conv2d symbol. You can define something like dense_alter_layout using @reg.register_alter_op_layout("dense") in nnvm/python/nnvm/top/nn.py.
Then you can intercept a dense symbol during compilation and turn it into nnvm.symbol.transpose + nnvm.symbol.your_dense.
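The interception pattern can be illustrated with a toy graph rewriter in plain Python. This is only a sketch of the idea, not NNVM code: Node, register_alter, rewrite and is_constant are all hypothetical stand-ins, and "your_dense" is the placeholder name used above.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Toy graph node (hypothetical, not an NNVM symbol)."""
    op: str
    inputs: list = field(default_factory=list)
    attrs: dict = field(default_factory=dict)

ALTER_RULES = {}

def register_alter(op_name):
    """Toy registry decorator standing in for an alter-layout hook."""
    def wrap(fn):
        ALTER_RULES[op_name] = fn
        return fn
    return wrap

@register_alter("dense")
def alter_dense(node):
    # Replace dense(data, weight) with your_dense(data, transpose(weight)).
    data, weight = node.inputs
    weight_t = Node("transpose", [weight], {"axes": (1, 0)})
    return Node("your_dense", [data, weight_t], dict(node.attrs))

def rewrite(node):
    """Rewrite inputs first, then apply the rule for this op if one exists."""
    node = Node(node.op, [rewrite(i) for i in node.inputs], dict(node.attrs))
    rule = ALTER_RULES.get(node.op)
    return rule(node) if rule else node

def is_constant(node):
    """A subgraph is precomputable if all of its leaves are parameters."""
    if node.op == "param":
        return True
    return bool(node.inputs) and all(is_constant(i) for i in node.inputs)

graph = Node("dense",
             [Node("var", attrs={"name": "x"}), Node("param", attrs={"name": "w"})],
             {"units": 16})
new_graph = rewrite(graph)
```

After the rewrite, the transpose node depends only on a parameter, which is what lets a precompute pass fold it away, while the new dense node still depends on the runtime input.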
Oh, I’ve just noticed that what I said above is basically the same as what you said in your first post. I agree that the current system is not flexible regarding precompute. For Winograd filter precompute, we can do either F(4x4, 3x3) or F(2x2, 3x3), depending on the parameter tile_size (see here). For your case, you can bundle related transforms into a single operator and choose which one to use depending on the parameters passed.
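As a rough illustration of bundling, a single weight transform could take parameters that select which steps to apply, in the same spirit as tile_size for Winograd. The sketch below uses NumPy only; the function name and the transpose and pad_to parameters are made up for this example:

```python
import numpy as np

def dense_weight_transform(weight, transpose=False, pad_to=None):
    """Hypothetical bundled transform for a (out_dim, in_dim) weight matrix.

    pad_to: if given, zero-pad out_dim up to the next multiple of pad_to.
    transpose: if True, return the result in (in_dim, out_dim) layout.
    """
    out = weight
    if pad_to is not None:
        extra_rows = -out.shape[0] % pad_to  # rows needed to reach a multiple
        out = np.pad(out, ((0, extra_rows), (0, 0)))
    if transpose:
        out = out.T
    return np.ascontiguousarray(out)

weight = np.ones((10, 8), dtype="float32")
padded_t = dense_weight_transform(weight, transpose=True, pad_to=4)
unchanged = dense_weight_transform(weight)
```

This keeps the number of registered operators down to one per base operator, at the cost of a wider parameter list on the transform.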
Thanks for this. I’m using the alter_op_layout method for now and it does work reasonably well. It would be great to be able to handle all precomputation behaviour from within the schedules without needing to manually register custom NNVM operators, but I realise this is difficult to achieve.