Quantization and pruned model

Hi TVM community,

I am facing the following problem: I have pruned a 2D model and now I want to use TVM quantization. Since int8 quantization takes advantage of the dp4a primitive, the number of channels in the workload must be divisible by ic_block_factor, which is 4. However, my network is pruned and its channels are no longer divisible by 4, which results in an error.

I would like to pad the input/weight channels of the kernel. However, I am not really familiar with how TVM implements its kernels. Something like the sketch below is what I have in mind.
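As a minimal numpy sketch (just illustrating the idea, not TVM's actual API; it assumes an OIHW weight layout):

```python
import numpy as np

def pad_in_channels(weight: np.ndarray, multiple: int = 4) -> np.ndarray:
    """Zero-pad the input-channel axis of an OIHW conv2d weight.

    E.g. a pruned (64, 30, 3, 3) weight becomes (64, 32, 3, 3), making the
    input channels divisible by ic_block_factor = 4. The zero weights turn
    the padded channels into mathematical no-ops, but the activation feeding
    this layer must be padded to the same channel count.
    """
    pad = (-weight.shape[1]) % multiple
    if pad == 0:
        return weight
    return np.pad(weight, ((0, 0), (0, pad), (0, 0), (0, 0)))
```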

Can you advise an approach to solve my problem?

You can add your own legalize strategy; refer to this PR:
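For reference, the hook used there is the conv2d legalize generic function, which rewrites the Relay graph before lowering. Here is a minimal sketch, loosely modeled on TVM's CUDA int8 legalization in python/tvm/topi/cuda/conv2d_alter_op.py; the registration key, layout checks, and attrs handling are assumptions that vary across TVM versions:

```python
from tvm import relay
from tvm.topi.nn import conv2d_legalize

# override=True replaces any built-in CUDA rule; drop it if none is registered.
@conv2d_legalize.register("cuda", override=True)
def _conv2d_legalize(attrs, inputs, arg_types):
    data_type = arg_types[0]
    # Only rewrite int8 NCHW workloads; returning None keeps the default path.
    if data_type.dtype not in ("int8", "uint8") or attrs["data_layout"] != "NCHW":
        return None
    in_channels = data_type.shape[1].value
    pad = (-in_channels) % 4  # channels needed to reach a multiple of 4
    if pad == 0:
        return None
    data, kernel = inputs
    # Zero-pad the input-channel axis of the activation (NCHW) and the
    # weight (OIHW); the zero weights make the extra channels no-ops.
    data = relay.nn.pad(data, pad_width=((0, 0), (0, pad), (0, 0), (0, 0)))
    kernel = relay.nn.pad(kernel, pad_width=((0, 0), (0, pad), (0, 0), (0, 0)))
    # Note: the dp4a schedule also constrains output channels; the real pass
    # pads those as well and strided_slices the result back afterwards.
    new_attrs = {k: attrs[k] for k in attrs.keys()}
    return relay.nn.conv2d(data, kernel, **new_attrs)
```

Registered before relay.build, this pads the pruned channel counts automatically at compile time.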

But I’m not sure whether you will really get a speedup after padding. So you can also consider directly modifying the CUDA strategy of conv2d_int8 so that it dispatches to a TOPI implementation that does not require dp4a; see the sketch below.
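A hedged sketch of that second option, assuming the names in tvm/relay/op/strategy/{generic,cuda}.py; the registration API and the stock strategy contents differ across TVM releases (some versions need an explicit override flag when re-registering), and a complete version must still handle groups, other layouts, winograd, and so on:

```python
from tvm import topi
from tvm.relay.op.op import OpStrategy
from tvm.relay.op.strategy.cuda import conv2d_strategy_cuda
from tvm.relay.op.strategy.generic import (
    conv2d_strategy,
    wrap_compute_conv2d,
    wrap_topi_schedule,
)

@conv2d_strategy.register(["cuda", "gpu"])
def conv2d_strategy_cuda_no_dp4a(attrs, inputs, out_type, target):
    data, _ = inputs
    int8_pruned = (
        data.dtype in ("int8", "uint8")
        and attrs.data_layout == "NCHW"
        and int(data.shape[1]) % 4 != 0
    )
    if int8_pruned:
        # Channels not divisible by 4: skip the dp4a conv2d_NCHWc_int8
        # schedule and dispatch to the direct NCHW implementation instead.
        strategy = OpStrategy()
        strategy.add_implementation(
            wrap_compute_conv2d(topi.cuda.conv2d_nchw),
            wrap_topi_schedule(topi.cuda.schedule_conv2d_nchw),
            name="conv2d_nchw.cuda",
        )
        return strategy
    # Everything else keeps TVM's stock CUDA strategy.
    return conv2d_strategy_cuda(attrs, inputs, out_type, target)
```

The direct schedule avoids the ic_block_factor constraint entirely, but it also gives up dp4a, so compare it against padding to see which is actually faster for your pruned network.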