reku
June 24, 2021, 11:37am
You can add your own legalize strategy; see this PR for an example:
main ← wyc-ruiker:fix-leg (opened 12:17PM - 09 Jun 21 UTC)
Padding conv2d NHWC/HWNC ops to legal shapes for using tensorcore on cuda target … and add some tests.
Could you help review this PR? @jcf94 @Meteorix @jwfromm
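To give a rough idea of what such a legalize step has to compute, here is a minimal pure-Python sketch. It is not the code from the PR. The helper names (`round_up`, `pick_padding`) are hypothetical, and I am assuming that batch, output channels, and input channels map to the m, n, and k dimensions of the fp16 WMMA fragment shapes (16x16x16, 32x8x16, 8x32x16). It picks the fragment shape that needs the least padded work and returns how much each dimension must grow.

```python
# Hypothetical helper, not TVM's API: choose how much to pad a conv2d's
# batch / input-channel / output-channel dims so they fit a WMMA tile.

# fp16 WMMA fragment shapes as (m, n, k).
CANDIDATES = [(16, 16, 16), (32, 8, 16), (8, 32, 16)]


def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def pick_padding(batch, in_channels, out_channels):
    """Return (extra_batch, extra_in, extra_out) for the cheapest tile shape.

    Assumption: batch -> m, out_channels -> n, in_channels -> k.
    Cost is the product of the padded dims, a proxy for the padded work.
    """
    best_cost, best_padded = None, None
    for m, n, k in CANDIDATES:
        padded = (
            round_up(batch, m),
            round_up(in_channels, k),
            round_up(out_channels, n),
        )
        cost = padded[0] * padded[1] * padded[2]
        if best_cost is None or cost < best_cost:
            best_cost, best_padded = cost, padded
    return (
        best_padded[0] - batch,
        best_padded[1] - in_channels,
        best_padded[2] - out_channels,
    )


if __name__ == "__main__":
    # batch=8, 3 input channels, 32 output channels
    print(pick_padding(8, 3, 32))
```

Comparing the padded product against the original one gives a quick estimate of how much extra work the padding adds, which is worth checking before expecting a speedup.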
That said, I'm not sure padding will actually give you a speedup. You could also directly modify the CUDA strategy for your own conv2d_int8 so that it dispatches to a TOPI implementation that does not require dp4a.
@schedule_lrn.register(["cuda", "gpu"])
def schedule_lrn_cuda(attrs, outs, target):
    """schedule LRN for cuda"""
    with target:
        return topi.cuda.schedule_lrn(outs)


@conv2d_strategy.register(["cuda", "gpu"])
def conv2d_strategy_cuda(attrs, inputs, out_type, target):
    """conv2d cuda strategy"""
    strategy = _op.OpStrategy()
    data, kernel = inputs
    stride_h, stride_w = attrs.get_int_tuple("strides")
    dilation_h, dilation_w = attrs.get_int_tuple("dilation")
    padding = attrs.get_int_tuple("padding")
    groups = attrs.groups
    layout = attrs.data_layout
    kernel_layout = attrs.kernel_layout
    if dilation_h < 1 or dilation_w < 1: