Best way to deal with kernel layout?

JosseVanDelm · March 31, 2021, 9:22am

@jinchenglee thanks for asking this question, I’m facing a similar issue

@comaniac I was not aware of this data layout transformation process. It would be useful for me as well, as I’m targetting a microcontroller with embedded accelerator, which has a very specific data layout. I’m not using BYOC though (our approach is covered in this recent post). I thought I would have to make separate operator implementations with data layout preparation steps for each operator from the Relay Strategy. Like here (ARM seems to have different TOPI operators for different data layouts, but this seems opposite to your answer above):

github.com

apache/tvm/blob/a1b4f0e8f2bfcc583f98f0f9272adcc0c12f70a5/python/tvm/relay/op/strategy/arm_cpu.py#L52


@schedule_concatenate.register(["arm_cpu", "micro_dev"])
def schedule_concatenate_arm_cpu(_, outs, target):
    """schedule concatenate for arm cpu"""
    with target:
        return topi.arm_cpu.schedule_concatenate(outs)
@conv2d_strategy.register(["arm_cpu", "micro_dev"])
def conv2d_strategy_arm_cpu(attrs, inputs, out_type, target):
    """conv2d arm cpu strategy"""
    strategy = _op.OpStrategy()
    data, kernel = inputs
    dilation_h, dilation_w = attrs.get_int_tuple("dilation")
    stride_h, stride_w = attrs.get_int_tuple("strides")
    padding = attrs.get_int_tuple("padding")
    groups = attrs.groups
    layout = attrs.data_layout
    kernel_layout = attrs.kernel_layout
    if dilation_h < 1 or dilation_w < 1:

I find it a bit confusing that it is apparently possible to account for different data layouts in different parts of the stack.

Also I find it a bit weird that this data layout pass would be implemented as a relay pass, as I thought relay passes are supposed to be hardware independent. But actually I don’t think such a data transformation pass would be useful for other devices that don’t expect the same data layout as the accelerator, so that would need to be hardware dependent then, right?

How do you think I should proceed? Is your answer different if we don’t use BYOC? And is there some documentation on data layout transformations along the TVM stack perhaps, besides the example you showed us? Thanks!