How does TVM eliminate layout_transform calls on conv weights?

Thanks for the reply, Kevin! Those two layout transforms make sense, but the filter parameters are loaded from the .pth file in OIHW by default (relay/frontend/pytorch.py), and I set desired_layout to HWIO. Will these filter parameters be transformed in advance, once at compile time, or by a CUDA kernel on every inference?

I assume they should be converted only once, since these parameters are constant with respect to the inference process. Could someone give me a hint about which parts of the code are responsible for this? In my model's run I observed the same number of layout_transform calls as conv calls, so something seems wrong. In comparison, the GPU trace of the TVM ResNet sample shows only two layout transforms, which is what I would expect.

I'm a complete beginner with the TVM code base; where should I start? Thanks a lot.