No. TVM also generates CUDA code. The reason is that only x86 uses the NCHW[x]c layout, which requires graph tuning to minimize the layout-transform overhead between ops that use different NCHW[x]c layouts. On GPU, all conv2d ops use the plain NCHW layout, so there is no layout-transform overhead between ops and thus no need for the graph tuner.
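For intuition, here is a minimal NumPy sketch of what such a layout transform looks like: the channel axis is split into blocks and the inner block is moved to the last axis. The block size (4) and the helper's name are illustrative, not TVM's actual implementation.

```python
import numpy as np

def nchw_to_nchwc(data, c_block):
    """Illustrative NCHW -> NCHW[x]c transform (c_block is the inner block x)."""
    n, c, h, w = data.shape
    assert c % c_block == 0, "channel count must be divisible by the block size"
    # Split C into (C // c_block, c_block), then move the inner block last:
    # (N, C//x, x, H, W) -> (N, C//x, H, W, x)
    return data.reshape(n, c // c_block, c_block, h, w).transpose(0, 1, 3, 4, 2)

x = np.arange(2 * 8 * 3 * 3, dtype=np.float32).reshape(2, 8, 3, 3)
y = nchw_to_nchwc(x, 4)
print(y.shape)  # (2, 2, 3, 3, 4)
```

When two ops each pick a different block size x for best performance, a transform like this must be inserted between them; the graph tuner's job on x86 is to pick the layouts that minimize the total cost of these transforms.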