Does graph optimization not exist for GPU?

This might be a pretty naive question, but I was wondering why the graph optimization step (tune_graph with graph_tuner) exists in the x86 tutorial but is missing from the CUDA example.

Any help is appreciated!

You’re right. The graph tuner only supports CPU.

Thanks for the quick response! Does GPU not need graph-level optimization, or is TVM relying on cuDNN for this part?

No. TVM also generates CUDA code. The difference is that only x86 uses the NCHW[x]c layout, so it needs graph tuning to minimize the layout-transform overhead between ops with different NCHW[x]c layouts. On GPU, all conv2d ops use the NCHW layout, so there is no layout-transform overhead between ops and thus no need for the graph tuner.
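For context, the x86-only step looks roughly like this (a sketch adapted from the tune_relay_x86 tutorial; `input_name` and `target` come from the surrounding tutorial code, and `records` is the kernel-tuning log produced earlier in that tutorial):

```python
from tvm import relay
from tvm.autotvm.graph_tuner import DPTuner

# Pick, for every conv2d, the NCHW[x]c layout that minimizes total
# kernel time plus layout-transform time between neighboring ops.
def tune_graph(graph, dshape, records, opt_sch_file):
    target_op = [relay.op.get("nn.conv2d")]
    executor = DPTuner(graph, {input_name: dshape}, records, target_op, target)
    executor.benchmark_layout_transform(min_exec_num=2000)
    executor.run()
    executor.write_opt_sch2record_file(opt_sch_file)
```

There is no counterpart in the CUDA tutorial because, with a single NCHW layout, there is nothing for this step to choose between.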

Thanks for the explanation! Without the graph tuner, how does TVM do graph-level optimizations, such as operator fusion? Please bear with my naive questions :sweat_smile:

The graph tuner is different from graph optimization. Graph optimizations such as operator fusion and constant folding are mostly target-independent and are applied during compilation. In other words, as long as you compile a model with TVM, those graph optimizations are already involved.
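To make that concrete, here is a minimal sketch that applies two of those passes by hand (the toy function is made up for illustration; the pass names are from the public Relay API):

```python
import tvm
from tvm import relay

# A toy Relay function: two elementwise adds that FuseOps can merge.
x = relay.var("x", shape=(1, 8), dtype="float32")
y = relay.add(x, relay.const(1.0))
z = relay.add(y, relay.const(2.0))
mod = tvm.IRModule.from_expr(relay.Function([x], z))

# Two of the target-independent graph passes, applied by hand.
seq = tvm.transform.Sequential(
    [relay.transform.FoldConstant(), relay.transform.FuseOps(fuse_opt_level=2)]
)
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # the two adds now live in a single fused primitive function
```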

Ah I see. Could you provide me some quick pointers to the code where graph optimizations are implemented and used? I really appreciate it!

You can first go through this tutorial to get familiar with the code base: https://tvm.apache.org/docs/dev/codebase_walkthrough.html

Then you can trace Relay/TIR passes for such optimizations.
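If you want to see which passes actually run during a build, recent TVM versions let you hook the pass infrastructure; this is a sketch assuming `mod`, `target`, and `params` are already defined as in the tutorials:

```python
import tvm
from tvm import relay

# Record every pass executed during the build. Note that render() has
# to be called before the PassContext exits.
timing = tvm.ir.instrument.PassTimingInstrument()
with tvm.transform.PassContext(opt_level=3, instruments=[timing]):
    lib = relay.build(mod, target=target, params=params)
    profile = timing.render()
print(profile)  # per-pass timings, e.g. FoldConstant, FuseOps, ...
```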

Thanks! One last quick question: is it easy to locate which line of code in the tutorial applies the graph optimization?

```python
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build_module.build(mod, target=target, params=params)
```

The Relay build process eventually calls the graph optimization pipeline.
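If you want to inspect the result of that pipeline without going all the way to compiled code, `relay.optimize` runs the same passes and returns the optimized module (a sketch; `mod`, `target`, and `params` as in the tutorial):

```python
import tvm
from tvm import relay

# Run the same optimization pipeline as relay.build, but stop before
# codegen and return the optimized Relay IRModule.
with tvm.transform.PassContext(opt_level=3):
    opt_mod, opt_params = relay.optimize(mod, target=target, params=params)
print(opt_mod)  # fused, constant-folded Relay IR
```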

Thanks for the pointer!