Thanks for the quick response! Does the GPU backend not need graph-level optimization, or is TVM relying on cuDNN for this part?