Optimisation and Scheduling for inference in TVM

Hi, I have successfully tested NNVM/TVM execution for DL models like VGG, ResNet, etc. Now I want to reduce the inference time for these models.
I am slightly confused here!
My understanding is:

  1. We can optimize the graph and enable TVM optimization parameters (like loop unrolling, etc.) in the TVM build config to generate an optimized TVM runtime module.

  2. The next level could be implementing schedules for the computation: on CPU, perhaps using vectorization, and on GPU, implementing schedules for the CUDA/OpenCL kernels.
    Am I correct?
    For example, for NVIDIA with OpenCL, I couldn't find any schedules. How should I approach implementing these?

Any help is much appreciated.
Thanks

Your understanding is right.

TOPI (TVM Operator Inventory) has schedule implementations for various backends, optimized for certain workloads and machines. It is a good reference for you: https://github.com/dmlc/tvm/tree/master/topi

@merrymercy Thanks. I looked through this repo and found many schedule implementations for various operators and hardware devices, but I am confused about how to use them.
In my case I have an ONNX model (e.g. ResNet) loaded using the nnvm.frontend.from_onnx(MODEL) API and built using nnvm.compiler.build(…). How do I call/add scheduling here?
Is there any tutorial on how to call the TOPI schedules from a user program, where the pretrained model is loaded via a frontend?

Typically you do not have to explicitly call into TOPI to use the schedules—they should be automatically used depending on the operators and shapes in your model. See the build process in this script: https://github.com/dmlc/nnvm/blob/ef0ab9b09dbf1318851be311d3752de6c9bd4881/examples/benchmark/gpu_imagenet_bench.py#L56

@eqy @merrymercy
Yes, I now have some idea of where the schedules are implemented and how they are used during compilation.

I think for the CUDA target, the CUDA schedules are included by default. But in the case of other targets like OpenCL, if we choose tvm.target.mali() [opencl -device=mali], the schedules defined for the Mali GPU [in topi/mali/…] are included.
Am I correct?
Suppose we choose target='opencl' without any device name; which schedules are used then?
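To make the target strings above concrete: a helper like tvm.target.mali() essentially expands to a backend name plus device options. Here is a minimal, self-contained sketch (not TVM's actual parser; the function name and logic are illustrative only) of how a string like "opencl -device=mali" can be split into a backend and its options:

```python
def parse_target(target_str):
    """Split a target string like 'opencl -device=mali' into a backend
    name and an options dict. Illustrative sketch only, not TVM internals."""
    parts = target_str.split()
    backend = parts[0]
    options = {}
    for opt in parts[1:]:
        # '-device=mali' -> key 'device', value 'mali'
        key, _, value = opt.lstrip("-").partition("=")
        options[key] = value
    return backend, options

print(parse_target("opencl -device=mali"))  # ('opencl', {'device': 'mali'})
print(parse_target("opencl"))               # ('opencl', {})
```

With no device option present, there is nothing to select a specialized schedule set, which is exactly the situation the question asks about.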

I believe if you do not specify any device name it will use the “rocm” schedules. You can verify this by just adding some print statements in the schedule functions.

@eqy I got a reply in another thread saying that the CUDA schedules are used for the default OpenCL target.

@hrgraj That is correct. CUDA schedules are used by default for the OpenCL backend, unless a specialized schedule for a certain device (e.g. Mali) is available.

Yes, I should have been clearer: the “rocm” schedules link to CUDA schedules.
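To summarize the fallback behavior discussed above, here is a minimal, self-contained Python sketch (this is not TVM's actual dispatch code; the registry, schedule names, and lookup function are all hypothetical) of a target-keyed schedule registry where a plain OpenCL target falls back to the CUDA schedules:

```python
# Hypothetical registry mapping (op, target) -> schedule name.
# Real TOPI registers schedule functions per backend; this only
# models the lookup/fallback behavior described in the thread.
SCHEDULES = {
    ("conv2d", "cuda"): "cuda_conv2d_schedule",
    ("conv2d", "opencl -device=mali"): "mali_conv2d_schedule",
}

def lookup_schedule(op, target):
    """Return the schedule for (op, target), falling back to the CUDA
    schedule for generic OpenCL targets with no specialized entry."""
    if (op, target) in SCHEDULES:
        return SCHEDULES[(op, target)]
    if target.startswith("opencl"):
        # No device-specific schedule registered: reuse the CUDA one.
        return SCHEDULES[(op, "cuda")]
    raise KeyError(f"no schedule for {op} on {target}")

print(lookup_schedule("conv2d", "opencl -device=mali"))  # mali_conv2d_schedule
print(lookup_schedule("conv2d", "opencl"))               # cuda_conv2d_schedule
```

So a specialized target (Mali) gets its own schedules, while a bare 'opencl' target silently reuses the CUDA ones, which is why adding print statements inside the schedule functions is a quick way to check which path was taken.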