Optimisation and Scheduling for inference in TVM

Hi, I have successfully tested NNVM/TVM execution for DL models like VGG, ResNet, etc. Now I want to reduce the inference time for these models.
I am slightly confused here!
My understanding is:

  1. We can optimize the graph and enable TVM optimization parameters (like loop unrolling, etc.) in the TVM build config to generate an optimized TVM runtime module.

  2. The next level could be implementing schedules for the computation: on CPU, perhaps using vectorization, and on GPU, implementing schedules for the CUDA/OpenCL kernels.
    Am I correct?
    For example, for NVIDIA with OpenCL, I couldn't find any schedules. How should I approach implementing these?

Any help is much appreciated.
Thanks

Your understanding is right.

TOPI (TVM Operator Inventory) has schedule implementations for various backends, optimized for certain workloads and machines. It is a good reference for you: https://github.com/dmlc/tvm/tree/master/topi

@merrymercy Thanks. I looked through this repo and found many schedule implementations for various operators and hardware devices, but I am confused about how to use them.
In my case I have an ONNX model (e.g. ResNet) loaded using the nnvm.frontend.from_onnx(MODEL) API and built using nnvm.compiler.build(…). How do I call/add scheduling here?
Is there any tutorial on how to call the TOPI schedules from a user program, where the pretrained model is loaded via a frontend?

Typically you do not have to explicitly call into TOPI to use the schedules—they should be automatically used depending on the operators and shapes in your model. See the build process in this script: https://github.com/dmlc/nnvm/blob/ef0ab9b09dbf1318851be311d3752de6c9bd4881/examples/benchmark/gpu_imagenet_bench.py#L56

@eqy @merrymercy
Yes, I now have some idea of where the schedules are implemented and how they are used during compilation.

I think for the CUDA target, the CUDA schedules are included by default. But in the case of other targets like OpenCL, if we choose tvm.target.mali() [opencl -device=mali], the schedules defined for the Mali GPU [in topi/mali/…] are included.
Am I correct?
Suppose we choose target='opencl' without any device name; which schedules are used then?
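To make the target strings above concrete: a helper like tvm.target.mali() essentially expands to a backend name plus device options. Here is a minimal, self-contained sketch (not TVM's actual parser; the function name and logic are illustrative only) of how a string like "opencl -device=mali" can be split into a backend and its options:

```python
def parse_target(target_str):
    """Split a target string like 'opencl -device=mali' into a backend
    name and an options dict. Illustrative sketch only, not TVM internals."""
    parts = target_str.split()
    backend = parts[0]
    options = {}
    for opt in parts[1:]:
        # '-device=mali' -> key 'device', value 'mali'
        key, _, value = opt.lstrip("-").partition("=")
        options[key] = value
    return backend, options

print(parse_target("opencl -device=mali"))  # ('opencl', {'device': 'mali'})
print(parse_target("opencl"))               # ('opencl', {})
```

With no device option present, there is nothing to select a specialized schedule set, which is exactly the situation the question asks about.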

I believe if you do not specify any device name it will use the “rocm” schedules. You can verify this by just adding some print statements in the schedule functions.

@eqy I got a reply in another thread saying that the CUDA schedules are used for the default OpenCL target.

@hrgraj That is correct. CUDA schedules are used by default for the OpenCL backend, unless a specialized schedule for a certain device (e.g. Mali) is available.

Yes, I should have been clearer: the “rocm” schedules link to CUDA schedules.
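To summarize the fallback behavior discussed above, here is a minimal, self-contained Python sketch (this is not TVM's actual dispatch code; the registry, schedule names, and lookup function are all hypothetical) of a target-keyed schedule registry where a plain OpenCL target falls back to the CUDA schedules:

```python
# Hypothetical registry mapping (op, target) -> schedule name.
# Real TOPI registers schedule functions per backend; this only
# models the lookup/fallback behavior described in the thread.
SCHEDULES = {
    ("conv2d", "cuda"): "cuda_conv2d_schedule",
    ("conv2d", "opencl -device=mali"): "mali_conv2d_schedule",
}

def lookup_schedule(op, target):
    """Return the schedule for (op, target), falling back to the CUDA
    schedule for generic OpenCL targets with no specialized entry."""
    if (op, target) in SCHEDULES:
        return SCHEDULES[(op, target)]
    if target.startswith("opencl"):
        # No device-specific schedule registered: reuse the CUDA one.
        return SCHEDULES[(op, "cuda")]
    raise KeyError(f"no schedule for {op} on {target}")

print(lookup_schedule("conv2d", "opencl -device=mali"))  # mali_conv2d_schedule
print(lookup_schedule("conv2d", "opencl"))               # cuda_conv2d_schedule
```

So a specialized target (Mali) gets its own schedules, while a bare 'opencl' target silently reuses the CUDA ones, which is why adding print statements inside the schedule functions is a quick way to check which path was taken.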