I ran the C++ example on the RK3399 GPU. I find that the warm-up time is 27 s and the run time is 190 ms. How can I reduce the warm-up time?
If you use time_evaluator to measure the time, the first run (i.e. warming up) will not be counted. BTW, I am surprised that the warm-up time in your case is so long.
Yes, I use time_evaluator to measure the time, and the run time is 190 ms. But if I use gettimeofday() to measure the time of "run()", it is 27 s.
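For reference, here is a minimal sketch (my own, not from the example code) of how one could time the first run() call separately from the steady-state calls with std::chrono; the function name MeasureRun and the assumption that run is the module's packed "run" function are illustrative:

#include <chrono>
#include <iostream>
#include <tvm/runtime/packed_func.h>

// Illustrative sketch: the first call to run() pays the warm-up cost
// (e.g. on-device OpenCL kernel compilation); later calls do not.
void MeasureRun(tvm::runtime::PackedFunc run, int repeat = 10) {
  using Clock = std::chrono::steady_clock;

  auto t0 = Clock::now();
  run();  // first call: includes the warm-up / compile cost
  auto t1 = Clock::now();
  std::cout << "first run: "
            << std::chrono::duration<double>(t1 - t0).count() << " s\n";

  auto t2 = Clock::now();
  for (int i = 0; i < repeat; ++i) run();  // steady-state calls
  auto t3 = Clock::now();
  std::cout << "avg run:   "
            << std::chrono::duration<double>(t3 - t2).count() / repeat << " s\n";
  // Note: for GPU backends run() may be asynchronous, so a device
  // synchronization (e.g. TVMSynchronize) may be needed before stopping the timer.
}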
I found that most of the warm-up time is spent in this function:
// Wraps a backend-generated C function pointer (faddr) into a PackedFunc;
// the time is actually spent inside the call to faddr.
PackedFunc WrapPackedFunc(BackendPackedCFunc faddr,
                          const std::shared_ptr<ModuleNode>& sptr_to_self) {
  return PackedFunc([faddr, sptr_to_self](TVMArgs args, TVMRetValue* rv) {
    int ret = (*faddr)(
        const_cast<TVMValue*>(args.values),
        const_cast<int*>(args.type_codes),
        args.num_args);
    CHECK_EQ(ret, 0) << TVMGetLastError();
  });
}
The first-time overhead is the cost of compiling the OpenCL kernels on the board. So the short answer is that unfortunately we cannot reduce it, because this cost is paid inside the OpenCL driver. Note that the warm-up is a one-time cost that we need to pay; subsequent predictions will run just fine.
Depending on the availability of an OpenCL binary compiler, one possible way is to pre-build and pack a device-dependent binary, though that is not a very portable solution.
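For what it's worth, here is a rough sketch of the generic OpenCL binary-caching idea using the plain OpenCL API (not something the TVM OpenCL runtime does for you out of the box, as far as I know); the helper names SaveProgramBinary/LoadProgramBinary and the single-device assumption are mine:

#include <CL/cl.h>
#include <fstream>
#include <iterator>
#include <vector>

// After the first clBuildProgram on the device, fetch the compiled binary
// and store it on disk. Error handling is omitted for brevity.
void SaveProgramBinary(cl_program program, const char* path) {
  size_t size = 0;
  clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, nullptr);
  std::vector<unsigned char> binary(size);
  unsigned char* ptr = binary.data();
  clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, nullptr);
  std::ofstream(path, std::ios::binary)
      .write(reinterpret_cast<const char*>(binary.data()), binary.size());
}

// On later runs, recreate the program from the cached binary so the
// expensive source-to-binary compilation is skipped.
cl_program LoadProgramBinary(cl_context ctx, cl_device_id device, const char* path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<unsigned char> binary((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
  const unsigned char* ptr = binary.data();
  size_t size = binary.size();
  cl_int status = CL_SUCCESS;
  cl_int err = CL_SUCCESS;
  cl_program program =
      clCreateProgramWithBinary(ctx, 1, &device, &size, &ptr, &status, &err);
  // clBuildProgram is still required, but it is much cheaper for a binary.
  clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
  return program;
}

The cached binary is specific to the device and driver version, which is exactly why this approach is not very portable.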
Do you mean that the warm-up cost is mainly due to this call: clBuildProgram(program, 1, devices, options, NULL, NULL)?