I ran the C++ example on the RK3399 GPU. I find that the warm-up time is 27 s and the run time is 190 ms. How can I reduce the warm-up time?
If you use time_evaluator to measure the time, the first run (i.e. warming up) will not be counted. BTW, I am surprised that the warm-up time in your case is so long.
Yes, I use time_evaluator to measure the time, and the run time is 190 ms. But if I use gettimeofday() to measure the time of "run()", it is 27 s.
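For reference, here is a minimal sketch (my own, not from the example code) of how one could time the first run() call separately from the steady-state calls with std::chrono; the function name MeasureRun and the assumption that run is the module's packed "run" function are illustrative:

#include <chrono>
#include <iostream>
#include <tvm/runtime/packed_func.h>

// Illustrative sketch: the first call to run() pays the warm-up cost
// (e.g. on-device OpenCL kernel compilation); later calls do not.
void MeasureRun(tvm::runtime::PackedFunc run, int repeat = 10) {
  using Clock = std::chrono::steady_clock;

  auto t0 = Clock::now();
  run();  // first call: includes the warm-up / compile cost
  auto t1 = Clock::now();
  std::cout << "first run: "
            << std::chrono::duration<double>(t1 - t0).count() << " s\n";

  auto t2 = Clock::now();
  for (int i = 0; i < repeat; ++i) run();  // steady-state calls
  auto t3 = Clock::now();
  std::cout << "avg run:   "
            << std::chrono::duration<double>(t3 - t2).count() / repeat << " s\n";
  // Note: for GPU backends run() may be asynchronous, so a device
  // synchronization (e.g. TVMSynchronize) may be needed before stopping the timer.
}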
I found that most of the warm-up time is spent in this function:
// Wraps a backend-generated C function pointer (faddr) into a PackedFunc;
// the time is actually spent inside the call to faddr.
PackedFunc WrapPackedFunc(BackendPackedCFunc faddr,
                          const std::shared_ptr<ModuleNode>& sptr_to_self) {
  return PackedFunc([faddr, sptr_to_self](TVMArgs args, TVMRetValue* rv) {
    int ret = (*faddr)(
        const_cast<TVMValue*>(args.values),
        const_cast<int*>(args.type_codes),
        args.num_args);
    CHECK_EQ(ret, 0) << TVMGetLastError();
  });
}
The first-time overhead is the cost of compiling the OpenCL kernels on the board. So the short answer is that unfortunately we cannot reduce it, because this cost is paid inside the OpenCL driver. Note that the warm-up is a one-time cost that we need to pay; subsequent predictions will run just fine.
Depending on the availability of an OpenCL binary compiler, one possible way is to pre-build and pack a device-dependent binary, though that is not a very portable solution.
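For what it's worth, here is a rough sketch of the generic OpenCL binary-caching idea using the plain OpenCL API (not something the TVM OpenCL runtime does for you out of the box, as far as I know); the helper names SaveProgramBinary/LoadProgramBinary and the single-device assumption are mine:

#include <CL/cl.h>
#include <fstream>
#include <iterator>
#include <vector>

// After the first clBuildProgram on the device, fetch the compiled binary
// and store it on disk. Error handling is omitted for brevity.
void SaveProgramBinary(cl_program program, const char* path) {
  size_t size = 0;
  clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, sizeof(size), &size, nullptr);
  std::vector<unsigned char> binary(size);
  unsigned char* ptr = binary.data();
  clGetProgramInfo(program, CL_PROGRAM_BINARIES, sizeof(ptr), &ptr, nullptr);
  std::ofstream(path, std::ios::binary)
      .write(reinterpret_cast<const char*>(binary.data()), binary.size());
}

// On later runs, recreate the program from the cached binary so the
// expensive source-to-binary compilation is skipped.
cl_program LoadProgramBinary(cl_context ctx, cl_device_id device, const char* path) {
  std::ifstream in(path, std::ios::binary);
  std::vector<unsigned char> binary((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
  const unsigned char* ptr = binary.data();
  size_t size = binary.size();
  cl_int status = CL_SUCCESS;
  cl_int err = CL_SUCCESS;
  cl_program program =
      clCreateProgramWithBinary(ctx, 1, &device, &size, &ptr, &status, &err);
  // clBuildProgram is still required, but it is much cheaper for a binary.
  clBuildProgram(program, 1, &device, nullptr, nullptr, nullptr);
  return program;
}

The cached binary is specific to the device and driver version, which is exactly why this approach is not very portable.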
Do you mean that the warm-up cost is mainly due to this call: clBuildProgram(program, 1, devices, options, NULL, NULL)?