I was tuning the conv operations of ResNet-50 from the ONNX model zoo using OpenCL. I reused the CUDA schedule via the example tune_relay_cuda_example.py, running 2000 iterations with 4 different tuners on the same machine (nothing else was running at the time).
I noticed that the best performance numbers differ between tuners, although in most tasks progress stalls before that iteration count is reached. The RandomTuner numbers are the lowest, and the XGBTuner numbers are usually (but not for all tasks) the highest. GATuner and GridSearchTuner are always higher than RandomTuner but in most cases lower than XGBTuner.
My understanding was that a tuner chooses batches of configurations from the config space, either based on a cost model or using a genetic algorithm. So if the number of iterations is high enough, all configurations will eventually be run, and the best performance should be the same no matter which tuner you choose.
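For scale, here is a rough back-of-the-envelope calculation of how large such a config space can be relative to a 2000-trial budget. The knob names and value counts below are hypothetical, not the actual ResNet-50 conv2d space:

```python
# Toy illustration only: a conv2d config space is the cross product of
# several tuning knobs, so it can dwarf a 2000-trial budget.
# These knob value counts are made up for illustration.
knobs = {
    "tile_f": 10,     # candidate tilings for output channels
    "tile_y": 8,      # candidate tilings along height
    "tile_x": 8,      # candidate tilings along width
    "tile_rc": 6,     # candidate tilings of the reduction axis
    "unroll": 4,      # unroll factor options
    "vectorize": 2,   # vectorization on/off variants
}

space_size = 1
for n in knobs.values():
    space_size *= n   # total configs = product of knob sizes

n_trial = 2000
print(f"config space size: {space_size}")
print(f"fraction explored in {n_trial} trials: {n_trial / space_size:.1%}")
```

If the real space is anywhere near this size, 2000 trials only sample a small fraction of it, so which configurations a tuner picks would actually matter.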
Could you explain, then, how the tuners actually work? Is this difference in performance caused by a higher number of timeouts and errors in particular runs, or does the choice of tuner affect the measurements themselves?