[AutoScheduler] Choice of Mean/Median for Inference Cost

I just wanted to understand the reasoning for one small difference between the operator tuning and network tuning tutorials.

In the operator tuning tutorials we report the median of the evaluations, but for network tuning we report the mean.

Is there a reason for this choice, or was it arbitrary? Do evaluations in network tuning follow a normal distribution more closely, with fewer outliers, and is that why the mean is used there?

Previously, in AutoTVM, we always used the mean, but we later observed that it could be inaccurate due to outliers, so we used the median instead when writing the operator tuning tutorials.

On the other hand, before the network tuning tutorials were published, @merrymercy improved the measurement mechanism to reduce the variance, so the mean became a reliable metric again. We just didn’t change the median back to the mean in the operator tuning tutorials.
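
To illustrate the outlier point, here is a minimal sketch using only Python's standard `statistics` module. The latency numbers are made up for illustration and are not real TVM measurements; `summarize` is a hypothetical helper, not part of any TVM API.

```python
import statistics

# Hypothetical latencies in milliseconds (made-up values, not TVM output).
clean = [1.00, 1.02, 0.98, 1.01, 0.99]
# The same runs plus one slow outlier, e.g. from a noisy machine.
with_outlier = clean + [5.00]


def summarize(samples):
    """Return (mean, median) of a list of timing samples."""
    return statistics.mean(samples), statistics.median(samples)


clean_mean, clean_median = summarize(clean)
noisy_mean, noisy_median = summarize(with_outlier)

print(f"clean:        mean={clean_mean:.3f} ms, median={clean_median:.3f} ms")
print(f"with outlier: mean={noisy_mean:.3f} ms, median={noisy_median:.3f} ms")
```

With low-variance measurements the two statistics nearly coincide, which matches the point that the mean is fine once the measurement noise is reduced. A single outlier shifts the mean noticeably while the median barely moves.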