A few questions regarding autotvm and tune_relay_x86

Unfortunately, some of the runtime and tuning parameters can be confusing, but I'll try to clarify them here:

  1. This controls the number of threads that the kernel implementation of each operator uses at runtime, not the amount of parallelism used during tuning. Set it to the number of physical cores on your machine, or to the number of physical cores you want to use for inference. You should observe a near-linear speedup as you increase this number, up to the physical core count (see the first sketch after this list).
  2. A task can be thought of as a specific instance of an operator (e.g., conv2d, dense) with a specific configuration (input shape, window size, stride, etc.). Even though ResNet-18 has 18 layers, some layers are identical from this perspective, so there can be fewer tasks than layers. x86 is a special case because a data layout transformation is also defined there. The comment you are referring to is about converting the 12 conv2d tasks extracted from ResNet-18 into 12 conv2d tasks with a different data layout (NCHWc in this case). The second sketch below shows how tasks are extracted.
  3. Progress is the number of measurements tried on real hardware out of the total number of measurements allotted (the n_trial parameter in tuning_option). The last value is the wall-clock time elapsed while tuning this task. The third sketch below shows where these values come from.
    3a: see above.
    3b: the configuration space is different for each hardware backend; the definitions live under topi/python/topi/<your_hardware_backend> in the TVM source tree, which for x86 is topi/python/topi/x86. You can also print a task's config space directly (see the last sketch below).
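
For (1), here's a minimal sketch, assuming the parameter in question is the TVM_NUM_THREADS environment variable used in the tune_relay_x86 tutorial:

```python
import os

# Hypothetical core count: set this to the number of physical cores
# you want the operator kernels to use at inference time.
num_threads = 4

# Must be set before the TVM runtime spawns its thread pool, since
# the pool is sized from this variable.
os.environ["TVM_NUM_THREADS"] = str(num_threads)
```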
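
For (2), a sketch of task extraction; the exact API differs slightly across TVM versions, and this follows the Relay-based tutorial:

```python
import tvm
from tvm import relay, autotvm
from tvm.relay import testing

# Build ResNet-18 in Relay (batch size 1, NCHW, float32).
mod, params = testing.resnet.get_workload(
    num_layers=18, batch_size=1, dtype="float32"
)
target = "llvm"

# One task is created per unique conv2d workload; identical layers
# collapse into a single task, so there can be fewer tasks than layers.
tasks = autotvm.task.extract_from_program(
    mod["main"], target=target, params=params,
    ops=(relay.op.get("nn.conv2d"),)
)
print(len(tasks))  # 12 unique conv2d workloads for ResNet-18
```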
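
For (3), the progress bar and n_trial come together in the tuning loop. A sketch, reusing tasks from above and assuming an XGBTuner plus hypothetical trial counts and file names:

```python
from tvm import autotvm

n_trial = 2000              # total measurements allotted per task
log_file = "resnet-18.log"  # hypothetical log file name

measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=10, repeat=1, min_repeat_ms=1000),
)

task = tasks[0]  # from the extraction sketch above
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    # Don't request more trials than the config space contains.
    n_trial=min(n_trial, len(task.config_space)),
    measure_option=measure_option,
    callbacks=[
        # Renders the "Progress: (x/n_trial) | elapsed" line you saw.
        autotvm.callback.progress_bar(n_trial),
        autotvm.callback.log_to_file(log_file),
    ],
)
```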
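
For (3b), you don't have to read the TOPI source to see the space: each extracted task carries its configuration space, so you can print it.

```python
# Lists each knob (e.g. tile sizes, unrolling) that AutoTVM searches
# over for this operator on this backend.
print(tasks[0].config_space)
print(len(tasks[0].config_space))  # total number of candidate configs
```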