FLOPs Computation in AutoTVM and Ansor

Hi,

I’ve observed that the GFLOPs reported when using either AutoTVM or Ansor are a bit off from what I expected. For example, for a depthwise convolution with (IH = IW = 28, IC = OC = 192, KH = KW = 3, SH = SW = 2), I expect the total FLOPs to be: 192 * 14 * 14 * 3 * 3 * 2 = 677376

However, the FLOPs calculated by AutoTVM and Ansor are 1886976, which is about 2.79x the expected number.
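For reference, here is the quick back-of-the-envelope check behind those numbers (plain Python using the shapes above, nothing TVM-specific):

```python
# Depthwise conv, stride 2 on a 28x28 input -> 14x14 output per channel.
# Each output element does KH*KW multiply-accumulates, counted as 2 FLOPs each.
OC, OH, OW, KH, KW = 192, 14, 14, 3, 3
expected_flops = OC * OH * OW * KH * KW * 2
print(expected_flops)            # 677376
print(1886976 / expected_flops)  # ~2.79, ratio of the reported number to the expected one
```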

I looked into the source code and found that when TVM computes the FLOPs, it also considers the padding stage, and for the padding stage it counts all the expressions in the if clause. For example, for this node: @tir.if_then_else(((((i2: int32 >= 1) && (i2 < 29)) && (i3: int32 >= 1)) && (i3 < 29)), Data[i0: int32, i1: int32, (i2 - 1), (i3 - 1)], 0f32, dtype=float32)

It counts 7 FLOPs in total.
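If I read that expression right, the 7 operations all come from the padding condition, and they fully explain the gap (my own breakdown, assuming a 30x30 padded buffer per channel; not taken from the counting code itself):

```python
# Apparent breakdown of the 7 operations counted per padded element:
comparisons  = 4  # i2 >= 1, i2 < 29, i3 >= 1, i3 < 29
logical_ands = 3  # the three && joining the comparisons
per_element  = comparisons + logical_ands   # 7, none of them floating-point ops
padding_ops  = 192 * 30 * 30 * per_element  # 1209600 for the whole padding stage
print(677376 + padding_ops)                 # 1886976, the reported number
```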

I think this calculation is a bit confusing: what it measures is closer to an instructions-per-second metric (MIPS) than to FLOPS (floating-point operations per second). This is a minor problem and only caused some confusion when I tried to compare the GFLOPs reported in the tuning log against other benchmarks. Still, it would be good to consider either changing the mechanism for computing GFLOPs or renaming the metric from GFLOPs to MIPS to avoid such confusion.

Best,


@yidawang @comaniac @merrymercy

@haichen @kevinthesun you guys may be interested in this issue as well.

The FLOPs reported by autotvm are just a rough estimate, and we should not use them as an accurate metric.

It is very hard to compute the exact FLOPs for a computation DAG. If I remember correctly, the current code counts both float operations and integer operations at the top level (i.e. all operations not inside index expressions), because it is designed to work with both float32 and int8 workloads.
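As a rough illustration of what counting at the top level means (a toy sketch over a made-up mini expression type, not the actual autotvm visitor):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    dtype: str        # e.g. "float32" or "int32"; a tensor load or a constant

@dataclass
class BinOp:
    lhs: "Expr"
    rhs: "Expr"
    dtype: str        # result dtype of the operation

Expr = Union[Leaf, BinOp]

def count_ops(expr: Expr) -> int:
    """Count every operation in the compute body, float and integer alike."""
    if isinstance(expr, Leaf):
        return 0
    return 1 + count_ops(expr.lhs) + count_ops(expr.rhs)
```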

Changing it to MIPS is not very intuitive to me; I would like to hear more opinions. To solve your specific case, we can improve the current code: if the output is a float32 tensor, we could count only the operations with float operands.
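Continuing the toy sketch above, that refinement would simply skip operations with no floating-point operand (again, only an illustration of the idea, not a patch against the real code):

```python
def count_float_ops(expr: Expr) -> int:
    """Count only operations that touch at least one floating-point operand."""
    if isinstance(expr, Leaf):
        return 0
    n = count_float_ops(expr.lhs) + count_float_ops(expr.rhs)
    if expr.lhs.dtype.startswith("float") or expr.rhs.dtype.startswith("float"):
        n += 1        # the all-int32 padding condition would contribute nothing
    return n
```

With such a rule the padding condition above would contribute 0, so the reported number should drop back to the expected 677376 for this workload.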
