I’ve been implementing the TFLite Detection PostProcess operator (used in SSD MobileNet; see https://github.com/apache/incubator-tvm/pull/4543/). However, there is a difference between the results of TFLite and TVM for QNN graphs that I think is due to a difference in rounding scheme (and potentially in operator lowering).
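As a rough illustration (not the actual TFLite or TVM kernels; the shift and values are made up), two common integer rounding schemes for the final right-shift of a requantize step can already disagree by one on a tie:

```python
# Illustrative sketch only: two plausible rounding modes for the final
# right-shift in a requantize step. Neither is claimed to be exactly what
# TFLite or TVM implements; the point is that they can differ by 1 on ties.

def requantize_upward(acc: int, shift: int) -> int:
    """Round to nearest, ties towards +infinity."""
    return (acc + (1 << (shift - 1))) >> shift

def requantize_away_from_zero(acc: int, shift: int) -> int:
    """Round to nearest, ties away from zero."""
    if acc >= 0:
        return (acc + (1 << (shift - 1))) >> shift
    return -((-acc + (1 << (shift - 1))) >> shift)

acc, shift = -24, 4  # -24 / 16 = -1.5, an exact tie
print(requantize_upward(acc, shift))          # -1
print(requantize_away_from_zero(acc, shift))  # -2
```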
For most operators this effect is not too significant, as we can write tests with a +/- 1 tolerance on the outputs. However, part of this custom op sorts the detected objects by confidence and then keeps only the top ‘n’ results, so even a small difference in the scores can change the output tensor significantly because the detections come out in a different order. This is particularly noticeable when it causes a different detection to be clipped, as the output tensors from TVM and TFLite then contain different information; a toy example is sketched below.
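Here is that toy sketch (the scores, box ids and `n` are invented for illustration, not taken from the real PostProcess kernel): a single off-by-one quantized score changes which boxes survive the top-n cut.

```python
# Toy example: a +/- 1 difference in one quantized confidence score is enough
# to reorder the detections and change which boxes are kept after top-n.
# All values here are made up; this is not the real PostProcess kernel.

tflite_scores = [(0, 131), (1, 130), (2, 129), (3, 90)]  # (box_id, uint8 score)
tvm_scores    = [(0, 131), (1, 129), (2, 130), (3, 90)]  # boxes 1 and 2 differ by 1

n = 2  # keep only the top-n detections
top_tflite = [b for b, _ in sorted(tflite_scores, key=lambda s: s[1], reverse=True)[:n]]
top_tvm    = [b for b, _ in sorted(tvm_scores,    key=lambda s: s[1], reverse=True)[:n]]

print(top_tflite)  # [0, 1]
print(top_tvm)     # [0, 2] -> a different box ends up in the output tensor
```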
Writing end-to-end tests for this case is therefore quite difficult, and it would be preferable if we could run TVM in a ‘tflite’ mode where it uses an identical rounding scheme (and identical op implementations where necessary). I note that @FrozenGene is looking into this and am just posting this as an example of where bit-exact computation would be valuable. Do we have an idea of what would be required to support this behaviour?