[RFC][TFLite frontend] Create models for frontend testing by directly writing TFLite buffers

Motivation

To test the TFLite frontend, we need to generate single-operator models that can then be run with TVM. Currently these models are generated either by creating a model in TensorFlow 1 and converting it to TFLite, or by converting a Keras model. Besides the problems of the TensorFlow APIs not being stable and the conversion being “a black box”, there are also a number of operators that can’t be created using the current infrastructure. Let’s look at quantized ABS as an example.

Currently quantized ABS is not supported in the frontend (compilation would fail with an error if we attempted it). If we wanted to support it, testing would become a problem - the TensorFlow 1 conversion route would not work since it can only produce a model quantized to UINT8 (but quantized ABS is not supported in UINT8), and there is no standalone ABS operator in Keras.

Even if there were a way to make a quantized TFLite ABS through some TensorFlow/Keras conversion mechanism, it could not be generalised to other operators. We currently need to approach each operator on a case-by-case basis and find a different route for each.

The proposal

We can make any kind of TFLite model by directly writing the TFLite buffers. We have been successfully using flatbuffers with the TFLite schema to create a large variety of operators in a consistent way. The simplest way to do it is to put together a JSON file that specifies the operators, tensors, quantization parameters and other relevant information, and to compile it with flatc, the flatbuffers command line tool, which produces a .tflite file that can be read into the test. There’s an example of that JSON at the bottom of the RFC.

The idea is to upstream some reusable classes and functions that would make it simple to create the JSON files and compile them with flatc. We don’t intend to rewrite all the current tests, just to add a parallel path for creating TFLite models for testing purposes.
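To give a flavour of what those helpers could look like, here is a hypothetical sketch (the function name and parameters are illustrative, not the actual upstreamed API) that assembles the JSON dict for a single-operator quantized model, mirroring the ABS example at the bottom of this RFC:

import json

def make_unary_op_json(builtin_code, dtype, shape, scale, zero_point):
    # One quantization block shared by the input and output tensors.
    quantization = {
        "scale": [scale],
        "zero_point": [zero_point],
        "quantized_dimension": 0,
    }
    return {
        "version": 3,
        "description": "test network",
        "operator_codes": [{"builtin_code": builtin_code}],
        "subgraphs": [{
            "tensors": [
                {"type": dtype, "buffer": 1, "name": "tensor-0",
                 "shape": shape, "quantization": quantization},
                {"type": dtype, "buffer": 2, "name": "tensor-1",
                 "shape": shape, "quantization": quantization},
            ],
            "inputs": [0],
            "outputs": [1],
            # Operator-specific builtin options are omitted for brevity.
            "operators": [{"opcode_index": 0, "inputs": [0], "outputs": [1],
                           "mutating_variable_inputs": []}],
        }],
        "buffers": [{"data": []}, {"data": []}, {"data": []}],
    }

# e.g. quantized INT8 ABS (builtin code 101 in the TFLite schema):
with open("abs_int8.json", "w") as f:
    json.dump(make_unary_op_json(101, "INT8", [3], 0.5, 0), f, indent=2)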

Discussion

There are two ways of directly creating TFLite buffers:

B1 Use the flatc command line tool to compile the TFLite schema together with the JSON:

flatc -b schema_path json_path

The output of this is a .tflite file. The problem with this approach is that flatc does not have a Python API, so we can’t serialize the JSON into a TFLite buffer without a subprocess call. We would need to download the schema, call the flatc command line tool (which dumps a .tflite file to a temporary directory) and then read that file back in. The benefit of this approach is that putting together the JSON file is relatively straightforward; with helper functions, creating new models becomes quite simple.
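As a rough sketch of that flow (assuming flatc is on the PATH and schema_path points at the downloaded TFLite schema; the helper name is hypothetical):

import os
import subprocess
import tempfile

def compile_json_to_tflite(schema_path, json_path):
    """Serialize a JSON model description into a TFLite buffer via flatc."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # -b emits a binary file; the TFLite schema declares
        # file_extension "tflite", so the output is <json stem>.tflite.
        subprocess.run(
            ["flatc", "-o", tmp_dir, "-b", schema_path, json_path],
            check=True,
        )
        stem = os.path.splitext(os.path.basename(json_path))[0]
        with open(os.path.join(tmp_dir, stem + ".tflite"), "rb") as f:
            return f.read()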

B2 Flatc can be used to compile the schema file into a Python library, which in turn can be used to create the serialized TFLite buffers. In fact, a library of that kind is already used in the TFLite frontend to deserialize TFLite buffers. The problem with these Python APIs is that it is not immediately obvious how to use them to construct a model, especially for more complex operators with several input tensors.
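For comparison, a minimal sketch of the builder-style API that flatc generates (this assumes a tflite package produced by flatc --python from the schema; exact module paths vary between versions). Even writing just the top-level version field takes several calls, and vectors of tensors and operators would have to be serialized bottom-up before the tables that reference them:

import flatbuffers
import tflite.Model

builder = flatbuffers.Builder(1024)
# Scalar fields are added between Start/End; strings and vectors must
# already have been written into the builder before ModelStart is called.
tflite.Model.ModelStart(builder)
tflite.Model.ModelAddVersion(builder, 3)
model = tflite.Model.ModelEnd(builder)
builder.Finish(model)
buf = builder.Output()  # a (skeletal) serialized TFLite buffer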

We have implemented the JSON route since we found it easier for creating and debugging models, but it would be interesting to know what the community thinks about it.

An example of the JSON:

{
  "version": 3,
  "description": "test network",
  "operator_codes": [
    {
      "builtin_code": 101
    }
  ],
  "subgraphs": [
    {
      "tensors": [
        {
          "type": "INT8",
          "buffer": 1,
          "name": "tensor-0",
          "shape": [3],
          "quantization": {
            "scale": [0.5],
            "zero_point": [0],
            "quantized_dimension": 0
          }
        },
        {
          "type": "INT8",
          "buffer": 2,
          "name": "tensor-1",
          "quantization": {
            "scale": [0.5],
            "zero_point": [0],
            "quantized_dimension": 0
          }
        }
      ],
      "inputs": [0],
      "outputs": [1],
      "operators": [
        {
          "opcode_index": 0,
          "inputs": [0],
          "outputs": [1],
          "mutating_variable_inputs": [],
          "builtin_options_type": "AbsOptions",
          "builtin_options": {}
        }
      ]
    }
  ],
  "buffers": [{"data": []}, {"data": []}, {"data": []}]
}
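For completeness, a hedged sketch of how the compiled buffer could then be read into a test (assuming the flatc-generated tflite Python package and TVM’s Relay frontend; compile_json_to_tflite is the hypothetical helper sketched above):

import tflite.Model
from tvm import relay

buf = compile_json_to_tflite("schema.fbs", "abs_int8.json")
model = tflite.Model.Model.GetRootAsModel(buf, 0)
mod, params = relay.frontend.from_tflite(
    model, shape_dict={"tensor-0": [3]}, dtype_dict={"tensor-0": "int8"}
)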

@anijain2305 @tqchen @dmitriy-arm

@siju-samuel @FrozenGene you may also be interested in this proposal.

Thanks for the proposal. As this approach is a supplement for the operators we cannot build easily, we should consider which way gives us easier and deeper control. In my opinion, I would vote for B1. B2 is more like the TFLite frontend, however, I think it is not the same story. The TFLite frontend is about parsing, where we want to follow the way of the other frontends (the Python API way), but this is about constructing complex operators and models, so we don’t have to follow the TFLite frontend way; we should consider how to make construction easier.


Here’s the PR: [TFLite][Testing] Add infra to write TFLite model buffers directly by ekalda · Pull Request #8368 · apache/tvm · GitHub