As in this PR, we propose adding HashTable support to TVM.
We serve a large number of TensorFlow NLP models in our production environment, and we would like to improve model serving speed using TVM. Many of these models incorporate HashTable related operators to hash words/characters, convert DT_STRING data to integer types, and perform embedding/feature extraction in later steps.
In order to support HashTable related operations in TVM, four basic TensorFlow operators need to be supported.
We consider four main issues in the proposed implementation:
How to represent the HashTable object?
We build a HashTable class and assign its pointer to the data field of a DLTensor, producing a DLTensor whose element is a HashTable. This lets us smoothly inherit the current TVM structure, in which operators interact through DLTensors. In the TensorFlow graph, the HashTable appears as an input or output tensor of the HashTable related operators, so this solution makes importing from the TensorFlow graph natural. Moreover, by embedding the HashTable inside a DLTensor, existing DLTensor related machinery can be reused, such as setting up tensors, passing tensors as arguments, and even the AllocTensor instruction in the VM. We build the HashTable class as an external library in src/runtime/contrib, like the sort library.
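As a minimal sketch of this idea, an operator can recover the table by casting the tensor's data pointer back to the table class. The types below (FakeDLTensor, StringHashTable) are illustrative stand-ins, not TVM's actual DLTensor or HashTable definitions:

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Simplified stand-in for DLTensor: only the field relevant here.
// The real DLTensor (from dlpack) also carries dtype, shape, strides, etc.
struct FakeDLTensor {
  void* data;  // for a HashTable "tensor", this points at the table object
};

// Hypothetical HashTable runtime class mapping string keys to int64 values.
class StringHashTable {
 public:
  void Insert(const std::string& key, long long value) { table_[key] = value; }
  long long Find(const std::string& key, long long default_value) const {
    auto it = table_.find(key);
    return it == table_.end() ? default_value : it->second;
  }

 private:
  std::unordered_map<std::string, long long> table_;
};

// Wrap the table in a tensor: operators receive the tensor and
// recover the table via a cast of the data pointer.
inline FakeDLTensor MakeTableTensor(StringHashTable* table) {
  FakeDLTensor t;
  t.data = static_cast<void*>(table);
  return t;
}

inline StringHashTable* GetTable(const FakeDLTensor& t) {
  return static_cast<StringHashTable*>(t.data);
}
```

Because the table travels as an ordinary tensor, lookup operators can take it as a regular DLTensor argument and cast internally.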
How to handle the tensors with string type?
For HashTable operations, the input and output tensors usually have string type. Currently, TVM does not support string data inside DLTensors. We assign a pointer to the C++ class std::string to the data field of a DLTensor to make a string-typed tensor. TVM currently supports passing an individual string through the Object class, but it does not support an array of strings inside a DLTensor, and we do not believe importing the Object class into the DLTensor class would be a good approach.
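A minimal sketch of the same trick for strings, again using a simplified tensor struct (FakeStringTensor, MakeStringTensor, and StringAt are hypothetical names for illustration, not TVM APIs):

```cpp
#include <cassert>
#include <string>

// Simplified tensor holding an array of std::string behind its data pointer.
struct FakeStringTensor {
  void* data;           // points at a contiguous array of std::string
  long long num_elems;  // number of string elements
};

// Build a "string tensor" over caller-owned storage.
inline FakeStringTensor MakeStringTensor(std::string* strings, long long n) {
  FakeStringTensor t;
  t.data = strings;
  t.num_elems = n;
  return t;
}

// Element access: cast the data pointer back to std::string*.
inline std::string& StringAt(const FakeStringTensor& t, long long i) {
  return static_cast<std::string*>(t.data)[i];
}
```

The key point is that the tensor's data field stores typed C++ objects rather than raw numeric bytes, so element access must go through a cast rather than byte arithmetic on dtype sizes.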
How to handle the table initializer operators?
An important issue for HashTable related operators is table initialization. The TensorFlow operators LookupTableImportV2 and InitializeTableFromTextFileV2 are designed for this purpose. In TensorFlow, an init run of tf.tables_initializer() is required before executing the model. Similarly, we add an init run to the TVM graph runtime, in which the initialization ops are executed. The init run is triggered together with set_input(**params) when setting up the parameters of the graph runtime; the regular run then executes only the remaining ops. To support the VM runtime, an "init" function consisting of the initialization ops could be invoked in the same way before invoking the "main" model function, though currently we only implement the graph runtime support.
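The split between the init run and the regular run can be sketched as follows. TinyGraphRuntime and its per-node init tagging are hypothetical simplifications for illustration, not the actual TVM graph runtime:

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Each node is tagged as an init op (e.g. LookupTableImportV2) or a
// regular op (e.g. LookupTableFindV2).
struct Node {
  bool is_init_op;
  std::function<void()> exec;
};

class TinyGraphRuntime {
 public:
  void AddNode(bool is_init_op, std::function<void()> exec) {
    nodes_.push_back({is_init_op, std::move(exec)});
  }
  // Invoked once when parameters are set: runs only the init ops.
  void Init() {
    for (auto& n : nodes_)
      if (n.is_init_op) n.exec();
  }
  // Regular per-inference run: skips the init ops.
  void Run() {
    for (auto& n : nodes_)
      if (!n.is_init_op) n.exec();
  }

 private:
  std::vector<Node> nodes_;
};
```

Initialization cost is thus paid once at setup time, while repeated inference runs touch only the lookup ops.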
Alternatively, we could treat the HashTable object as a constant parameter: when parsing the model, we build the HashTable object and complete the required initialization, after which the HashTable is a constant. In this way, neither the graph runtime nor the VM runtime needs to be modified, and only the table lookup operators like LookupTableFindV2 remain in the model. However, this solution makes it hard to handle possible run-time modifications to the HashTable, and is thus less flexible. It also requires handling the initialization ops before model serving.
How to transform between Numpy Array and NDArray with string elements?
Inputs and params must be converted from NumPy arrays to the DLTensor inside a TVM NDArray. The "CopyFrom" and "AsNumpy" methods of NDArray handle this conversion, but the original byte-wise conversion does not work for arrays of strings. We therefore add new conversion methods for string arrays, which is why new "_ffi" functions alongside "TVMArrayCopyFromBytes" and "TVMArrayCopyToBytes" are needed.
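A sketch of what such string-aware copy helpers might look like, by analogy with TVMArrayCopyFromBytes/TVMArrayCopyToBytes; FakeStrTensor and the helper names are assumptions for illustration, not the proposed FFI signatures:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified tensor whose data pointer addresses an array of std::string.
struct FakeStrTensor {
  void* data;
  long long num_elems;
};

// Analog of TVMArrayCopyFromBytes: fill the tensor from a host-side
// string array (here a std::vector standing in for a NumPy string array).
inline void StringArrayCopyFrom(FakeStrTensor* dst,
                                const std::vector<std::string>& src) {
  std::string* out = static_cast<std::string*>(dst->data);
  for (size_t i = 0; i < src.size(); ++i)
    out[i] = src[i];  // element-wise string copy, not raw memcpy
}

// Analog of TVMArrayCopyToBytes: read the tensor back out.
inline std::vector<std::string> StringArrayCopyTo(const FakeStrTensor& src) {
  const std::string* in = static_cast<const std::string*>(src.data);
  return std::vector<std::string>(in, in + src.num_elems);
}
```

Unlike the byte-copy FFI functions, these helpers must copy element by element, because std::string objects own heap storage and cannot be moved with a flat memcpy.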
In addition, to add new data types for DLTensors, we rely on the recommended way of adding custom data types, as in this PR. Our proposed implementation also follows the recommended way of adding external operations: the HashTable related operations are first built as an external library, which is invoked by the TOPI layer to form operators, and the TensorFlow frontend parses HashTable related operators from the TensorFlow graph as registered Relay operators. Besides the graph runtime, this implementation also fits the VM runtime smoothly with the DLTensor based HashTable and string types, since the ops can be handled by the InvokePacked instruction and the new data types are still NDArray objects containing DLTensors in the VM.