In the GPU end-to-end evaluation of the TVM paper, TVM shows roughly a 1x speedup over TensorFlow on the LSTM LM. Does the LSTM LM refer to the PTB model? And does TVM enable auto-tuning of the LSTM cell for that comparison?
Trying the PTB model in nnvm/tests/python/frontend/tensorflow/test_forward.py, TensorFlow and TVM take almost the same time to run it. Is this expected, or is something missing to turn on the LSTM cell optimization?
The LSTM LM in the paper is a standard LSTM language model with num_layers=2, hidden_units=650, batch_size=4, voc_size=10000, step_size=1.
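For reference, here is a minimal sketch of that configuration in the TF 1.x API. The variable names and the softmax projection are my own illustration, not the paper's actual benchmark script:

```python
import tensorflow as tf

num_layers = 2
hidden_units = 650
batch_size = 4
voc_size = 10000
step_size = 1

inputs = tf.placeholder(tf.int32, [batch_size, step_size])
embedding = tf.get_variable("embedding", [voc_size, hidden_units])
emb = tf.nn.embedding_lookup(embedding, inputs)

# Two stacked LSTM layers, as described above.
cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(hidden_units) for _ in range(num_layers)])
state = cell.zero_state(batch_size, tf.float32)
output, state = tf.nn.dynamic_rnn(cell, emb, initial_state=state)

# Project the hidden state back to the vocabulary for the LM logits.
softmax_w = tf.get_variable("softmax_w", [hidden_units, voc_size])
softmax_b = tf.get_variable("softmax_b", [voc_size])
logits = tf.matmul(tf.reshape(output, [-1, hidden_units]), softmax_w) + softmax_b
```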
We did auto-tuning for the dense layers, but the related code has not been upstreamed yet, so the current dense schedule in the master branch does not support auto-tuning.
You can try enabling the optimization flag in test_forward.py, as I mentioned in LSTM block cell fusion support.
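A minimal sketch of what that looks like, assuming a frozen PTB graph similar to the one the test downloads (the file name and the placeholder input name here are illustrative, so use the ones from the test script):

```python
import tensorflow as tf
import nnvm
import nnvm.compiler

# Load a frozen PTB graph; the file name stands in for whatever
# test_forward.py fetches.
with tf.gfile.GFile("ptb_model.pb", "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

sym, params = nnvm.frontend.from_tensorflow(graph_def)

# Input name and shape are illustrative (batch_size=4, step_size=1).
shape_dict = {"Model/Placeholder": (4, 1)}

# opt_level=3 turns on the graph-level optimizations, including the
# operator fusion mentioned above.
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        sym, target="llvm", shape=shape_dict, params=params)
```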
Thank you merrymercy, turning on opt_level=3 improved the performance of the PTB model compiled with TVM.