The time measured by TVM is much smaller than that of the CUDA code merged into TensorFlow.

OK, I will try. I referenced this post https://www.leiphone.com/news/201803/gHG5G6cCXBrzxjlu.html to speed up our Transformer inference.