I am trying to use TVM to speed up ALBERT model inference on CPU. In my test, TVM takes about 500 ms while PyTorch takes about 70 ms. I also tried auto-tuning, but it did not help much. Why is TVM so much slower? The profiling result shows that an op named `tvmgen_default_fused_reshape_transpose_einsum_add_add` consumes more than 90% of the time.
Our support for `einsum` is quite limited: things run, but I wouldn't be surprised to hear that it is super slow, especially if you are running on GPU.
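One common workaround, sketched below as a hypothetical example (the function names and the specific einsum pattern are assumptions, not taken from the ALBERT source): many attention-style `einsum` calls can be rewritten with `permute` + `matmul`, which TVM's operator coverage and fusion handle much better than a generic einsum op. Applying such a rewrite before exporting the model to TVM may remove the slow fused einsum kernel entirely.

```python
import torch

# Hypothetical attention-score computation using einsum.
# q, k have shape (batch, seq, heads, dim).
def attention_scores_einsum(q, k):
    return torch.einsum("bqhd,bkhd->bhqk", q, k)

# Equivalent rewrite using permute + matmul, which lowers to a
# plain batched matmul that TVM can schedule and tune well.
def attention_scores_matmul(q, k):
    q = q.permute(0, 2, 1, 3)   # (batch, heads, seq_q, dim)
    k = k.permute(0, 2, 3, 1)   # (batch, heads, dim, seq_k)
    return torch.matmul(q, k)   # (batch, heads, seq_q, seq_k)

# Sanity check that the two formulations agree numerically.
q = torch.randn(2, 5, 4, 8)
k = torch.randn(2, 7, 4, 8)
assert torch.allclose(attention_scores_einsum(q, k),
                      attention_scores_matmul(q, k), atol=1e-5)
```

If the rewrite works, the traced graph fed to TVM contains `transpose`/`batch_matmul` instead of `einsum`, so the auto-tuner can actually optimize the hot kernel.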