I used the bmm from the blog: Redirecting…, but TVM's bmm is slower than PyTorch's. The results from nvvp are shown below: the left is PyTorch and the right is TVM.
Is there a better TVM bmm schedule for the attention used in Transformers?
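For context, the bmm in question is the batched matrix multiply that attention performs per batch and head, e.g. Q·Kᵀ. A minimal NumPy sketch of the shapes involved (the sizes below are hypothetical, for illustration only; this is not the TVM schedule from the blog):

```python
import numpy as np

# Typical attention shapes: (batch * heads, seq_len, head_dim)
B, M, K = 12 * 8, 128, 64  # hypothetical sizes, for illustration
Q = np.random.rand(B, M, K).astype("float32")
Kt = np.random.rand(B, K, M).astype("float32")  # K already transposed

# Batched matmul: one (M, K) x (K, M) product per batch entry,
# equivalent to torch.bmm(Q, Kt) in PyTorch.
scores = np.matmul(Q, Kt)
print(scores.shape)  # (96, 128, 128)
```

A tuned TVM schedule for this workload would need to tile over the batch dimension as well as the spatial ones; the fixed schedule from the blog may simply not match these attention shapes.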