Quantized Transformer

Thanks for the reply.

  • PyTorch → Relay → Ansor → TVM’s low-level code → LLVM/NVCC (LLVM was used above)
  • Both CPU and GPU (in particular, NVIDIA T4)