[RFC][Tensor Core] Optimization of CNNs on Tensor Core

Thank you @Shawn_Inspur. This is very impressive and welcome work!

Together with @Shawn_Inspur and his teammates, we have brought Tensor Core support into TOPI and Relay. Performance is currently much better than that of the existing TVM workloads (without Tensor Cores) and of TensorFlow (with Tensor Cores), but it is still slower than TensorRT. Optimization is still in progress. The code will be ready in 1-2 weeks, and Shawn will publish the PR then.

It would be great to see some discussion of the open questions. Any other comments are welcome too!

cc @Laurawly @vinx13 @masahi