[MetaSchedule] [TensorCore] Please help check whether I am using cuda-tensorcore to tune the operator

If we don't pad the input shape to 16, is it only the first convolution on the input that would not use Tensor Cores? I ask because padding to 16 also adds cost to the convolution, so the padded version may not end up any faster even with Tensor Cores.
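To make the padding cost concrete, here is a rough back-of-the-envelope sketch. The layer shape is an assumption for illustration only (a first convolution with 3 input channels, 64 output channels, a 7x7 kernel, and a 112x112 output), and the helper names `pad_to_multiple` and `conv2d_macs` are hypothetical, not TVM APIs. It only counts multiply-accumulates; it does not model actual Tensor Core throughput.

```python
def pad_to_multiple(value: int, multiple: int = 16) -> int:
    """Round `value` up to the next multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def conv2d_macs(out_h: int, out_w: int, in_c: int, out_c: int,
                k_h: int, k_w: int) -> int:
    """Multiply-accumulate count of a dense 2D convolution (batch size 1)."""
    return out_h * out_w * in_c * out_c * k_h * k_w


# Assumed first-layer shape (hypothetical, not taken from the post).
in_c = 3
padded_c = pad_to_multiple(in_c, 16)

orig_macs = conv2d_macs(112, 112, in_c, 64, 7, 7)
padded_macs = conv2d_macs(112, 112, padded_c, 64, 7, 7)
overhead = padded_macs / orig_macs

print(f"input channels: {in_c} -> {padded_c}")
print(f"MACs: {orig_macs} -> {padded_macs} ({overhead:.2f}x)")
```

Under these assumptions the padded layer does about 5.3x the arithmetic of the unpadded one, so Tensor Cores would need to deliver at least that much speedup on this layer just to break even. Layers whose channel counts are already multiples of 16 pay no such overhead.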