I tried quantizing ResNet-18 to w16a16 and w8a8, but both use almost the same memory as fp32, and inference time barely improves. I'm using the PyTorch frontend and the graph executor.
Thanks so much for your time!