I tried quantizing ResNet-18 to w16a16 and w8a8, but both use almost the same memory as fp32, and inference time barely improves. I'm using the PyTorch frontend and the graph executor.
Thanks so much for your time!