To work around the memory explosion issue, you can try the `calibrate_chunk_by` option:
```python
def test_calibrate_memory_bound():
    mod, params = testing.synthetic.get_workload()
    dataset = get_calibration_dataset(mod, "data")
    import multiprocessing

    num_cpu = multiprocessing.cpu_count()
    with relay.quantize.qconfig(calibrate_mode="kl_divergence", calibrate_chunk_by=num_cpu):
        relay.quantize.quantize(mod, params, dataset)


def test_calibrate_percentile():
    mod, params = testing.synthetic.get_workload()
    dataset = get_calibration_dataset(mod, "data")
    with relay.quantize.qconfig(calibrate_mode="percentile"):
        relay.quantize.quantize(mod, params, dataset)
```
For the error, you can try running `relay.transform.FoldConstant()` before quantizing. The weights need to be constants to be quantized, but it seems they are not in your model.
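As a rough sketch, the fold-then-quantize flow could look like the helper below. This is illustrative, not from the original post: the function name `fold_then_quantize` is made up, and binding params into the module first (via `relay.build_module.bind_params_by_name`) is an assumption about why the weights are not constants in your model.

```python
def fold_then_quantize(mod, params, dataset):
    """Hypothetical helper: bind params so weights become Relay constants,
    fold them, then run quantization on the result."""
    from tvm import relay

    # Bind the weight params into the main function so FoldConstant
    # can evaluate the weight expressions down to Constant nodes.
    mod["main"] = relay.build_module.bind_params_by_name(mod["main"], params)
    mod = relay.transform.FoldConstant()(mod)

    with relay.quantize.qconfig(calibrate_mode="kl_divergence"):
        return relay.quantize.quantize(mod, params, dataset)
```

If the weights are produced by some preprocessing expression (e.g. a transpose or cast of a parameter), `FoldConstant` collapses that expression into a single constant, which is what the quantizer expects.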
Note that the existing quantization functionality in TVM is very limited and is not actively developed or maintained. There is a new proposal to rework our quantization support in [RFC][Quantization] A new quantization framework in TVM: initial RFC (1/4).