Hello everyone. When I tried the quantization tool in TVM, I found that it roughly consists of four steps (partition, annotate, calibrate, and realize). However, I don’t understand why the model has to be partitioned first, or what role the stop_fusion op inserted into the model plays. I also tried quantizing resnet18 without partitioning, and the model output was very close to the output with partitioning:
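
    import numpy as np
    import tvm
    from tvm import relay
    from tvm.relay import testing

    # get_runtime and get_output are helper functions not shown here.
    def test_partition():
        mod, params = testing.resnet.get_workload()
        # Quantize the same model with and without the partition step.
        partition_mod = relay.quantize.quantize(mod, params)
        no_partition_mod = relay.quantize.quantize(mod, params, use_partition=False)
        target = tvm.target.Target("llvm")
        partition_runtime = get_runtime(partition_mod, target)
        no_partition_runtime = get_runtime(no_partition_mod, target)
        # Feed the same random float32 input to both models.
        input_data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
        partition_output = get_output(partition_runtime, input_data, 0)
        no_partition_output = get_output(no_partition_runtime, input_data, 0)
        compare_result(partition_output, no_partition_output)
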
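where compare_result is defined as

    def compare_result(arr1, arr2):
        # Sum of squared differences divided by len(arr1), the size of the first axis.
        print(np.sum((arr1 - arr2) ** 2) / len(arr1))

The result is:

    5.208057700656354e-07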
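
Since len(arr1) is only the size of the first axis, the number above is a sum of squared differences per leading-axis entry rather than a per-element mean. To double check how close the two outputs are, something like the following could be used. This is only a rough sketch, assuming both outputs are NumPy arrays of the same shape with class scores on the last axis; `compare_outputs` is a hypothetical helper name, and the arrays at the bottom are stand-in data, not real model outputs.

```python
import numpy as np


def compare_outputs(arr1, arr2):
    """Return a few error metrics between two same-shaped arrays."""
    a = np.asarray(arr1, dtype=np.float64)
    b = np.asarray(arr2, dtype=np.float64)
    diff = a - b
    return {
        # Mean squared error over every element, not just the first axis.
        "mse": float(np.mean(diff ** 2)),
        # Largest single-element deviation.
        "max_abs_diff": float(np.max(np.abs(diff))),
        # Whether both outputs pick the same class along the last axis.
        "same_top1": bool(
            np.array_equal(np.argmax(a, axis=-1), np.argmax(b, axis=-1))
        ),
    }


# Stand-in data only: two nearly identical score vectors.
ref = np.array([[0.1, 0.7, 0.2]])
other = ref + np.array([[0.001, -0.001, 0.0]])
metrics = compare_outputs(ref, other)
print(metrics)
```

With real outputs, the two arrays returned by get_output would be passed in instead of the stand-in data.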