e2e_opt_model.py: "Memory verification failed" error

Hi Experts,

Running the sample program from the End-to-End Optimize Model — tvm 0.21.dev0 documentation fails with the following error:

LOG(FATAL) << "RuntimeError: Memory verification failed with the following errors:\n"

It looks like the DLight transform below is missing. Am I correct?

import tvm
from tvm import dlight as dl

# mod is the Relax module from the tutorial, before compilation
with tvm.target.Target("cuda"):
    gpu_mod = dl.ApplyDefaultSchedule(
        dl.gpu.Matmul(),
        dl.gpu.Fallback(),
    )(mod)
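
For reference, this is roughly how I would compile and run the scheduled module afterwards (a sketch based on the tutorial; the input shape, the "main" entry point, and the gpu_params list of detached weights are assumptions taken from the tutorial, not verified here):

import numpy as np
from tvm import relax

ex = tvm.compile(gpu_mod, target="cuda")   # build the dlight-scheduled module
dev = tvm.device("cuda", 0)
vm = relax.VirtualMachine(ex, dev)         # Relax virtual machine on the GPU
data = tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32"), dev)
out = vm["main"](data, *gpu_params)        # gpu_params: the detached weights, as in the tutorial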

Have a nice day! Ruby

This tutorial enables auto-tuning, so it does not depend on the DLight transform. Please make sure you have tuned with enough trials :slight_smile:
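
The number of trials is controlled by the max-trials argument of the MetaSchedule tuning pass. Roughly like this (a sketch, not the tutorial verbatim; pass names and argument order may differ slightly across versions):

import tvm
from tvm import relax

work_dir = "tuning_logs"
TOTAL_TRIALS = 8000  # raise this until no task shows N/A in the final table

with tvm.target.Target("cuda"):
    mod = tvm.ir.transform.Sequential(
        [
            # Tune every TIR task and record the results in the database under work_dir
            relax.transform.MetaScheduleTuneIRMod({}, work_dir, TOTAL_TRIALS),
            # Rewrite each task with the best schedule found in the database
            relax.transform.MetaScheduleApplyDatabase(work_dir),
        ]
    )(mod)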

Hello, @Hzfengsy. I have the same problem as @ruby.chou and want to ask a related question.

Does your reply mean that memory verification failure would disappear if I try more trials?

If so, can you give me a tip on how to check that trials are enough and how to set a proper number of trials?

You are right.

Make sure all operators in the network have been tuned and have a valid schedule (no N/A entries in the log tables).
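
If it helps, you can also sanity-check the tuning database directly and confirm it contains records (a sketch; the file paths assume the tutorial's default tuning_logs work directory):

from tvm import meta_schedule as ms

# Load the database written by the tuner and count the recorded schedules
db = ms.database.JSONDatabase(
    path_workload="tuning_logs/database_workload.json",
    path_tuning_record="tuning_logs/database_tuning_record.json",
)
print("tuning records:", len(db.get_all_tuning_records()))

Tasks that still show N/A have no usable record, so their TIR functions stay unscheduled, which is what trips memory verification on CUDA (the "Did you forget to bind?" message).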

Thanks for replying. I increased the tuning budget to 10,000 trials, but the error still occurs.

The following are partial logs.

Total trials: 10040 Total latency (us): 27354.3

2025-05-29 10:41:18 [INFO] [task_scheduler.cc:260] Task #27 has finished. Remaining task(s): 0
2025-05-29 10:41:18 [INFO] [task_scheduler.cc:320]

 ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
  0 | fused_matmul_add13 | 1025000 | 1 | 8.3234 | 123.1465 | 123.1465 | 64 | Y
  1 | reshape5 | 1 | 1 | 0.0000 | 53.5332 | 53.5332 | 5 | Y
  2 | fused_reshape4_add12_relu4 | 50176 | 1 | 0.9193 | 54.5822 | 54.5822 | 6 | Y
  3 | fused_conv2d10_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 12945408 | 1 | 63.1011 | 205.1534 | 205.1534 | 128 | Y
  4 | fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 231311360 | 3 | 158.0187 | 1463.8231 | 4391.4693 | 1536 | Y
  5 | fused_reshape4_relu4 | 25088 | 2 | 0.4583 | 54.7361 | 109.4723 | 6 | Y
  6 | transpose | 1 | 1 | 0.0000 | 993.6313 | 993.6313 | 1 | Y
  7 | fused_conv2d8_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 115705856 | 1 | 97.9469 | 1181.3126 | 1181.3126 | 448 | Y
  8 | fused_reshape3_add9_relu3 | 100352 | 1 | 1.8730 | 53.5772 | 53.5772 | 6 | Y
  9 | fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 231411712 | 3 | 167.6884 | 1380.0099 | 4140.0296 | 1471 | Y
 10 | adaptive_avg_pool2d | 25600 | 1 | 0.4790 | 53.4449 | 53.4449 | 62 | Y
 11 | fused_reshape3_relu3 | 50176 | 2 | 1.0084 | 49.7598 | 99.5197 | 6 | Y
 12 | fused_conv2d7_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 13045760 | 1 | 57.3716 | 227.3904 | 227.3904 | 128 | Y
 13 | fused_conv2d_subtract_divide_expand_dims_multiply_expand_dims_add1_strided_slice | 239239168 | 1 | 95.8125 | 2496.9516 | 2496.9516 | 1279 | Y
 14 | fused_reshape4_reshape4_add12_relu4 | 50176 | 1 | N/A | N/A | N/A | 6 | Y
 15 | fused_reshape_relu | 802816 | 1 | N/A | N/A | N/A | 1 | Y
 16 | fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1 | 232013824 | 4 | 154.9151 | 1497.6838 | 5990.7353 | 2240 | Y
 17 | fused_reshape3_reshape3_add9_relu3 | 100352 | 1 | 2.0487 | 48.9840 | 48.9840 | 6 | Y
 18 | fused_reshape1_relu1 | 200704 | 2 | 2.0253 | 99.1005 | 198.2010 | 6 | Y
 19 | max_pool2d | 1806336 | 1 | 7.0835 | 255.0051 | 255.0051 | 70 | Y
 20 | fused_reshape2_relu2 | 100352 | 2 | 1.9167 | 52.3576 | 104.7152 | 6 | Y
 21 | fused_conv2d2_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 116006912 | 1 | 107.3230 | 1080.9135 | 1080.9135 | 448 | Y
 22 | fused_reshape1_add3_relu1 | 401408 | 2 | 3.1649 | 126.8302 | 253.6605 | 6 | Y
 23 | fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 231612416 | 3 | 173.6675 | 1333.6546 | 4000.9637 | 1536 | Y
 24 | fused_conv2d4_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 13246464 | 1 | 39.1577 | 338.2850 | 338.2850 | 192 | Y
 25 | fused_reshape2_reshape2_add6_relu2 | 200704 | 1 | N/A | N/A | N/A | 6 | Y
 26 | fused_reshape2_add6_relu2 | 200704 | 1 | N/A | N/A | N/A | 6 | Y
 27 | fused_conv2d5_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 115806208 | 1 | 128.7318 | 899.5931 | 899.5931 | 384 | Y

Total trials: 10040 Total latency (us): 27354.3

[10:41:18] /home/ruby/tvm/src/relax/transform/meta_schedule.cc:119: Warning: Creating JSONDatabase. Workload at: tuning_logs/database_workload.json, Tuning records at: tuning_logs/database_tuning_record.json
/home/ruby/tvm/python/tvm/script/highlight.py:233: UserWarning: No module named 'pygments'
To print highlighted TVM script, please install Pygments>=2.4.0:
/home/ruby/anaconda3/envs/tvm_test/bin/python -m pip install "Pygments>=2.4.0" --upgrade --user
  warnings.warn(

from tvm.script import relax as R

@R.function def main(x: R.Tensor((1, 3, 224, 224), dtype=“float32”), p_conv1_weight: R.Tensor((64, 3, 7, 7), dtype=“float32”), p_bn1_weight: R.Tensor((64,), dtype=“float32”), p_bn1_bias: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___0___conv1_weight: R.Tensor((64, 64, 3, 3), dtype=“float32”), p_getattr_l__self___layer1___0___bn1_weight: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___0___bn1_bias: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___0___conv2_weight: R.Tensor((64, 64, 3, 3), dtype=“float32”), p_getattr_l__self___layer1___0___bn2_weight: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___0___bn2_bias: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___1___conv1_weight: R.Tensor((64, 64, 3, 3), dtype=“float32”), p_getattr_l__self___layer1___1___bn1_weight: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___1___bn1_bias: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___1___conv2_weight: R.Tensor((64, 64, 3, 3), dtype=“float32”), p_getattr_l__self___layer1___1___bn2_weight: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer1___1___bn2_bias: R.Tensor((64,), dtype=“float32”), p_getattr_l__self___layer2___0___conv1_weight: R.Tensor((128, 64, 3, 3), dtype=“float32”), p_getattr_l__self___layer2___0___bn1_weight: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___0___bn1_bias: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___0___conv2_weight: R.Tensor((128, 128, 3, 3), dtype=“float32”), p_getattr_l__self___layer2___0___bn2_weight: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___0___bn2_bias: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___0___downsample_0_weight: R.Tensor((128, 64, 1, 1), dtype=“float32”), p_getattr_l__self___layer2___0___downsample_1_weight: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___0___downsample_1_bias: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___1___conv1_weight: R.Tensor((128, 128, 3, 3), dtype=“float32”), p_getattr_l__self___layer2___1___bn1_weight: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___1___bn1_bias: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___1___conv2_weight: R.Tensor((128, 128, 3, 3), dtype=“float32”), p_getattr_l__self___layer2___1___bn2_weight: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer2___1___bn2_bias: R.Tensor((128,), dtype=“float32”), p_getattr_l__self___layer3___0___conv1_weight: R.Tensor((256, 128, 3, 3), dtype=“float32”), p_getattr_l__self___layer3___0___bn1_weight: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___0___bn1_bias: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___0___conv2_weight: R.Tensor((256, 256, 3, 3), dtype=“float32”), p_getattr_l__self___layer3___0___bn2_weight: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___0___bn2_bias: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___0___downsample_0_weight: R.Tensor((256, 128, 1, 1), dtype=“float32”), p_getattr_l__self___layer3___0___downsample_1_weight: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___0___downsample_1_bias: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___1___conv1_weight: R.Tensor((256, 256, 3, 3), dtype=“float32”), p_getattr_l__self___layer3___1___bn1_weight: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___1___bn1_bias: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___1___conv2_weight: 
R.Tensor((256, 256, 3, 3), dtype=“float32”), p_getattr_l__self___layer3___1___bn2_weight: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer3___1___bn2_bias: R.Tensor((256,), dtype=“float32”), p_getattr_l__self___layer4___0___conv1_weight: R.Tensor((512, 256, 3, 3), dtype=“float32”), p_getattr_l__self___layer4___0___bn1_weight: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___0___bn1_bias: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___0___conv2_weight: R.Tensor((512, 512, 3, 3), dtype=“float32”), p_getattr_l__self___layer4___0___bn2_weight: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___0___bn2_bias: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___0___downsample_0_weight: R.Tensor((512, 256, 1, 1), dtype=“float32”), p_getattr_l__self___layer4___0___downsample_1_weight: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___0___downsample_1_bias: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___1___conv1_weight: R.Tensor((512, 512, 3, 3), dtype=“float32”), p_getattr_l__self___layer4___1___bn1_weight: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___1___bn1_bias: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___1___conv2_weight: R.Tensor((512, 512, 3, 3), dtype=“float32”), p_getattr_l__self___layer4___1___bn2_weight: R.Tensor((512,), dtype=“float32”), p_getattr_l__self___layer4___1___bn2_bias: R.Tensor((512,), dtype=“float32”), p_fc_weight: R.Tensor((1000, 512), dtype=“float32”), p_fc_bias: R.Tensor((1000,), dtype=“float32”)) → R.Tuple(R.Tensor((1, 1000), dtype=“float32”)): R.func_attr({“num_input”: 1}) with R.dataflow(): lv = R.call_tir(fused_conv2d_subtract_divide_expand_dims_multiply_expand_dims_add1_strided_slice, (x, p_conv1_weight, metadata[“relax.expr.Constant”][0], metadata[“relax.expr.Constant”][1], p_bn1_weight, p_bn1_bias), out_sinfo=R.Tensor((1, 64, 112, 112), dtype=“float32”)) lv1 = R.call_tir(fused_reshape_relu, (lv,), out_sinfo=R.Tensor((1, 64, 112, 112), dtype=“float32”)) lv6 = R.call_tir(max_pool2d, (lv1,), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv2 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv6, p_getattr_l__self___layer1___0___conv1_weight, metadata[“relax.expr.Constant”][2], metadata[“relax.expr.Constant”][3], p_getattr_l__self___layer1___0___bn1_weight, p_getattr_l__self___layer1___0___bn1_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv3 = R.call_tir(fused_reshape1_relu1, (lv2,), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv4 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv3, p_getattr_l__self___layer1___0___conv2_weight, metadata[“relax.expr.Constant”][4], metadata[“relax.expr.Constant”][5], p_getattr_l__self___layer1___0___bn2_weight, p_getattr_l__self___layer1___0___bn2_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv5 = R.call_tir(fused_reshape1_add3_relu1, (lv4, lv6), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv6_1 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv5, p_getattr_l__self___layer1___1___conv1_weight, metadata[“relax.expr.Constant”][6], metadata[“relax.expr.Constant”][7], p_getattr_l__self___layer1___1___bn1_weight, p_getattr_l__self___layer1___1___bn1_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv7 = R.call_tir(fused_reshape1_relu1, (lv6_1,), out_sinfo=R.Tensor((1, 64, 56, 56), 
dtype=“float32”)) lv8 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv7, p_getattr_l__self___layer1___1___conv2_weight, metadata[“relax.expr.Constant”][8], metadata[“relax.expr.Constant”][9], p_getattr_l__self___layer1___1___bn2_weight, p_getattr_l__self___layer1___1___bn2_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv9 = R.call_tir(fused_reshape1_add3_relu1, (lv8, lv5), out_sinfo=R.Tensor((1, 64, 56, 56), dtype=“float32”)) lv10 = R.call_tir(fused_conv2d2_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv9, p_getattr_l__self___layer2___0___conv1_weight, metadata[“relax.expr.Constant”][10], metadata[“relax.expr.Constant”][11], p_getattr_l__self___layer2___0___bn1_weight, p_getattr_l__self___layer2___0___bn1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv11 = R.call_tir(fused_reshape2_relu2, (lv10,), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv12 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv11, p_getattr_l__self___layer2___0___conv2_weight, metadata[“relax.expr.Constant”][12], metadata[“relax.expr.Constant”][13], p_getattr_l__self___layer2___0___bn2_weight, p_getattr_l__self___layer2___0___bn2_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv13 = R.call_tir(fused_conv2d4_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv9, p_getattr_l__self___layer2___0___downsample_0_weight, metadata[“relax.expr.Constant”][14], metadata[“relax.expr.Constant”][15], p_getattr_l__self___layer2___0___downsample_1_weight, p_getattr_l__self___layer2___0___downsample_1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv14 = R.call_tir(fused_reshape2_reshape2_add6_relu2, (lv12, lv13), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv15 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv14, p_getattr_l__self___layer2___1___conv1_weight, metadata[“relax.expr.Constant”][16], metadata[“relax.expr.Constant”][17], p_getattr_l__self___layer2___1___bn1_weight, p_getattr_l__self___layer2___1___bn1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv16 = R.call_tir(fused_reshape2_relu2, (lv15,), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv17 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv16, p_getattr_l__self___layer2___1___conv2_weight, metadata[“relax.expr.Constant”][18], metadata[“relax.expr.Constant”][19], p_getattr_l__self___layer2___1___bn2_weight, p_getattr_l__self___layer2___1___bn2_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv18 = R.call_tir(fused_reshape2_add6_relu2, (lv17, lv14), out_sinfo=R.Tensor((1, 128, 28, 28), dtype=“float32”)) lv19 = R.call_tir(fused_conv2d5_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv18, p_getattr_l__self___layer3___0___conv1_weight, metadata[“relax.expr.Constant”][20], metadata[“relax.expr.Constant”][21], p_getattr_l__self___layer3___0___bn1_weight, p_getattr_l__self___layer3___0___bn1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv20 = R.call_tir(fused_reshape3_relu3, (lv19,), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv21 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv20, p_getattr_l__self___layer3___0___conv2_weight, metadata[“relax.expr.Constant”][22], 
metadata[“relax.expr.Constant”][23], p_getattr_l__self___layer3___0___bn2_weight, p_getattr_l__self___layer3___0___bn2_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv22 = R.call_tir(fused_conv2d7_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv18, p_getattr_l__self___layer3___0___downsample_0_weight, metadata[“relax.expr.Constant”][24], metadata[“relax.expr.Constant”][25], p_getattr_l__self___layer3___0___downsample_1_weight, p_getattr_l__self___layer3___0___downsample_1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv23 = R.call_tir(fused_reshape3_reshape3_add9_relu3, (lv21, lv22), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv24 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv23, p_getattr_l__self___layer3___1___conv1_weight, metadata[“relax.expr.Constant”][26], metadata[“relax.expr.Constant”][27], p_getattr_l__self___layer3___1___bn1_weight, p_getattr_l__self___layer3___1___bn1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv25 = R.call_tir(fused_reshape3_relu3, (lv24,), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv26 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv25, p_getattr_l__self___layer3___1___conv2_weight, metadata[“relax.expr.Constant”][28], metadata[“relax.expr.Constant”][29], p_getattr_l__self___layer3___1___bn2_weight, p_getattr_l__self___layer3___1___bn2_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv27 = R.call_tir(fused_reshape3_add9_relu3, (lv26, lv23), out_sinfo=R.Tensor((1, 256, 14, 14), dtype=“float32”)) lv28 = R.call_tir(fused_conv2d8_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv27, p_getattr_l__self___layer4___0___conv1_weight, metadata[“relax.expr.Constant”][30], metadata[“relax.expr.Constant”][31], p_getattr_l__self___layer4___0___bn1_weight, p_getattr_l__self___layer4___0___bn1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv29 = R.call_tir(fused_reshape4_relu4, (lv28,), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv30 = R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv29, p_getattr_l__self___layer4___0___conv2_weight, metadata[“relax.expr.Constant”][32], metadata[“relax.expr.Constant”][33], p_getattr_l__self___layer4___0___bn2_weight, p_getattr_l__self___layer4___0___bn2_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv31 = R.call_tir(fused_conv2d10_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv27, p_getattr_l__self___layer4___0___downsample_0_weight, metadata[“relax.expr.Constant”][34], metadata[“relax.expr.Constant”][35], p_getattr_l__self___layer4___0___downsample_1_weight, p_getattr_l__self___layer4___0___downsample_1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv32 = R.call_tir(fused_reshape4_reshape4_add12_relu4, (lv30, lv31), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv33 = R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv32, p_getattr_l__self___layer4___1___conv1_weight, metadata[“relax.expr.Constant”][36], metadata[“relax.expr.Constant”][37], p_getattr_l__self___layer4___1___bn1_weight, p_getattr_l__self___layer4___1___bn1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv34 = R.call_tir(fused_reshape4_relu4, (lv33,), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv35 = 
R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv34, p_getattr_l__self___layer4___1___conv2_weight, metadata[“relax.expr.Constant”][38], metadata[“relax.expr.Constant”][39], p_getattr_l__self___layer4___1___bn2_weight, p_getattr_l__self___layer4___1___bn2_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv36 = R.call_tir(fused_reshape4_add12_relu4, (lv35, lv32), out_sinfo=R.Tensor((1, 512, 7, 7), dtype=“float32”)) lv126 = R.call_tir(adaptive_avg_pool2d, (lv36,), out_sinfo=R.Tensor((1, 512, 1, 1), dtype=“float32”)) lv127 = R.call_tir(reshape5, (lv126,), out_sinfo=R.Tensor((1, 512), dtype=“float32”)) lv128 = R.call_tir(transpose, (p_fc_weight,), out_sinfo=R.Tensor((512, 1000), dtype=“float32”)) lv37 = R.call_tir(fused_matmul_add13, (lv127, lv128, p_fc_bias), out_sinfo=R.Tensor((1, 1000), dtype=“float32”)) gv: R.Tuple(R.Tensor((1, 1000), dtype=“float32”)) = (lv37,) R.output(gv) return gv

Metadata omitted. Use show_meta=True in script() method to show it.

Traceback (most recent call last):
  File "/home/ruby/tvm/test/e2e_opt_model.py", line 117, in <module>
    ex = tvm.compile(mod, target="cuda")
  File "/home/ruby/tvm/python/tvm/driver/build_module.py", line 104, in compile
    return tvm.relax.build(
  File "/home/ruby/tvm/python/tvm/relax/vm_build.py", line 259, in build
    return _vmlink(
  File "/home/ruby/tvm/python/tvm/relax/vm_build.py", line 154, in _vmlink
    lib = tvm.tir.build(tir_mod, target=target, pipeline=tir_pipeline)
  File "/home/ruby/tvm/python/tvm/tir/build.py", line 173, in build
    mod = pipeline(mod)
  File "/home/ruby/tvm/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/ruby/tvm/python/tvm/_ffi/base.py", line 468, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/home/ruby/tvm/python/tvm/tir/pipeline.py", line 122, in _pipeline
    mod = tvm.ir.transform.Sequential(passes)(mod)
  File "/home/ruby/tvm/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/ruby/tvm/python/tvm/_ffi/base.py", line 468, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/home/ruby/tvm/src/tir/analysis/verify_memory.cc", line 203, in operator()
    LOG(FATAL) << "RuntimeError: Memory verification failed with the following errors:\n"
tvm._ffi.base.TVMError: Traceback (most recent call last):
  0: operator()
        at /home/ruby/tvm/src/tir/analysis/verify_memory.cc:203
  Did you forget to bind?
    Variable compute is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable lv50 is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable lv60 is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
  File "/home/ruby/tvm/src/tir/analysis/verify_memory.cc", line 203
RuntimeError: Memory verification failed with the following errors:

from tvm.script import tir as T

@T.prim_func
def fused_reshape2_add6_relu2(
    lv60: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32"),
    lv50: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32"),
    compute_intermediate: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32"),
):
    T.func_attr({"target": T.target({"arch": "sm_53", "host": {"keys": ["arm_cpu", "cpu"], "kind": "llvm", "mtriple": "aarch64-unknown-linux-gnu", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    T_reshape = T.allocate([100352], "float32", "global")
    T_reshape_1 = T.Buffer((T.int64(100352),), data=T_reshape)
    for ax1, ax2, ax3 in T.grid(128, 28, 28):
        cse_var_1: T.int32 = ax1 * 784 + ax2 * 28 + ax3
        lv60_1 = T.Buffer((T.int64(100352),), data=lv60.data)
        T_reshape_1[cse_var_1] = lv60_1[cse_var_1]
    T_reshape_2 = T.Buffer((T.int64(100352),), data=T_reshape)
    for ax1, ax2, ax3 in T.grid(128, 28, 28):
        cse_var_2: T.int32 = ax1 * 784 + ax2 * 28 + ax3
        lv50_1 = T.Buffer((T.int64(100352),), data=lv50.data)
        T_reshape_2[cse_var_2] = T_reshape_1[cse_var_2] + lv50_1[cse_var_2]
    for i1, i2, i3 in T.grid(128, 28, 28):
        cse_var_3: T.int32 = i1 * 784 + i2 * 28 + i3
        compute_intermediate_1 = T.Buffer((T.int64(100352),), data=compute_intermediate.data)
        compute_intermediate_1[cse_var_3] = T.max(T_reshape_2[cse_var_3], T.float32(0.0))

real    1052m59.636s
user    3834m21.524s
sys     125m26.492s

I also got the same problem. (In my case, the N/A tasks were not tried even though I set a higher number of trials.)

However, the model passed memory verification after I forced the N/A tasks to be tuned (reusing the previously tuned log).
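
(For anyone else hitting this: I believe an alternative to re-tuning is to let dlight give the leftover kernels a default GPU schedule, along the lines of Ruby's snippet above. A sketch, assuming mod is the module after MetaScheduleApplyDatabase has been applied:)

import tvm
from tvm import dlight as dl

with tvm.target.Target("cuda"):
    # dlight skips functions already marked as scheduled and gives the rest
    # a generic thread-bound fallback, so every kernel can pass VerifyMemory.
    mod = dl.ApplyDefaultSchedule(dl.gpu.Fallback())(mod)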

But a new problem has occurred: the latency of the tuned resnet18 is almost the same as that of the PyTorch resnet18.

@Hzfengsy Could you tell me the reason (I already tried 10,000 trials)? As far as I know, a tuned resnet18 should perform much better than the PyTorch version, as shown in the TVM papers.
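
For context, this is roughly how I measure the TVM side (a sketch; ex and gpu_params come from the earlier tutorial steps, and the warm-up/repeat counts are arbitrary):

import time
import numpy as np
import tvm
from tvm import relax

dev = tvm.device("cuda", 0)
vm = relax.VirtualMachine(ex, dev)
data = tvm.nd.array(np.random.rand(1, 3, 224, 224).astype("float32"), dev)

for _ in range(10):                      # warm up
    vm["main"](data, *gpu_params)
dev.sync()
start = time.time()
for _ in range(100):
    vm["main"](data, *gpu_params)
dev.sync()                               # wait for all kernels before stopping the clock
print("mean latency (ms):", (time.time() - start) / 100 * 1000)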