Thanks for replying. I increased the number of tuning trials to 10000, but I still hit the same error.
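For reference, this is roughly how I drive the tuning and compilation. It is a simplified sketch of my script (which follows the e2e_opt_model.py tutorial); the target string is a placeholder for my Jetson (sm_53, aarch64 host) setup, and the `work_dir` argument is my assumption based on the database paths in the logs:

```python
import tvm
from tvm import relax

# `mod` is the Relax module exported from the PyTorch ResNet (export omitted here).
# Placeholder target for my board; the exact string in my script may differ.
target = tvm.target.Target(
    "cuda -arch=sm_53", host="llvm -mtriple=aarch64-unknown-linux-gnu"
)
mod = relax.get_pipeline(
    "static_shape_tuning",
    target=target,
    total_trials=10000,      # raised from the tutorial default, as mentioned above
    work_dir="tuning_logs",  # assumed; matches the database paths in the logs below
)(mod)
ex = tvm.compile(mod, target="cuda")  # this is the call that fails (see traceback below)
```

Below are partial logs.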
2025-05-29 10:41:18 [INFO] [task_scheduler.cc:260] Task #27 has finished. Remaining task(s): 0
2025-05-29 10:41:18 [INFO] [task_scheduler.cc:320]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
0 | fused_matmul_add13 | 1025000 | 1 | 8.3234 | 123.1465 | 123.1465 | 64 | Y
1 | reshape5 | 1 | 1 | 0.0000 | 53.5332 | 53.5332 | 5 | Y
2 | fused_reshape4_add12_relu4 | 50176 | 1 | 0.9193 | 54.5822 | 54.5822 | 6 | Y
3 | fused_conv2d10_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 12945408 | 1 | 63.1011 | 205.1534 | 205.1534 | 128 | Y
4 | fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 231311360 | 3 | 158.0187 | 1463.8231 | 4391.4693 | 1536 | Y
5 | fused_reshape4_relu4 | 25088 | 2 | 0.4583 | 54.7361 | 109.4723 | 6 | Y
6 | transpose | 1 | 1 | 0.0000 | 993.6313 | 993.6313 | 1 | Y
7 | fused_conv2d8_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4 | 115705856 | 1 | 97.9469 | 1181.3126 | 1181.3126 | 448 | Y
8 | fused_reshape3_add9_relu3 | 100352 | 1 | 1.8730 | 53.5772 | 53.5772 | 6 | Y
9 | fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 231411712 | 3 | 167.6884 | 1380.0099 | 4140.0296 | 1471 | Y
10 | adaptive_avg_pool2d | 25600 | 1 | 0.4790 | 53.4449 | 53.4449 | 62 | Y
11 | fused_reshape3_relu3 | 50176 | 2 | 1.0084 | 49.7598 | 99.5197 | 6 | Y
12 | fused_conv2d7_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 13045760 | 1 | 57.3716 | 227.3904 | 227.3904 | 128 | Y
13 | fused_conv2d_subtract_divide_expand_dims_multiply_expand_dims_add1_strided_slice | 239239168 | 1 | 95.8125 | 2496.9516 | 2496.9516 | 1279 | Y
14 | fused_reshape4_reshape4_add12_relu4 | 50176 | 1 | N/A | N/A | N/A | 6 | Y
15 | fused_reshape_relu | 802816 | 1 | N/A | N/A | N/A | 1 | Y
16 | fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1 | 232013824 | 4 | 154.9151 | 1497.6838 | 5990.7353 | 2240 | Y
17 | fused_reshape3_reshape3_add9_relu3 | 100352 | 1 | 2.0487 | 48.9840 | 48.9840 | 6 | Y
18 | fused_reshape1_relu1 | 200704 | 2 | 2.0253 | 99.1005 | 198.2010 | 6 | Y
19 | max_pool2d | 1806336 | 1 | 7.0835 | 255.0051 | 255.0051 | 70 | Y
20 | fused_reshape2_relu2 | 100352 | 2 | 1.9167 | 52.3576 | 104.7152 | 6 | Y
21 | fused_conv2d2_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 116006912 | 1 | 107.3230 | 1080.9135 | 1080.9135 | 448 | Y
22 | fused_reshape1_add3_relu1 | 401408 | 2 | 3.1649 | 126.8302 | 253.6605 | 6 | Y
23 | fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 231612416 | 3 | 173.6675 | 1333.6546 | 4000.9637 | 1536 | Y
24 | fused_conv2d4_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2 | 13246464 | 1 | 39.1577 | 338.2850 | 338.2850 | 192 | Y
25 | fused_reshape2_reshape2_add6_relu2 | 200704 | 1 | N/A | N/A | N/A | 6 | Y
26 | fused_reshape2_add6_relu2 | 200704 | 1 | N/A | N/A | N/A | 6 | Y
27 | fused_conv2d5_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3 | 115806208 | 1 | 128.7318 | 899.5931 | 899.5931 | 384 | Y
Total trials: 10040
Total latency (us): 27354.3
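One thing I notice in the table: tasks 14, 15, 25, and 26 (the fused reshape+add+relu kernels, including the fused_reshape2_add6_relu2 that the error below complains about) report N/A speed and latency, so they may never have produced a valid tuning record. To check what actually landed in the database, I can query it roughly like this (a sketch; the paths are taken from the warning below):

```python
from tvm import meta_schedule as ms

# Open the database written by the tuning run (paths from the log below).
db = ms.database.JSONDatabase(
    path_workload="tuning_logs/database_workload.json",
    path_tuning_record="tuning_logs/database_tuning_record.json",
)
records = db.get_all_tuning_records()
print(f"{len(records)} tuning records in total")
```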
[10:41:18] /home/ruby/tvm/src/relax/transform/meta_schedule.cc:119: Warning: Creating JSONDatabase. Workload at: tuning_logs/database_workload.json, Tuning records at: tuning_logs/database_tuning_record.json
/home/ruby/tvm/python/tvm/script/highlight.py:233: UserWarning: No module named 'pygments'
To print highlighted TVM script, please install Pygments>=2.4.0:
/home/ruby/anaconda3/envs/tvm_test/bin/python -m pip install "Pygments>=2.4.0" --upgrade --user
  warnings.warn(
from tvm.script import relax as R
@R.function
def main(x: R.Tensor((1, 3, 224, 224), dtype="float32"), p_conv1_weight: R.Tensor((64, 3, 7, 7), dtype="float32"), p_bn1_weight: R.Tensor((64,), dtype="float32"), p_bn1_bias: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___0___conv1_weight: R.Tensor((64, 64, 3, 3), dtype="float32"), p_getattr_l__self___layer1___0___bn1_weight: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___0___bn1_bias: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___0___conv2_weight: R.Tensor((64, 64, 3, 3), dtype="float32"), p_getattr_l__self___layer1___0___bn2_weight: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___0___bn2_bias: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___1___conv1_weight: R.Tensor((64, 64, 3, 3), dtype="float32"), p_getattr_l__self___layer1___1___bn1_weight: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___1___bn1_bias: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___1___conv2_weight: R.Tensor((64, 64, 3, 3), dtype="float32"), p_getattr_l__self___layer1___1___bn2_weight: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer1___1___bn2_bias: R.Tensor((64,), dtype="float32"), p_getattr_l__self___layer2___0___conv1_weight: R.Tensor((128, 64, 3, 3), dtype="float32"), p_getattr_l__self___layer2___0___bn1_weight: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___0___bn1_bias: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___0___conv2_weight: R.Tensor((128, 128, 3, 3), dtype="float32"), p_getattr_l__self___layer2___0___bn2_weight: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___0___bn2_bias: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___0___downsample_0_weight: R.Tensor((128, 64, 1, 1), dtype="float32"), p_getattr_l__self___layer2___0___downsample_1_weight: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___0___downsample_1_bias: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___1___conv1_weight: R.Tensor((128, 128, 3, 3), dtype="float32"), p_getattr_l__self___layer2___1___bn1_weight: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___1___bn1_bias: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___1___conv2_weight: R.Tensor((128, 128, 3, 3), dtype="float32"), p_getattr_l__self___layer2___1___bn2_weight: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer2___1___bn2_bias: R.Tensor((128,), dtype="float32"), p_getattr_l__self___layer3___0___conv1_weight: R.Tensor((256, 128, 3, 3), dtype="float32"), p_getattr_l__self___layer3___0___bn1_weight: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___0___bn1_bias: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___0___conv2_weight: R.Tensor((256, 256, 3, 3), dtype="float32"), p_getattr_l__self___layer3___0___bn2_weight: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___0___bn2_bias: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___0___downsample_0_weight: R.Tensor((256, 128, 1, 1), dtype="float32"), p_getattr_l__self___layer3___0___downsample_1_weight: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___0___downsample_1_bias: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___1___conv1_weight: R.Tensor((256, 256, 3, 3), dtype="float32"), p_getattr_l__self___layer3___1___bn1_weight: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___1___bn1_bias: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___1___conv2_weight: R.Tensor((256, 256, 3, 3), dtype="float32"), p_getattr_l__self___layer3___1___bn2_weight: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer3___1___bn2_bias: R.Tensor((256,), dtype="float32"), p_getattr_l__self___layer4___0___conv1_weight: R.Tensor((512, 256, 3, 3), dtype="float32"), p_getattr_l__self___layer4___0___bn1_weight: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___0___bn1_bias: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___0___conv2_weight: R.Tensor((512, 512, 3, 3), dtype="float32"), p_getattr_l__self___layer4___0___bn2_weight: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___0___bn2_bias: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___0___downsample_0_weight: R.Tensor((512, 256, 1, 1), dtype="float32"), p_getattr_l__self___layer4___0___downsample_1_weight: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___0___downsample_1_bias: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___1___conv1_weight: R.Tensor((512, 512, 3, 3), dtype="float32"), p_getattr_l__self___layer4___1___bn1_weight: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___1___bn1_bias: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___1___conv2_weight: R.Tensor((512, 512, 3, 3), dtype="float32"), p_getattr_l__self___layer4___1___bn2_weight: R.Tensor((512,), dtype="float32"), p_getattr_l__self___layer4___1___bn2_bias: R.Tensor((512,), dtype="float32"), p_fc_weight: R.Tensor((1000, 512), dtype="float32"), p_fc_bias: R.Tensor((1000,), dtype="float32")) -> R.Tuple(R.Tensor((1, 1000), dtype="float32")):
    R.func_attr({"num_input": 1})
    with R.dataflow():
        lv = R.call_tir(fused_conv2d_subtract_divide_expand_dims_multiply_expand_dims_add1_strided_slice, (x, p_conv1_weight, metadata["relax.expr.Constant"][0], metadata["relax.expr.Constant"][1], p_bn1_weight, p_bn1_bias), out_sinfo=R.Tensor((1, 64, 112, 112), dtype="float32"))
        lv1 = R.call_tir(fused_reshape_relu, (lv,), out_sinfo=R.Tensor((1, 64, 112, 112), dtype="float32"))
        lv6 = R.call_tir(max_pool2d, (lv1,), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv2 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv6, p_getattr_l__self___layer1___0___conv1_weight, metadata["relax.expr.Constant"][2], metadata["relax.expr.Constant"][3], p_getattr_l__self___layer1___0___bn1_weight, p_getattr_l__self___layer1___0___bn1_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv3 = R.call_tir(fused_reshape1_relu1, (lv2,), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv4 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv3, p_getattr_l__self___layer1___0___conv2_weight, metadata["relax.expr.Constant"][4], metadata["relax.expr.Constant"][5], p_getattr_l__self___layer1___0___bn2_weight, p_getattr_l__self___layer1___0___bn2_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv5 = R.call_tir(fused_reshape1_add3_relu1, (lv4, lv6), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv6_1 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv5, p_getattr_l__self___layer1___1___conv1_weight, metadata["relax.expr.Constant"][6], metadata["relax.expr.Constant"][7], p_getattr_l__self___layer1___1___bn1_weight, p_getattr_l__self___layer1___1___bn1_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv7 = R.call_tir(fused_reshape1_relu1, (lv6_1,), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv8 = R.call_tir(fused_conv2d1_subtract1_divide1_expand_dims_multiply1_expand_dims_add2_strided_slice1, (lv7, p_getattr_l__self___layer1___1___conv2_weight, metadata["relax.expr.Constant"][8], metadata["relax.expr.Constant"][9], p_getattr_l__self___layer1___1___bn2_weight, p_getattr_l__self___layer1___1___bn2_bias), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv9 = R.call_tir(fused_reshape1_add3_relu1, (lv8, lv5), out_sinfo=R.Tensor((1, 64, 56, 56), dtype="float32"))
        lv10 = R.call_tir(fused_conv2d2_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv9, p_getattr_l__self___layer2___0___conv1_weight, metadata["relax.expr.Constant"][10], metadata["relax.expr.Constant"][11], p_getattr_l__self___layer2___0___bn1_weight, p_getattr_l__self___layer2___0___bn1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv11 = R.call_tir(fused_reshape2_relu2, (lv10,), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv12 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv11, p_getattr_l__self___layer2___0___conv2_weight, metadata["relax.expr.Constant"][12], metadata["relax.expr.Constant"][13], p_getattr_l__self___layer2___0___bn2_weight, p_getattr_l__self___layer2___0___bn2_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv13 = R.call_tir(fused_conv2d4_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv9, p_getattr_l__self___layer2___0___downsample_0_weight, metadata["relax.expr.Constant"][14], metadata["relax.expr.Constant"][15], p_getattr_l__self___layer2___0___downsample_1_weight, p_getattr_l__self___layer2___0___downsample_1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv14 = R.call_tir(fused_reshape2_reshape2_add6_relu2, (lv12, lv13), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv15 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv14, p_getattr_l__self___layer2___1___conv1_weight, metadata["relax.expr.Constant"][16], metadata["relax.expr.Constant"][17], p_getattr_l__self___layer2___1___bn1_weight, p_getattr_l__self___layer2___1___bn1_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv16 = R.call_tir(fused_reshape2_relu2, (lv15,), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv17 = R.call_tir(fused_conv2d3_subtract2_divide2_expand_dims1_multiply2_expand_dims1_add5_strided_slice2, (lv16, p_getattr_l__self___layer2___1___conv2_weight, metadata["relax.expr.Constant"][18], metadata["relax.expr.Constant"][19], p_getattr_l__self___layer2___1___bn2_weight, p_getattr_l__self___layer2___1___bn2_bias), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv18 = R.call_tir(fused_reshape2_add6_relu2, (lv17, lv14), out_sinfo=R.Tensor((1, 128, 28, 28), dtype="float32"))
        lv19 = R.call_tir(fused_conv2d5_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv18, p_getattr_l__self___layer3___0___conv1_weight, metadata["relax.expr.Constant"][20], metadata["relax.expr.Constant"][21], p_getattr_l__self___layer3___0___bn1_weight, p_getattr_l__self___layer3___0___bn1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv20 = R.call_tir(fused_reshape3_relu3, (lv19,), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv21 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv20, p_getattr_l__self___layer3___0___conv2_weight, metadata["relax.expr.Constant"][22], metadata["relax.expr.Constant"][23], p_getattr_l__self___layer3___0___bn2_weight, p_getattr_l__self___layer3___0___bn2_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv22 = R.call_tir(fused_conv2d7_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv18, p_getattr_l__self___layer3___0___downsample_0_weight, metadata["relax.expr.Constant"][24], metadata["relax.expr.Constant"][25], p_getattr_l__self___layer3___0___downsample_1_weight, p_getattr_l__self___layer3___0___downsample_1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv23 = R.call_tir(fused_reshape3_reshape3_add9_relu3, (lv21, lv22), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv24 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv23, p_getattr_l__self___layer3___1___conv1_weight, metadata["relax.expr.Constant"][26], metadata["relax.expr.Constant"][27], p_getattr_l__self___layer3___1___bn1_weight, p_getattr_l__self___layer3___1___bn1_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv25 = R.call_tir(fused_reshape3_relu3, (lv24,), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv26 = R.call_tir(fused_conv2d6_subtract3_divide3_expand_dims2_multiply3_expand_dims2_add8_strided_slice3, (lv25, p_getattr_l__self___layer3___1___conv2_weight, metadata["relax.expr.Constant"][28], metadata["relax.expr.Constant"][29], p_getattr_l__self___layer3___1___bn2_weight, p_getattr_l__self___layer3___1___bn2_bias), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv27 = R.call_tir(fused_reshape3_add9_relu3, (lv26, lv23), out_sinfo=R.Tensor((1, 256, 14, 14), dtype="float32"))
        lv28 = R.call_tir(fused_conv2d8_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv27, p_getattr_l__self___layer4___0___conv1_weight, metadata["relax.expr.Constant"][30], metadata["relax.expr.Constant"][31], p_getattr_l__self___layer4___0___bn1_weight, p_getattr_l__self___layer4___0___bn1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv29 = R.call_tir(fused_reshape4_relu4, (lv28,), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv30 = R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv29, p_getattr_l__self___layer4___0___conv2_weight, metadata["relax.expr.Constant"][32], metadata["relax.expr.Constant"][33], p_getattr_l__self___layer4___0___bn2_weight, p_getattr_l__self___layer4___0___bn2_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv31 = R.call_tir(fused_conv2d10_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv27, p_getattr_l__self___layer4___0___downsample_0_weight, metadata["relax.expr.Constant"][34], metadata["relax.expr.Constant"][35], p_getattr_l__self___layer4___0___downsample_1_weight, p_getattr_l__self___layer4___0___downsample_1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv32 = R.call_tir(fused_reshape4_reshape4_add12_relu4, (lv30, lv31), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv33 = R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv32, p_getattr_l__self___layer4___1___conv1_weight, metadata["relax.expr.Constant"][36], metadata["relax.expr.Constant"][37], p_getattr_l__self___layer4___1___bn1_weight, p_getattr_l__self___layer4___1___bn1_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv34 = R.call_tir(fused_reshape4_relu4, (lv33,), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv35 = R.call_tir(fused_conv2d9_subtract4_divide4_expand_dims3_multiply4_expand_dims3_add11_strided_slice4, (lv34, p_getattr_l__self___layer4___1___conv2_weight, metadata["relax.expr.Constant"][38], metadata["relax.expr.Constant"][39], p_getattr_l__self___layer4___1___bn2_weight, p_getattr_l__self___layer4___1___bn2_bias), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv36 = R.call_tir(fused_reshape4_add12_relu4, (lv35, lv32), out_sinfo=R.Tensor((1, 512, 7, 7), dtype="float32"))
        lv126 = R.call_tir(adaptive_avg_pool2d, (lv36,), out_sinfo=R.Tensor((1, 512, 1, 1), dtype="float32"))
        lv127 = R.call_tir(reshape5, (lv126,), out_sinfo=R.Tensor((1, 512), dtype="float32"))
        lv128 = R.call_tir(transpose, (p_fc_weight,), out_sinfo=R.Tensor((512, 1000), dtype="float32"))
        lv37 = R.call_tir(fused_matmul_add13, (lv127, lv128, p_fc_bias), out_sinfo=R.Tensor((1, 1000), dtype="float32"))
        gv: R.Tuple(R.Tensor((1, 1000), dtype="float32")) = (lv37,)
        R.output(gv)
    return gv
Metadata omitted. Use show_meta=True in script() method to show it.
Traceback (most recent call last):
  File "/home/ruby/tvm/test/e2e_opt_model.py", line 117, in <module>
    ex = tvm.compile(mod, target="cuda")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ruby/tvm/python/tvm/driver/build_module.py", line 104, in compile
    return tvm.relax.build(
           ^^^^^^^^^^^^^^^^
  File "/home/ruby/tvm/python/tvm/relax/vm_build.py", line 259, in build
    return _vmlink(
           ^^^^^^^^
  File "/home/ruby/tvm/python/tvm/relax/vm_build.py", line 154, in _vmlink
    lib = tvm.tir.build(tir_mod, target=target, pipeline=tir_pipeline)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ruby/tvm/python/tvm/tir/build.py", line 173, in build
    mod = pipeline(mod)
          ^^^^^^^^^^^^^
  File "/home/ruby/tvm/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/ruby/tvm/python/tvm/_ffi/base.py", line 468, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/home/ruby/tvm/python/tvm/tir/pipeline.py", line 122, in _pipeline
    mod = tvm.ir.transform.Sequential(passes)(mod)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ruby/tvm/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "tvm/_ffi/_cython/./packed_func.pxi", line 339, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 270, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 259, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 185, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/ruby/tvm/src/tir/analysis/verify_memory.cc", line 203, in operator()
    LOG(FATAL) << "RuntimeError: Memory verification failed with the following errors:\n"
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^
tvm._ffi.base.TVMError: Traceback (most recent call last):
  0: operator()
        at /home/ruby/tvm/src/tir/analysis/verify_memory.cc:203
  Did you forget to bind?
    Variable `compute` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `lv50` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `lv60` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
  File "/home/ruby/tvm/src/tir/analysis/verify_memory.cc", line 203
RuntimeError: Memory verification failed with the following errors:
from tvm.script import tir as T
@T.prim_func
def fused_reshape2_add6_relu2(lv60: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32"), lv50: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32"), compute_intermediate: T.Buffer((T.int64(1), T.int64(128), T.int64(28), T.int64(28)), "float32")):
    T.func_attr({"target": T.target({"arch": "sm_53", "host": {"keys": ["arm_cpu", "cpu"], "kind": "llvm", "mtriple": "aarch64-unknown-linux-gnu", "tag": ""}, "keys": ["cuda", "gpu"], "kind": "cuda", "max_num_threads": 1024, "tag": "", "thread_warp_size": 32}), "tir.noalias": T.bool(True)})
    T_reshape = T.allocate([100352], "float32", "global")
    T_reshape_1 = T.Buffer((T.int64(100352),), data=T_reshape)
    for ax1, ax2, ax3 in T.grid(128, 28, 28):
        cse_var_1: T.int32 = ax1 * 784 + ax2 * 28 + ax3
        lv60_1 = T.Buffer((T.int64(100352),), data=lv60.data)
        T_reshape_1[cse_var_1] = lv60_1[cse_var_1]
    T_reshape_2 = T.Buffer((T.int64(100352),), data=T_reshape)
    for ax1, ax2, ax3 in T.grid(128, 28, 28):
        cse_var_2: T.int32 = ax1 * 784 + ax2 * 28 + ax3
        lv50_1 = T.Buffer((T.int64(100352),), data=lv50.data)
        T_reshape_2[cse_var_2] = T_reshape_1[cse_var_2] + lv50_1[cse_var_2]
    for i1, i2, i3 in T.grid(128, 28, 28):
        cse_var_3: T.int32 = i1 * 784 + i2 * 28 + i3
        compute_intermediate_1 = T.Buffer((T.int64(100352),), data=compute_intermediate.data)
        compute_intermediate_1[cse_var_3] = T.max(T_reshape_2[cse_var_3], T.float32(0.0))
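My reading of "Did you forget to bind?": this PrimFunc is still the unscheduled lowering (plain T.grid loops, no blockIdx/threadIdx binding), which is only legal on the host, and that matches task 26 showing N/A in the table above. As an experiment, not a confirmed fix, I am thinking of letting TVM attach a default thread binding to any leftover unscheduled kernels before building, roughly like this (a sketch; whether this is the right point in the pipeline to apply the pass is my assumption):

```python
import tvm

# Hedged workaround sketch: DefaultGPUSchedule binds the loops of unscheduled
# PrimFuncs to blockIdx/threadIdx so that VerifyMemory can pass.
with tvm.target.Target("cuda"):
    mod = tvm.tir.transform.DefaultGPUSchedule()(mod)  # mod is the tuned IRModule
ex = tvm.compile(mod, target="cuda")
```

For scale, here is the `time` output for the whole 10000-trial run: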
real 1052m59.636s
user 3834m21.524s
sys 125m26.492s