[Solved] Tune conv2d cuda error

Hi,
When I run tune_conv2d_cuda.py, I get some errors. I don’t know what the error numbers mean. Thanks.
Get devices for measurement successfully!
No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536583471.704766) [(‘tile_f’, [32, 2, 1, 8]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [1, 64, 8]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 1500), (‘unroll_explicit’, 1)],None,10377757
No: 2 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536583471.704879) [(‘tile_f’, [16, 1, 2, 16]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [128, 2, 2]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 1500), (‘unroll_explicit’, 1)],None,10106750
No: 3 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536583471.704956) [(‘tile_f’, [16, 1, 1, 32]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [1, 256, 2]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 1)],None,5873085
No: 4 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(InstantiationError(‘Skipped because of invalid gpu kernel’,),), error_no=1, all_cost=0.09117388725280762, timestamp=1536583471.705098) [(‘tile_f’, [2, 8, 8, 4]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [64, 8, 1]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 1500), (‘unroll_explicit’, 1)],None,9497304
No: 5 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536583472.405158) [(‘tile_f’, [2, 2, 4, 32]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [2, 128, 2]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 1500), (‘unroll_explicit’, 0)],None,4126295
Finish loading 35 records


hi @WeiGao
A timeout error means that the CUDA kernel doesn’t finish in time. You can try a longer timeout in measure_option.
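As a rough illustration of the suggestion above, the sketch below builds a measure option with longer timeouts. It assumes the autotvm API used by the tutorial, where `autotvm.LocalBuilder` and `autotvm.LocalRunner` each accept a `timeout` in seconds. `make_measure_option` and the default values of 30 are hypothetical, chosen only for illustration.

```python
def make_measure_option(build_timeout=30, run_timeout=30):
    """Return an autotvm measure option with longer timeouts (illustrative values)."""
    # Imported inside the function so the snippet can be loaded without TVM present.
    from tvm import autotvm

    return autotvm.measure_option(
        # Build timeout: how long compiling one candidate config may take.
        builder=autotvm.LocalBuilder(timeout=build_timeout),
        # Run timeout: how long measuring one candidate on the device may take.
        runner=autotvm.LocalRunner(timeout=run_timeout),
    )
```

The returned object would then be passed as the `measure_option` argument of `tuner.tune(...)` in the tutorial script.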

During auto-tuning, the tuner will try many invalid configs. So these errors are expected.
If you can see some non-zero GFLOPS, then it is okay.
The definition of error_no is here

There are two kinds of timeout. Build timeout (error_no=6) is mainly due to too much unrolling. Run timeout (error_no=7) is mainly due to bad configurations or a timeout setting that is too tight.
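To see which kinds of failure dominate a run, the printed tuner output can be tallied by error code. This is only a sketch: the code-to-name table is written from memory of `MeasureErrorNo` in autotvm and should be checked against the definition in your TVM version, and `tally_errors` is a hypothetical helper, not a TVM API.

```python
import re
from collections import Counter

# Assumed mapping, mirroring autotvm's MeasureErrorNo; verify against your TVM version.
ERROR_NAMES = {
    0: "NO_ERROR",
    1: "INSTANTIATION_ERROR",
    2: "COMPILE_HOST",
    3: "COMPILE_DEVICE",
    4: "RUNTIME_DEVICE",
    5: "WRONG_ANSWER",
    6: "BUILD_TIMEOUT",
    7: "RUN_TIMEOUT",
    8: "UNKNOWN_ERROR",
}

def tally_errors(log_text):
    """Count how often each error_no appears in printed tuner output."""
    codes = (int(m) for m in re.findall(r"error_no=(\d+)", log_text))
    return Counter(ERROR_NAMES.get(code, "UNRECOGNIZED") for code in codes)

# Two shortened entries in the format shown in this thread.
sample = """
No: 1 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536583471.704766)
No: 4 result: MeasureResult(costs=(InstantiationError('Skipped',),), error_no=1, all_cost=0.0911, timestamp=1536583471.705098)
"""
print(tally_errors(sample))
```

A log that contains only BUILD_TIMEOUT and INSTANTIATION_ERROR entries, as in this thread, means no candidate ever reached the run stage.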

Thanks. I set the timeout to 40, and the error output is:
BUG: PySequence_LengthBUG: PySequence_LengthSystemError: null argument to internal routine
SystemError: null argument to internal routine…

Thanks. I tried again by setting a larger timeout threshold, then the output is
Get devices for measurement successfully!
BUG: PySequence_LengthBUG: PySequence_LengthSystemError: null argument to internal routine
SystemError: null argument to internal routine
No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675283.244055) [(‘tile_f’, [1, 32, 1, 16]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [16, 16, 2]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 512), (‘unroll_explicit’, 1)],None,7407349
No: 2 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675283.244159) [(‘tile_f’, [4, 4, 2, 16]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [16, 32, 1]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 512), (‘unroll_explicit’, 0)],None,2342952
No: 3 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092303) [(‘tile_f’, [2, 1, 32, 8]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [4, 2, 64]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 1500), (‘unroll_explicit’, 0)],None,5002521
No: 4 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092394) [(‘tile_f’, [4, 16, 2, 4]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [32, 16, 1]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 512), (‘unroll_explicit’, 0)],None,2534072
No: 5 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092464) [(‘tile_f’, [4, 2, 32, 2]), (‘tile_y’, [7, 1, 1, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [2, 32, 8]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 512), (‘unroll_explicit’, 1)],None,7278571
No: 6 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092531) [(‘tile_f’, [8, 8, 4, 2]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [4, 16, 8]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 512), (‘unroll_explicit’, 1)],None,7661135
No: 7 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092594) [(‘tile_f’, [1, 2, 1, 256]), (‘tile_y’, [7, 1, 1, 1]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [32, 16, 1]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 0)],None,207897
No: 8 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675284.092655) [(‘tile_f’, [32, 1, 8, 2]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [4, 128, 1]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 1)],None,544683

It seems that there are some errors.

If the timeout had actually been increased, the all_cost value of any remaining TimeoutError entries would rise to the new timeout, but your log still shows all_cost=10. Can you post how you are changing the timeout?
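That check can also be done mechanically on the printed log. The sketch below is a hypothetical helper (not part of TVM) that assumes the `MeasureResult(...)` text format shown in this thread. It extracts `all_cost` from the timed-out entries and compares it with the timeout you believe you configured.

```python
import re

def timeout_costs(log_text):
    """Return the all_cost values of entries whose cost is a TimeoutError."""
    pattern = r"costs=\(TimeoutError\(\),\), error_no=\d+, all_cost=([0-9.]+)"
    return [float(value) for value in re.findall(pattern, log_text)]

def timeout_took_effect(log_text, configured_timeout):
    """True if every timed-out entry ran at least as long as the configured timeout."""
    costs = timeout_costs(log_text)
    return bool(costs) and all(cost >= configured_timeout for cost in costs)

old_log = "MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=10, timestamp=1536675283.244055)"
new_log = "MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.270204)"

print(timeout_took_effect(old_log, 40))
print(timeout_took_effect(new_log, 100))
```

If the first call reports that the timeout did not take effect, the new value was probably set in a place the measurement code does not read.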

I just changed the timeout of LocalBuilder and LocalRunner to 100. The output is as follows.

ConfigSpace (len=10454400, space_map=
0 tile_f: Split(policy=all, product=512, num_outputs=4) len=220
1 tile_y: Split(policy=all, product=7, num_outputs=4) len=4
2 tile_x: Split(policy=all, product=7, num_outputs=4) len=4
3 tile_rc: Split(policy=all, product=512, num_outputs=3) len=55
4 tile_ry: Split(policy=all, product=3, num_outputs=3) len=3
5 tile_rx: Split(policy=all, product=3, num_outputs=3) len=3
6 auto_unroll_max_step: OtherOption([0, 2, 4]) len=3
7 unroll_explicit: OtherOption([0, 1]) len=2
)
Get devices for measurement successfully!
BUG: PySequence_LengthSystemError: null argument to internal routine
No: 1 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.270204) [(‘tile_f’, [8, 1, 8, 8]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [256, 1, 2]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 1)],None,9524614
No: 2 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.270858) [(‘tile_f’, [64, 1, 2, 4]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [32, 4, 4]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,5108948
No: 3 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.270941) [(‘tile_f’, [1, 16, 1, 32]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [2, 1, 256]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 0)],None,1541729
No: 4 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271023) [(‘tile_f’, [64, 4, 1, 2]), (‘tile_y’, [7, 1, 1, 1]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [4, 4, 32]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 2), (‘unroll_explicit’, 1)],None,8279097
No: 5 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271102) [(‘tile_f’, [1, 1, 256, 2]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [1, 32, 16]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 1)],None,9045179
No: 6 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271182) [(‘tile_f’, [4, 16, 8, 1]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [1, 256, 2]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 2), (‘unroll_explicit’, 1)],None,8388851
No: 7 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.27126) [(‘tile_f’, [2, 8, 32, 1]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [16, 32, 1]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4859623
No: 8 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271343) [(‘tile_f’, [2, 8, 8, 4]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [64, 2, 4]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 2), (‘unroll_explicit’, 1)],None,7814964
No: 9 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271431) [(‘tile_f’, [2, 1, 1, 256]), (‘tile_y’, [1, 1, 1, 7]), (‘tile_x’, [7, 1, 1, 1]), (‘tile_rc’, [2, 4, 64]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 1)],None,9071916
No: 10 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271511) [(‘tile_f’, [2, 16, 2, 8]), (‘tile_y’, [7, 1, 1, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [4, 32, 4]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4538307
No: 11 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271581) [(‘tile_f’, [8, 2, 1, 32]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [256, 1, 2]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 0)],None,1004706
No: 12 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271655) [(‘tile_f’, [8, 8, 4, 2]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [1, 128, 4]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 0), (‘unroll_explicit’, 0)],None,1254515
No: 13 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271724) [(‘tile_f’, [16, 2, 8, 2]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [2, 2, 128]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 2), (‘unroll_explicit’, 0)],None,3276540
No: 14 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271794) [(‘tile_f’, [1, 8, 1, 64]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [16, 4, 8]), (‘tile_ry’, [3, 1, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4749783
No: 15 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271862) [(‘tile_f’, [1, 256, 1, 2]), (‘tile_y’, [7, 1, 1, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [2, 2, 128]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 2), (‘unroll_explicit’, 0)],None,2694623
No: 16 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.271933) [(‘tile_f’, [2, 2, 32, 4]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [8, 8, 8]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [3, 1, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 1)],None,9013311
No: 17 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.272005) [(‘tile_f’, [8, 4, 8, 2]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [4, 128, 1]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4480381
No: 18 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(InstantiationError(‘Skipped because of invalid gpu kernel’,),), error_no=1, all_cost=0.05345797538757324, timestamp=1536720632.272145) [(‘tile_f’, [4, 8, 16, 1]), (‘tile_y’, [1, 1, 7, 1]), (‘tile_x’, [1, 7, 1, 1]), (‘tile_rc’, [8, 64, 1]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4475277
No: 19 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.272222) [(‘tile_f’, [32, 4, 2, 2]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 1, 7]), (‘tile_rc’, [256, 1, 2]), (‘tile_ry’, [1, 3, 1]), (‘tile_rx’, [1, 1, 3]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4878126
No: 20 GFLOPS: 0.00/0.00 result: MeasureResult(costs=(TimeoutError(),), error_no=6, all_cost=100, timestamp=1536720632.272288) [(‘tile_f’, [16, 2, 8, 2]), (‘tile_y’, [1, 7, 1, 1]), (‘tile_x’, [1, 1, 7, 1]), (‘tile_rc’, [128, 4, 1]), (‘tile_ry’, [1, 1, 3]), (‘tile_rx’, [1, 3, 1]), (‘auto_unroll_max_step’, 4), (‘unroll_explicit’, 0)],None,4461900
Finish loading 20 records
Cannot find config for target=cuda, workload=(‘conv2d_no_batching’, 1, 7, 7, 512, 512, 3, 3, (1, 1), (1, 1)). A fallback configuration is used, which may bring great performance regression.

Best config:
,None,None
Finish loading 20 records
Time cost of this operator: 0.019453

I think it is a problem related to your Python installation rather than TVM.
Maybe you can try another Python version?

Thanks a lot. I changed the Python version from 2.7.5 to 3.6.3 using conda on CentOS 7, and it works now.