Hi, I am new to TVM and I am working with an application which has to perform a lot of transpose operations on tensors with a lot of indices (often >10)
I wanted to see how well the auto-scheduler can optimize these transpose operations, but it doesn’t seem to generate a wide range of schedules to search through:
@auto_scheduler.register_workload
def transpose(shape, axes):
A = te.placeholder( shape, name = 'Placeholder_A' )
out = topi.transpose( A, axes )
return [A,out]
axes = [0,1,2,3,5,4],
shape = [6,8,2,4,2,5]
task = auto_scheduler.SearchTask( func = transpose, args = (shape, axes), target = 'llvm' )
print(task.compute_dag)
log_file = "transpose.json"
tune_option = auto_scheduler.TuningOptions(
num_measure_trials=500,
measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
verbose=2,
)
task.tune(tune_option)
I have included the tuning output log below. The output log looks like it is only searching through a relatively small search space, and not trying loop reordering at all, where I think it should be generating quite a large search space of schedules.
Does anyone know what I can do to make it tune correctly? Any help would be greatly appreciated.
Placeholder_A = PLACEHOLDER [6, 8, 2, 4, 2, 5]
T_transpose(ax0, ax1, ax2, ax3, ax4, ax5) = Placeholder_A[ax0, ax1, ax2, ax3, ax5, ax4]
----------------------------------------------------------------------
------------------------------ [ Search ]
----------------------------------------------------------------------
Generate Sketches #s: 1
Sample Iter: 5 #Pop: 1 #Target: 50 fail_ct: 10239 Time elapsed: 0.79
#Target has been reduced to 25 due to too many failures or duplications
Sample Iter: 10 #Pop: 1 #Target: 25 fail_ct: 20479 Time elapsed: 1.59
#Target has been reduced to 12 due to too many failures or duplications
Sample Iter: 15 #Pop: 1 #Target: 12 fail_ct: 30719 Time elapsed: 2.37
#Target has been reduced to 6 due to too many failures or duplications
Sample Iter: 20 #Pop: 1 #Target: 6 fail_ct: 40959 Time elapsed: 3.16
#Target has been reduced to 3 due to too many failures or duplications
Sample Iter: 25 #Pop: 1 #Target: 3 fail_ct: 51199 Time elapsed: 3.95
#Target has been reduced to 1 due to too many failures or duplications
Sample Initial Population #s: 1 fail_ct: 53247 Time elapsed: 4.12
GA Iter: 0 Max score: 0.4089 Min score: 0.4089 #Pop: 1 #M+: 0 #M-: 0
GA Iter: 4 Max score: 0.6576 Min score: 0.0138 #Pop: 5 #M+: 63 #M-: 8920
EvolutionarySearch #s: 5 Time elapsed: 0.64
----------------------------------------------------------------------
------------------------------ [ Measure ]
----------------------------------------------------------------------
Get 5 programs to measure:
.....*****
==================================================
No: 1 GFLOPS: 0.00 / 0.00 results: MeasureResult(cost:[0.0000], error_no:0, all_cost:0.33, Tstamp:1637765955.24)
==================================================
Placeholder: Placeholder_A
parallel ax0@ax1@ax2@ax3@ax4@ (0,1920)
vectorize ax5 (0,2)
T_transpose = ...
==================================================
No: 2 GFLOPS: 0.00 / 0.00 results: MeasureResult(cost:[0.0000], error_no:0, all_cost:0.31, Tstamp:1637765955.50)
==================================================
Placeholder: Placeholder_A
parallel ax0@ax1@ax2@ (0,96)
for ax3 (0,4)
for ax4 (0,5)
vectorize ax5 (0,2)
T_transpose = ...
==================================================
No: 3 GFLOPS: 0.00 / 0.00 results: MeasureResult(cost:[0.0000], error_no:0, all_cost:0.27, Tstamp:1637765955.74)
==================================================
Placeholder: Placeholder_A
parallel ax0@ax1@ax2@ax3@ (0,384)
for ax4 (0,5)
vectorize ax5 (0,2)
T_transpose = ...
==================================================
No: 4 GFLOPS: 0.00 / 0.00 results: MeasureResult(cost:[0.0000], error_no:0, all_cost:0.33, Tstamp:1637765955.99)
==================================================
Placeholder: Placeholder_A
parallel ax0@ax1@ (0,48)
for ax2 (0,2)
for ax3 (0,4)
for ax4 (0,5)
vectorize ax5 (0,2)
T_transpose = ...
==================================================
No: 5 GFLOPS: 0.00 / 0.00 results: MeasureResult(cost:[0.0000], error_no:0, all_cost:0.35, Tstamp:1637765956.23)
==================================================
Placeholder: Placeholder_A
parallel ax0@ (0,6)
for ax1 (0,8)
for ax2 (0,2)
for ax3 (0,4)
for ax4 (0,5)
vectorize ax5 (0,2)
T_transpose = ...
Time elapsed for measurement: 2.38 s
----------------------------------------------------------------------
------------------------------ [ Train cost model ]
----------------------------------------------------------------------
Time elapsed for training: 0.10 s
----------------------------------------------------------------------
------------------------------ [ Search ]
----------------------------------------------------------------------
Sample Initial Population #s: 1 fail_ct: 2047 Time elapsed: 0.17
GA Iter: 0 Max score: N/A Min score: N/A #Pop: 0 #M+: 0 #M-: 0
GA Iter: 4 Max score: N/A Min score: N/A #Pop: 0 #M+: 64 #M-: 8913
EvolutionarySearch #s: 0 Time elapsed: 0.58
----------------------------------------------------------------------
------------------------------ [ Search ]
----------------------------------------------------------------------
Sample Initial Population #s: 1 fail_ct: 2047 Time elapsed: 0.17
GA Iter: 0 Max score: N/A Min score: N/A #Pop: 0 #M+: 0 #M-: 0
GA Iter: 4 Max score: N/A Min score: N/A #Pop: 0 #M+: 59 #M-: 8861
EvolutionarySearch #s: 0 Time elapsed: 0.60
It seems all candidates in the search space have been measured.
----------------------------------------------------------------------
------------------------------ [ Done ]
----------------------------------------------------------------------
No valid state found in this search round. Check if it has traversed all of the search space.