How to get best performance for operator net out of TVM

richard-wwu · September 2, 2022, 8:49am

I am trying to execute a net of operators using TVM. For best performance, I tried using AutoScheduler as described here, but even after 24 hours of tuning it did not find any valid schedules.

Can you suggest the best alternative for getting the highest performance out of TVM? Would using the schedules provided in TOPI result in the best-performing TVM code?

sunggg · September 3, 2022, 3:35pm

Hi, @richard-wwu. Have you looked into the individual candidates during tuning? Since you couldn’t find any valid schedule even after such a long tuning time, I suspect every candidate is hitting a runtime or compilation error. Knowing which error it is would help figure out the right next step.

richard-wwu · September 5, 2022, 10:30am

Many thanks for your reply. This is the output of TVM:

----------------------------------------------------------------------
------------------------------  [ Search ]
----------------------------------------------------------------------
Generate Sketches               #s: 1
Sample Iter: 5  #Pop: 0 #Target: 50     fail_ct: 10240  Time elapsed: 22.54
#Target has been reduced to 25 due to too many failures or duplications
Sample Iter: 10 #Pop: 0 #Target: 25     fail_ct: 20480  Time elapsed: 42.35
#Target has been reduced to 12 due to too many failures or duplications
Sample Iter: 15 #Pop: 0 #Target: 12     fail_ct: 30720  Time elapsed: 62.17
#Target has been reduced to 6 due to too many failures or duplications
Sample Iter: 20 #Pop: 0 #Target: 6      fail_ct: 40960  Time elapsed: 81.98
#Target has been reduced to 3 due to too many failures or duplications
Sample Iter: 25 #Pop: 0 #Target: 3      fail_ct: 51200  Time elapsed: 101.80
#Target has been reduced to 1 due to too many failures or duplications
Sample Iter: 30 #Pop: 0 #Target: 1      fail_ct: 61440  Time elapsed: 121.61
Sample Iter: 35 #Pop: 0 #Target: 1      fail_ct: 71680  Time elapsed: 141.42
Sample Iter: 40 #Pop: 0 #Target: 1      fail_ct: 81920  Time elapsed: 161.23
Sample Iter: 45 #Pop: 0 #Target: 1      fail_ct: 92160  Time elapsed: 181.06
Sample Iter: 50 #Pop: 0 #Target: 1      fail_ct: 102400 Time elapsed: 200.87
Sample Iter: 55 #Pop: 0 #Target: 1      fail_ct: 112640 Time elapsed: 220.66
Sample Iter: 60 #Pop: 0 #Target: 1      fail_ct: 122880 Time elapsed: 240.46
Sample Iter: 65 #Pop: 0 #Target: 1      fail_ct: 133120 Time elapsed: 260.26
Sample Iter: 70 #Pop: 0 #Target: 1      fail_ct: 143360 Time elapsed: 280.07
Sample Iter: 75 #Pop: 0 #Target: 1      fail_ct: 153600 Time elapsed: 299.87
Sample Iter: 80 #Pop: 0 #Target: 1      fail_ct: 163840 Time elapsed: 319.67
Sample Iter: 85 #Pop: 0 #Target: 1      fail_ct: 174080 Time elapsed: 339.48
Sample Iter: 90 #Pop: 0 #Target: 1      fail_ct: 184320 Time elapsed: 359.29
Sample Iter: 95 #Pop: 0 #Target: 1      fail_ct: 194560 Time elapsed: 379.10
Sample Iter: 100        #Pop: 0 #Target: 1      fail_ct: 204800 Time elapsed: 398.90

...

Sample Iter: 3345       #Pop: 0 #Target: 1      fail_ct: 6850560        Time elapsed: 13263.46
Sample Iter: 3350       #Pop: 0 #Target: 1      fail_ct: 6860800        Time elapsed: 13283.27
Sample Iter: 3355       #Pop: 0 #Target: 1      fail_ct: 6871040        Time elapsed: 13303.08

As far as I understand it, TVM does not manage to find even a single candidate: #Pop stays at 0 while fail_ct keeps growing. So it does not seem to reach the step where it compiles and runs candidates. Is that right?
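
To make that reading of the log concrete, here is a small Python sketch that parses a few of the "Sample Iter" lines pasted above and checks two things: whether #Pop ever rises above 0, and how many failures are recorded per sampling iteration. It assumes fail_ct is a cumulative count of rejected samples, which the log suggests but which is not confirmed here. The helper names (parse_sampling_log, summarize) are made up for illustration and are not part of TVM.

```python
import re

# A few lines copied verbatim from the log above.
LOG = """\
Sample Iter: 5  #Pop: 0 #Target: 50     fail_ct: 10240  Time elapsed: 22.54
Sample Iter: 10 #Pop: 0 #Target: 25     fail_ct: 20480  Time elapsed: 42.35
Sample Iter: 100        #Pop: 0 #Target: 1      fail_ct: 204800 Time elapsed: 398.90
Sample Iter: 3355       #Pop: 0 #Target: 1      fail_ct: 6871040        Time elapsed: 13303.08
"""

# Matches one "Sample Iter" progress line, tolerating tabs or spaces.
LINE_RE = re.compile(
    r"Sample Iter:\s*(\d+)\s+#Pop:\s*(\d+)\s+#Target:\s*(\d+)\s+"
    r"fail_ct:\s*(\d+)\s+Time elapsed:\s*([\d.]+)"
)


def parse_sampling_log(text):
    """Return one dict per 'Sample Iter' line found in the text."""
    rows = []
    for m in LINE_RE.finditer(text):
        rows.append(
            {
                "iter": int(m.group(1)),
                "pop": int(m.group(2)),
                "target": int(m.group(3)),
                "fail_ct": int(m.group(4)),
                "elapsed": float(m.group(5)),
            }
        )
    return rows


def summarize(rows):
    """Summarize whether any valid state was ever sampled."""
    return {
        # Largest population seen; 0 means no valid state was ever kept.
        "max_pop": max(r["pop"] for r in rows),
        # Distinct values of (cumulative failures / iteration number).
        "fails_per_iter": sorted({r["fail_ct"] / r["iter"] for r in rows}),
        "never_sampled_valid_state": all(r["pop"] == 0 for r in rows),
    }


rows = parse_sampling_log(LOG)
summary = summarize(rows)
print(summary)
```

On these lines the population never leaves 0 and the failure count grows by the same amount (2048) in every iteration. That is consistent with every sampled state being rejected during the initial-population sampling, before anything is compiled or measured.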