I’m still looking at auto scheduling mask rcnn workloads (now with nvptx
targets, since I found that dynamic ops run much faster if compiled with nvptx
). There are two tasks that are much more costly than others (task 34 and 35, see below).
The problem is the performance on these tasks don’t improve after certain trials (say 1000), but the task scheduler is still trying too many trials on them, even though there is no improvement happening for more than 1000 trials.
Is there a way to let the task scheduler “give up” on certain tasks after it sees no improvement after certain trials, and on focus more on other tasks?
| ID | Latency (ms) | Speed (GFLOPS) | Trials |
-------------------------------------------------
| 0 | 0.001 | 0.00 | 64 |
| 1 | 0.001 | 0.00 | 64 |
| 2 | 0.001 | 0.00 | 64 |
| 3 | 0.001 | 0.00 | 64 |
| 4 | 0.001 | 0.00 | 64 |
| 5 | 0.888 | 3413.29 | 384 |
| 6 | 0.246 | 93.53 | 128 |
| 7 | 0.142 | 2347.76 | 128 |
| 8 | 0.606 | 4875.03 | 576 |
| 9 | 0.686 | 1924.58 | 256 |
| 10 | 0.667 | 2010.75 | 640 |
| 11 | 0.490 | 2687.18 | 320 |
| 12 | 0.655 | 4017.95 | 320 |
| 13 | 0.760 | 3882.47 | 320 |
| 14 | 1.480 | 1774.72 | 512 |
| 15 | 0.438 | 3024.36 | 576 |
| 16 | 0.442 | 2968.30 | 448 |
| 17 | 0.501 | 5886.45 | 512 |
| 18 | 0.677 | 3881.25 | 256 |
| 19 | 0.784 | 3764.09 | 320 |
| 20 | 0.893 | 2939.17 | 320 |
| 21 | 0.404 | 3259.75 | 832 |
| 22 | 0.291 | 4506.00 | 576 |
| 23 | 0.563 | 5238.59 | 1088 |
| 24 | 0.628 | 4175.38 | 320 |
| 25 | 0.922 | 3198.74 | 320 |
| 26 | 0.989 | 2651.13 | 384 |
| 27 | 0.394 | 3334.79 | 384 |
| 28 | 0.448 | 2924.52 | 320 |
| 29 | 0.597 | 4936.91 | 448 |
| 30 | 0.271 | 2415.55 | 128 |
| 31 | 0.667 | 1966.57 | 256 |
| 32 | 0.980 | 2678.81 | 320 |
| 33 | 1.528 | 3443.95 | 576 |
| 34 | 8.396 | 5621.50 | 2624 |
| 35 | 7.041 | 6704.06 | 2240 |
| 36 | 0.211 | 1166.61 | 128 |
| 37 | 2.262 | 5216.62 | 768 |
| 38 | 2.292 | 5149.95 | 768 |
| 39 | 0.056 | 1108.73 | 64 |
| 40 | 0.646 | 4564.29 | 320 |
| 41 | 0.024 | 654.22 | 64 |
| 42 | 0.196 | 3763.42 | 128 |
| 43 | 0.243 | 3039.72 | 128 |
| 44 | 0.009 | 417.38 | 64 |
| 45 | 0.002 | 20.88 | 64 |
| 46 | 0.174 | 1145.47 | 64 |
| 47 | 0.005 | 200.02 | 64 |
| 48 | 0.199 | 309.35 | 64 |
| 49 | 0.059 | 261.60 | 64 |
| 50 | 0.019 | 205.76 | 64 |
| 51 | 0.005 | 197.48 | 64 |
| 52 | 0.005 | 48.46 | 64 |
| 53 | 0.001 | 0.00 | 64 |
-------------------------------------------------
Estimated total latency: 55.204 ms Trials: 15553 Used time : 36529 s Next ID: 34