I was trying to use Ansor to optimize a ResNet model. However, I observed that for different targets (e.g., CPU and GPU), although the total number of extracted subgraphs/workloads is the same, the specific subgraphs/workloads are different.
For instance, the workload for GPU:
========== Task 7 (workload key: ["6f0503383aee3dbb94006cc087e0349a"]) ==========
placeholder = PLACEHOLDER [1, 256, 14, 14]
pad_temp(i0, i1, i2, i3) = tir.if_then_else(((((i2 >= 1) && (i2 < 15)) && (i3 >= 1)) && (i3 < 15)), placeholder[i0, i1, (i2 - 1), (i3 - 1)], 0f)
placeholder = PLACEHOLDER [256, 256, 3, 3]
compute(nn, ff, yy, xx) += (pad_temp[nn, rc, (yy + ry), (xx + rx)]*placeholder[ff, rc, ry, rx])
placeholder = PLACEHOLDER [1, 256, 14, 14]
T_add(ax0, ax1, ax2, ax3) = (compute[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, ax2, ax3])
placeholder = PLACEHOLDER [1, 256, 1, 1]
T_add(ax0, ax1, ax2, ax3) = (T_add[ax0, ax1, ax2, ax3] + placeholder[ax0, ax1, 0, 0])
T_relu(ax0, ax1, ax2, ax3) = max(T_add[ax0, ax1, ax2, ax3], 0f)
The workload for CPU:
========== Task 7 (workload key: ["629dcd4733ec6363e001d9ddb446bb31"]) ==========
placeholder = PLACEHOLDER [1, 2, 14, 14, 128]
data_pad(i0, i1, i2, i3, i4) = tir.if_then_else(((((i2 >= 1) && (i2 < 15)) && (i3 >= 1)) && (i3 < 15)), placeholder[i0, i1, (i2 - 1), (i3 - 1), i4], 0f)
placeholder = PLACEHOLDER [16, 2, 3, 3, 128, 16]
conv2d_NCHWc(n, oc_chunk, oh, ow, oc_block) += (data_pad[n, floordiv(ic, 128), (oh + kh), (ow + kw), floormod(ic, 128)]*placeholder[oc_chunk, floordiv(ic, 128), kh, kw, floormod(ic, 128), oc_block])
placeholder = PLACEHOLDER [1, 16, 14, 14, 16]
T_add(ax0, ax1, ax2, ax3, ax4) = (conv2d_NCHWc[ax0, ax1, ax2, ax3, ax4] + placeholder[ax0, ax1, ax2, ax3, ax4])
placeholder = PLACEHOLDER [1, 16, 1, 1, 16]
T_add(ax0, ax1, ax2, ax3, ax4) = (T_add[ax0, ax1, ax2, ax3, ax4] + placeholder[ax0, ax1, 0, 0, ax4])
T_relu(ax0, ax1, ax2, ax3, ax4) = max(T_add[ax0, ax1, ax2, ax3, ax4], 0f)
So I have a few questions about this:
- What are the reasons behind this phenomenon?
- Is there a way to extract the same subgraphs/workloads from the DNN model across different targets (e.g., CPU and GPU)?
- I am not very familiar with the representation of the workloads, so I am wondering whether the two different workloads still represent the same thing (e.g., the same subgraph of the DNN). Or is there a way to measure the similarity/difference between the two workloads?
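
For what it's worth, the only rough check I could come up with is to compare the element counts of the corresponding placeholders. This assumes (and it is only my guess) that the CPU workload is the same convolution with its tensors repacked into a blocked NCHWc-style layout. The shapes below are copied from the two dumps above; the helper name `same_element_count` is my own and not part of TVM.

```python
from math import prod

# Placeholder shapes copied from the two Task 7 dumps, in order of appearance.
gpu_shapes = [
    [1, 256, 14, 14],    # input data
    [256, 256, 3, 3],    # convolution weights
    [1, 256, 14, 14],    # tensor added after the convolution
    [1, 256, 1, 1],      # per-channel tensor added afterwards
]
cpu_shapes = [
    [1, 2, 14, 14, 128],
    [16, 2, 3, 3, 128, 16],
    [1, 16, 14, 14, 16],
    [1, 16, 1, 1, 16],
]


def same_element_count(shape_a, shape_b):
    """Return True if both shapes hold the same number of elements."""
    return prod(shape_a) == prod(shape_b)


# Pair up the placeholders by position and compare their sizes.
matches = [same_element_count(g, c) for g, c in zip(gpu_shapes, cpu_shapes)]
for g, c, ok in zip(gpu_shapes, cpu_shapes, matches):
    print(g, c, "same size" if ok else "different size")
```

Matching sizes would only be consistent with the two workloads being the same subgraph in different layouts; they would not prove it, so I would still like to know if there is a proper way to compare them.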
Any comments or insights would be appreciated.
Thanks!