Auto-scheduler gets stuck in an infinite loop when the task has more than 7 compute stages

SUSTechHong · April 14, 2022, 3:28am

Dear all,

When I tuned a tough task using Auto-scheduler, it got stuck in an infinite loop after 128 steps.

Then I tried removing compute stages and found that whenever the task has more than 7 compute stages, Auto-scheduler gets stuck in an infinite loop after 128 steps.
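If you want to pin down the threshold on another workload or machine without removing stages by hand, a bisection over the stage count needs only about log2(N) tuning runs. The sketch below assumes the hang is monotonic in the stage count (if n stages hang, so do all larger counts). It takes a hypothetical predicate `fails(n)` that you would implement yourself, for example by running a tuning job with n stages in a subprocess and treating a timeout as a hang.

```python
from typing import Callable


def first_failing_stage_count(lo: int, hi: int, fails: Callable[[int], bool]) -> int:
    """Return the smallest n in [lo, hi] for which fails(n) is True.

    Assumes failure is monotonic in n: once fails(n) is True, it stays True
    for every larger n. `fails` is a user-supplied (hypothetical) check.
    """
    if not fails(hi):
        raise ValueError(f"no failure observed with up to {hi} stages")
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid  # threshold is at mid or below
        else:
            lo = mid + 1  # mid stages are fine, threshold is above
    return lo
```

With a predicate that reports a hang for more than 7 stages, `first_failing_stage_count(1, 16, fails)` returns 8 after checking only 16, 8, 4, 6 and 7 stages.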

Here is a simple task that can reproduce the problem.

```python
from tvm import te, auto_scheduler


@auto_scheduler.register_workload
def test():
    A = te.placeholder((16, 16, 16, 16), name="A")
    B = te.placeholder((16, 16, 16, 16), name="B")

    k1 = te.reduce_axis((0, 16), "k1")
    M1 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            A[h, w, m, k1] * B[h, w, k1, n], axis=k1),
        name="M1",
    )

    k2 = te.reduce_axis((0, 16), "k2")
    M2 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M1[h, w, m, k2] * B[h, w, k2, n], axis=k2),
        name="M2",
    )

    k3 = te.reduce_axis((0, 16), "k3")
    M3 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M2[h, w, m, k3] * B[h, w, k3, n], axis=k3),
        name="M3",
    )

    k4 = te.reduce_axis((0, 16), "k4")
    M4 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M3[h, w, m, k4] * B[h, w, k4, n], axis=k4),
        name="M4",
    )

    k5 = te.reduce_axis((0, 16), "k5")
    M5 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M4[h, w, m, k5] * B[h, w, k5, n], axis=k5),
        name="M5",
    )

    k6 = te.reduce_axis((0, 16), "k6")
    M6 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M5[h, w, m, k6] * B[h, w, k6, n], axis=k6),
        name="M6",
    )

    k7 = te.reduce_axis((0, 16), "k7")
    M7 = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M6[h, w, m, k7] * B[h, w, k7, n], axis=k7),
        name="M7",
    )

    k8 = te.reduce_axis((0, 16), "k8")
    O = te.compute(
        (16, 16, 16, 16),
        lambda h, w, m, n: te.sum(
            M7[h, w, m, k8] * B[h, w, k8, n], axis=k8),
        name="O",
    )

    return [A, B, O]
```

The scores predicted by the cost model may be -inf, but I don't know why. How can I tune a task like this with Auto-scheduler?

Thank you in advance.

merrymercy · June 12, 2022, 10:51pm

For a large computational graph, it is better to write it in Relay and use the Relay integration API to call auto-scheduler. Relay can partition large graphs into smaller subgraphs that are friendlier to auto-scheduler.