I would like to deploy transformer models to ARM CPUs with AutoTVM. Transformer models consist mainly of dense and matmul layers, and when the sequence length is small, the dense layers dominate the latency. However, I found that dense layers are not well optimized for ARM CPU in AutoTVM, which hurts end-to-end performance significantly. Is there a workaround (other than using AutoScheduler)?
You may wonder why I am not using AutoScheduler: I have added some custom operators to the transformer, and those custom operators are implemented ONLY with AutoTVM schedules.