AOT LLVM backend: static library export with tuning applied

Hi, I want to use the AOT executor with the LLVM backend to generate a self-contained static library, with the auto-scheduler's best tuning history applied. Currently it seems I can only use the C++ runtime with AOT; however:

  1. The tuning results do not seem to be loaded when the AOT executor is specified:

For Python code like this, there is no issue:

with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)

However, if I specify runtime=Runtime("cpp"), executor=backend.Executor("aot", {"interface-api": "packed"}), it gives a warning that the DAG workload keys are not found. The workload keys in the warning are different from the ones in the tuning JSON file produced earlier.

  2. I can see that when calling export_library, lib0.o contains a new main function named something like tvmgen_modname___tvm_main__. However, I can't find any samples showing how to call the model, and I don't know the signature of this main function. Could you help me run an AOT inference and test it out?

  3. Can I compile multiple models into a single binary (multiple lib0.o, lib1.o, dev.o) without any additional dynamic libraries? Thanks!

cc @areusch to draw the attention of people who can help.

Hi @indarkness, could you give a little more info here? Sorry for the lack of AOT tutorials with the C++ runtime; it only supports the llvm and c backends right now, so it's still somewhat experimental for the classic C++ runtime use cases. It sounds like it should work for you, though.

What Python code are you running here and do you mind copying in the terminal output?

In the C++ runtime, create an AOTExecutor: https://github.com/apache/tvm/blob/main/tests/python/relay/aot/test_cpp_aot.py#L143

This should call the generated __main__ function for you.
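For reference, here is a minimal sketch of the pattern that test exercises, assuming lib is the result of relay.build with executor=Executor("aot", {"interface-api": "packed"}) and runtime=Runtime("cpp"); the AotModule wrapper lives under tvm.runtime.executor in recent TVM versions, and the input name and shape below are placeholders you need to replace:

import numpy as np
import tvm
from tvm.contrib import utils

# Export the AOT-compiled module as a shared library and load it back.
temp_dir = utils.tempdir()
so_path = temp_dir.relpath("model.so")
lib.export_library(so_path)
loaded = tvm.runtime.load_module(so_path)

# Wrap the loaded module in the AOT executor; this ends up calling the
# generated tvmgen_<modname>___tvm_main__ entry point for you.
dev = tvm.cpu(0)
runner = tvm.runtime.executor.AotModule(loaded["default"](dev))
runner.set_input("input0", np.zeros((1, 3, 640, 640), dtype="float32"))  # placeholder input name/shape
runner.run()
out = runner.get_output(0).numpy()

When cross-compiling (e.g. for aarch64 as in your script), the export needs the appropriate cross compiler, and the load/run part has to happen on the target device.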

The intention is that you can do this; we just haven't tested it extensively with the C++ runtime yet. The name prefix (tvmgen_modname_) you noted above is intended to avoid collisions.
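To illustrate the prefix point: relay.build accepts a mod_name argument, so each model compiled into the same binary gets its own tvmgen_<mod_name>_ symbol prefix. A sketch, assuming mod_a/params_a and mod_b/params_b are two separate Relay modules (hypothetical names):

# Give each model a distinct mod_name so the generated entry points
# (tvmgen_model_a___tvm_main__, tvmgen_model_b___tvm_main__, ...) do not collide.
lib_a = relay.build(
    mod_a, target=target, params=params_a,
    executor=backend.Executor("aot", {"interface-api": "packed"}),
    mod_name="model_a",
)
lib_b = relay.build(
    mod_b, target=target, params=params_b,
    executor=backend.Executor("aot", {"interface-api": "packed"}),
    mod_name="model_b",
)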

  1. The code is included below. The output is "Cannot find tuned schedules for target=… workload key=…", and a fallback TOPI schedule is used. This warning is not shown when executor=backend.Executor("aot", {"interface-api": "packed"}) is removed from the code. When AOT is specified as the executor in relay.build, the workload key is different from the one in the JSON file generated by tuning (see the diagnostic sketch after the code below).
Code
from tvm import testing
testing.utils.install_request_hook(depth=3)
import numpy as np
import os
import onnx
import tvm
from tvm import relay, auto_scheduler
from tvm.relay import data_dep_optimization as ddo
import tvm.relay.testing
from tvm.contrib import graph_executor
from tvm.contrib.utils import tempdir
from tvm.relay import backend, testing
from tvm.relay.backend import Executor, Runtime

def get_network(name, batch_size, layout="NHWC", dtype="float32", use_sparse=False):
    """Get the symbol definition and random weight of a network"""
    input_shape = (batch_size, 3, 640, 640)
    model = onnx.load('model.onnx')
    mod, params = relay.frontend.from_onnx(model)
    return mod, params, input_shape

target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mattr=+neon")

device_key = "android"
rpc_host = "127.0.0.1"
rpc_port = 7070
use_ndk = True
network = "testmodel"
use_sparse = False
batch_size = 1
layout = "NHWC"
dtype = "float32"
log_file = "%s-%s-B%d-%s.json" % (network, layout, batch_size, target.kind.name)
print("Get model...")
mod, params, input_shape = get_network(
    network, batch_size, layout, dtype=dtype, use_sparse=use_sparse
)
print("Extract tasks...")
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

for idx, task in enumerate(tasks):
    print("========== Task %d  (workload key: %s) ==========" % (idx, task.workload_key))
    print(task.compute_dag)
def tune_and_evaluate():
    print("Begin tuning...")
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=20000,  # change this to 20000 to achieve the best performance
        builder=auto_scheduler.LocalBuilder(build_func="ndk" if use_ndk else "default"),
        runner=auto_scheduler.RPCRunner(
            device_key,
            host=rpc_host,
            port=rpc_port,
            timeout=30,
            repeat=1,
            min_repeat_ms=200,
            enable_cpu_cache_flush=True,
        ),
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    )

    tuner.tune(tune_option)
    # Compile with the history best
    print("Compile...")
    
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(
                mod,
                target=target,
                params=params,
                executor=backend.Executor("aot", {"interface-api": "packed"}),
            )
tune_and_evaluate()
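As a quick diagnostic (not part of the original script), you can print the workload keys stored in the tuning log and compare them by eye with the keys shown in the "Cannot find tuned schedules" warning; this sketch assumes the standard auto_scheduler record layout, where the first field under "i" is the workload key:

import json

# Dump the workload keys recorded in the tuning log so they can be compared
# with the keys printed in the warning during the AOT build.
with open(log_file) as f:
    logged_keys = {json.loads(line)["i"][0][0] for line in f}
for key in sorted(logged_keys):
    print(key)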
  2. That's Python calling the AOT model by loading a dynamically linked model library. Is there a route to call a statically linked model? We may not have the luxury of shipping a separate dynamic library and loading it at runtime. Can we link the model statically and call the inference function (like the main function) directly? Can we statically link multiple models into a single binary that contains the model inference functions directly, without loading an external dynamically linked model library?
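In case it helps while waiting for an official flow: export_library with a ".tar" suffix packs the generated object files instead of invoking the linker, so you can archive and link them, together with a statically built TVM runtime (libtvm_runtime.a), directly into your application. A sketch, with the exact object file names and toolchain commands depending on your TVM build and target:

import tarfile

# Pack the generated object files (lib0.o, lib1.o, ...) instead of producing a shared library.
lib.export_library("model.tar")

# Unpack them for the native toolchain. Afterwards, something like
#   aarch64-linux-gnu-ar rcs libmodel.a model_objs/*.o
#   aarch64-linux-gnu-g++ app.cc libmodel.a libtvm_runtime.a -o app
# links the model (and further models built with distinct mod_name values)
# into a single self-contained binary.
with tarfile.open("model.tar") as tar:
    tar.extractall("model_objs")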

cc @areusch again to draw attention to the topic.