[MetaSchedule] Usage of MS module

Hi, I’ve used Ansor before. If I understand correctly, MS should be a more powerful version of Ansor. I ran MS tuning on an Arm device for 3 days and collected 1 GB+ of tuning records, but strangely the tuned model shows no improvement in latency. Below is the script:

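```python
import os

from tvm import meta_schedule as ms
from tvm import relay, rpc
from tvm.contrib import graph_executor

# `mod`, `params`, `target` and `model_path` are defined earlier (not shown).
# Load the JSON database (workload + tuning-record files) written during tuning.
database = ms.Database.create(work_dir="a dir with tuning record and workload json")
# Compile the Relay model, applying the best schedules found in the database.
lib = ms.relay_integration.compile_relay(database, mod, target, params)
lib.export_library(os.path.join(model_path, "model.tar"))

# Connect to the RPC tracker and request the remote Arm device.
tracker = rpc.connect_tracker("127.0.0.1", port=9190)
remote = tracker.request("xxx")
dev = remote.cpu()
remote.upload(os.path.join(model_path, "model.tar"))
f = remote.load_module("model.tar")
m = graph_executor.GraphModule(f["default"](dev))
print(m.benchmark(dev))
```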

I think the JSON database is loaded correctly, since the script takes 10 GB+ of RAM when compiling the model. Is this expected, or did I miss anything?
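High memory use suggests the records were read, but not necessarily that they match the workloads in the model being compiled. A more direct check is to extract the tuning tasks and ask the database whether it holds a workload for each one. This is a minimal sketch: `find_uncovered_tasks` is a hypothetical helper, not a TVM API, and it assumes each extracted task exposes `task_name` and a `dispatched` list of IRModules, and that the database offers `has_workload(mod)`, as in recent TVM releases.

```python
from typing import Iterable, List


def find_uncovered_tasks(tasks: Iterable, database) -> List[str]:
    """Return names of tasks with no matching workload in the database.

    Hypothetical helper (not part of TVM). Assumes each task has
    `task_name` and `dispatched` (a list of IRModules), and that
    `database.has_workload(mod)` returns a bool.
    """
    uncovered = []
    for task in tasks:
        # A task counts as covered if any of its dispatched modules was tuned.
        if not any(database.has_workload(mod) for mod in task.dispatched):
            uncovered.append(task.task_name)
    return uncovered
```

With TVM installed, the tasks could come from `ms.relay_integration.extract_tasks(mod, target, params)`, using the same `mod`, `target` and `params` as the compile step; a non-empty result would suggest that part of the model is compiled without tuned schedules.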

Thanks for your interest in Meta-schedule.

Your understanding is partially right. MS is the next-generation auto-tuning framework after Ansor. It brings features including, but not limited to, auto-tensorization and a unified infrastructure for template-based (AutoTVM-style) and template-free (Ansor-style) tuning.

However, that does not mean better performance on every target. We ensure that the default MS rules cover Ansor’s search space, i.e., MS should not be slower than Ansor. On the other hand, we do not expect a performance advantage over Ansor on most “scalar backends” (e.g., llvm without tensorization, CUDA cores on GPUs).

Thanks for your explanation. But when I said I was “not getting any improvements using MS”, I was comparing the MS-tuned model with the default TOPI schedules (around 120 ms). The same Relay model can be optimized to around 90 ms by Ansor. That’s why I’m confused about whether MS is based on Ansor.

MS is indeed based on Ansor and should perform similarly to Ansor. It would be great if you could compare the performance of each op and see where the difference comes from.
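One way to do that comparison: collect per-op latencies from the Ansor-tuned and MS-tuned builds (for example with TVM’s debug executor in `tvm.contrib.debugger.debug_executor`), then rank the ops by how much slower the MS build is. This is a hypothetical sketch: it assumes the timings are already in plain dictionaries keyed by op name, in milliseconds, with names that match between the two builds; `rank_regressions` and `speedup` are illustrative helpers, not TVM APIs.

```python
from typing import Dict, List, Tuple


def rank_regressions(
    ansor_ms: Dict[str, float], ms_ms: Dict[str, float]
) -> List[Tuple[str, float, float, float]]:
    """Rank ops present in both profiles by how much slower MS is than Ansor.

    Returns (op, ansor_time, ms_time, ms_time - ansor_time), largest gap first.
    Ops that appear in only one profile are skipped.
    """
    rows = []
    for op in sorted(ansor_ms.keys() & ms_ms.keys()):
        a, m = ansor_ms[op], ms_ms[op]
        rows.append((op, a, m, m - a))
    # Largest absolute slowdown first: these ops explain most of the gap.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


def speedup(baseline_ms: float, tuned_ms: float) -> float:
    """End-to-end speedup of a tuned build over a baseline (>1 means faster)."""
    return baseline_ms / tuned_ms
```

Ranking by absolute difference rather than ratio keeps attention on the ops that dominate end-to-end latency. With the numbers from this thread, `speedup(120, 90)` gives about 1.33 for Ansor over the default TOPI schedules.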