Hi all, I’m not very familiar with TVM. Two years ago, I noticed that TVM has two types of runtime: the graph executor and the Relay VM. The graph executor has good performance but only supports static-shape models, while the Relay VM supports dynamic models but does not have good performance.
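To make the comparison concrete, here is a rough sketch of the two paths on a toy static-shape model, as I understand them. This is just my own untested sketch assuming a reasonably recent TVM build; exact details (e.g. `.numpy()` vs `.asnumpy()`) may differ across versions.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# A tiny static-shape Relay function: y = x + 1
x = relay.var("x", shape=(1, 4), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], x + relay.const(1.0)))

dev = tvm.cpu()
data = np.zeros((1, 4), dtype="float32")

# Path 1: graph executor (static shapes only)
lib = relay.build(mod, target="llvm")
gmod = graph_executor.GraphModule(lib["default"](dev))
gmod.set_input("x", data)
gmod.run()
print(gmod.get_output(0).numpy())

# Path 2: Relay VM (can also handle dynamic shapes)
vm_exec = relay.vm.compile(mod, target="llvm")
vm = tvm.runtime.vm.VirtualMachine(vm_exec, dev)
print(vm.invoke("main", data).numpy())
```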
Here are my questions:
Is that still the case for the graph executor and the Relay VM?
How can I keep track of the evolution of the community? There are a lot of posts, and I can’t find a summary easily.
Hi Tianqi, thank you for your reply! I read the post and found some useful information. It seems Relax is the next-generation IR after Relay and will have better support for dynamic shapes. It also seems it’s still on the way, and the support (compiler, runtime) for Relax is not ready yet?
But even after a lot of searching in the forum and the manual, I still can’t find the status of some topics I’m interested in. I want to ask some more specific questions:
How can I find out the status of the Relay VM:
Support for partially dynamic shapes
Benchmarks on some models, or the overhead compared with the graph executor
Is it still available and encouraged for use, or is it just experimental work that has been discarded?
Something like that. I want to know how I can find this kind of information.
Are there any blogs or posts introducing these topics?
The forum would be the right place to ask these questions.
Relax already comes with a runtime and compiler, and model coverage is being developed. You can also check out https://mlc.ai/ to get a taste of it.
In terms of Relay and its VM backend: Relay is still the encouraged path as of now if you want out-of-the-box compilation, and the VM is still being maintained and supported.
It comes with support for partially dynamic shapes in the form of the ? dimension, so we are not able to tell, say, that the dynamic dimensions of two operators are the same. It relies on a memory pool, which could be slightly worse than a static allocator, but we expect that on most models it would have similar performance to the graph runtime.
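As a rough sketch of what the ? dimension looks like in practice (untested, API details may vary slightly between versions), a dimension can be left dynamic with relay.Any(), and the VM serves different shapes from one compiled artifact:

```python
import numpy as np
import tvm
from tvm import relay

# Leave the batch dimension dynamic ("?") with relay.Any(); the other dim stays static.
x = relay.var("x", shape=(relay.Any(), 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

# Dynamic shapes go through the VM; the graph executor would reject Any().
exe = relay.vm.compile(mod, target="llvm")
vm = tvm.runtime.vm.VirtualMachine(exe, tvm.cpu())

# The same compiled module handles different batch sizes at runtime.
for batch in (1, 5, 32):
    out = vm.invoke("main", np.random.rand(batch, 8).astype("float32"))
    print(out.numpy().shape)
```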