Where can I get the latest TVM news?

Hi all, I’m not familiar with TVM. Two years ago, I noticed that TVM has two types of runtime: the graph executor and the Relay VM. The graph executor has good performance but only supports static-shape models; the Relay VM supports dynamic models but does not have good performance.

Here are my questions:

  1. Is that still the case for the graph executor and the Relay VM?
  2. How can I keep track of the community’s progress? There are a lot of posts, and I can’t easily find a summary.
  3. What features are the developers working on?

Thanks a lot for answering!

Great question. Please see Establish TVM Unity Connection — A Technical Strategy for some of the latest updates on TVM Unity. We will also post updated notes on this direction.

Hi Tianqi, thank you for your reply! I read the post and found some useful information. It seems Relax is the next-generation IR after Relay and will have better support for dynamic shapes. It also seems it’s still a work in progress, and the compiler and runtime support for Relax is not ready yet?

But even after a lot of searching in the forum and the docs, I still can’t find the status of some topics I’m interested in. Let me ask some more specific questions:

  1. How can I find out the status of the Relay VM:
  • Support for partially dynamic shapes
  • Benchmarks on some models, or the overhead compared with the graph executor
  • Is it still available and encouraged for use, or is it experimental work that has been discarded?

Something like that. In general, I want to know how to find this kind of information.

Are there any blogs or posts introducing these topics?

Thank you!

The forum would be the right place to ask these questions.

Relax already comes with a runtime and compiler, with model coverage still being developed. You can also check out https://mlc.ai/ to get a taste of it.

In terms of Relay and its VM backend: Relay is still the encouraged path as of now if you want out-of-the-box compilation, and the VM is still being maintained and supported.

It comes with some support for partially dynamic shapes in the form of the ? dimension, so we are not able to tell, say, that the dynamic dimensions of two operators are the same. It relies on a memory pool, which could be slightly worse than a static allocator, but we expect that on most models it would have performance similar to the graph executor.
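
To make the ? dimension concrete, here is a minimal sketch (assuming a recent TVM release; the shapes are just for illustration): a Relay function whose batch dimension is relay.Any() prints with a ? in its type, and it has to go through the VM, since the graph executor only accepts static shapes.

import numpy as np
import tvm
from tvm import relay
from tvm.runtime import vm as vm_rt

# Batch dimension is unknown at compile time; the other dims are static.
x = relay.var("x", shape=(relay.Any(), 3, 224, 224), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
print(mod)  # the dynamic dimension prints as ?

# The graph executor rejects dynamic shapes, so compile for the VM instead.
with tvm.transform.PassContext(opt_level=3):
    vm_exec = relay.vm.compile(mod, target="llvm")

dev = tvm.cpu()
vm = vm_rt.VirtualMachine(vm_exec, dev)
out = vm.invoke("main", tvm.nd.array(np.ones((5, 3, 224, 224), "float32"), dev))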

Thank you for the detailed answer! I’ll try Relax through the courses and hope it will be ready for production soon.

I still have some questions. I want to benchmark TVM on models like ResNet and BERT on a T4 GPU.

I have tried the graph executor and the Relay VM with ResNet-18, and the performance is not good. I don’t know whether I got the best performance out of TVM.

  • I didn’t try auto-tuning and just set the target to cuda -libs=cudnn.
  • I used fp32 and didn’t try fp16 (I don’t know how to do it).

What’s the right way to get the best performance on GPU: cuDNN or auto-tuning?

Is there a benchmark (a sheet or a runnable demo) that I can compare against to make sure I’ve gotten the best performance?

By the way, how is the support for fp16 and Tensor Cores in TVM? Is there any demo or introduction?

Thank you!

For a Tensor Core example, please see
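
In the meantime, here is a minimal sketch of the fp16 part (assuming a recent TVM build; this is one common recipe, not necessarily the one in the linked example): pass the Relay module through ToMixedPrecision before compiling, since fp16 operands are typically needed before Tensor Core kernels can be selected.

import tvm
from tvm import relay

def to_fp16(mod):
    # Run type inference first, then cast eligible ops to float16;
    # numerically sensitive ops (e.g. softmax) stay in float32.
    mod = relay.transform.InferType()(mod)
    return relay.transform.ToMixedPrecision("float16")(mod)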

Thank you, Siyuan! I’ll try it.

Error message: The scope tir.Block#0 is not a stage pipeline. Definition of a scope that is a stage pipeline:

  • The region cover property holds for every of its child blocks
  • No write-after-read dependency or opaque dependency; only read-after-write and write-after-write are allowed
  • All the statements in the scope are schedulable statements, i.e. Block and For

I tried using Tensor Cores to tune my ResNet-18 model, and it reports this error. Could you help me see what’s wrong? Here is my code:

import tempfile

import onnx
import tvm
from tvm import meta_schedule as ms
from tvm import relay

def compile_onnx(path, use_fp16=False):
    # Import the ONNX model into Relay.
    onnx_model = onnx.load(path)
    mod, params = relay.frontend.from_onnx(onnx_model)
    target = tvm.target.Target('nvidia/nvidia-t4')

    def convert_layout(mod):
        # Tensor Core schedules expect NHWC activations (OHWI weights here).
        seq = tvm.transform.Sequential(
            [relay.transform.ConvertLayout({"nn.conv2d": ["NHWC", "OHWI"]})]
        )
        with tvm.transform.PassContext(opt_level=3):
            mod = seq(mod)
        return mod

    if use_fp16:
        # Cast eligible ops to float16 so Tensor Core intrinsics can apply.
        mod = relay.transform.ToMixedPrecision('float16')(mod)

    with tempfile.TemporaryDirectory() as work_dir:
        with ms.Profiler() as profiler:
            converted_mod = convert_layout(mod)
            # Tune with MetaSchedule, then compile using the tuned database.
            database = ms.relay_integration.tune_relay(
                mod=converted_mod,
                target=target,
                work_dir=work_dir,
                max_trials_global=3000,
                params=params,
            )
            rt_mod1 = ms.relay_integration.compile_relay(
                database=database,
                mod=converted_mod,
                target=target,
                params=params,
            )
        print(profiler.table())
    return rt_mod1
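
Once it compiles, this is roughly how I plan to benchmark the result (a hedged sketch: it assumes compile_relay returns a graph-executor factory, which is its default backend, and 'input' is a placeholder for the ONNX graph’s actual input name):

import numpy as np
import tvm
from tvm.contrib import graph_executor

rt_mod1 = compile_onnx('resnet18.onnx')  # hypothetical model path
dev = tvm.device('cuda', 0)
gmod = graph_executor.GraphModule(rt_mod1['default'](dev))
# 'input' is illustrative; check the real input name of the ONNX graph.
data = np.random.rand(1, 3, 224, 224).astype('float32')
gmod.set_input('input', tvm.nd.array(data, dev))
# time_evaluator reports mean latency over repeated runs.
ftimer = gmod.module.time_evaluator('run', dev, number=100, repeat=3)
print(ftimer())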