Questions about TVM executors and their APIs

Hi, I am a new user of TVM. After going through TVM’s tutorials, I found that there are many ways to execute a compiled module.

Style 1: relay.build (i.e. relay.build_module.build) + tvm.contrib.graph_executor example

import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Compile the module, then wrap it in a graph executor and run one inference.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)
m = graph_executor.GraphModule(lib["default"](dev))
m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
m.run()
tvm_output = m.get_output(0)

Here we only use the graph executor (though in build's source code there also seems to be an AOT option?).

Style 2: relay.build_module.create_executor(...).evaluate()(input, ...) example

# Create the executor and compile/run in one shot; note that evaluate()
# here is called outside the PassContext.
with tvm.transform.PassContext(opt_level=1):
    intrp = relay.build_module.create_executor("graph", mod, tvm.cpu(0), target)
tvm_output = intrp.evaluate()(tvm.nd.array(x.astype(dtype)), **params).numpy()

Here we can use “graph”, “vm”, or “debug” as the executor kind.
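For reference, a minimal sketch (assuming mod, params, target, x, and dtype are defined as in the snippets above) showing that only the kind string changes between the three executor kinds:

# Only the first argument changes; everything else stays the same.
for kind in ["graph", "debug", "vm"]:
    executor = relay.build_module.create_executor(kind, mod, tvm.cpu(0), target)
    out = executor.evaluate()(tvm.nd.array(x.astype(dtype)), **params)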

I am curious about:

  • Will the build process trigger optimization? Or, what is the relationship between executors and the optimization passes? (Because the code inside build_module.build calls AOTExecutorFactoryModule or GraphExecutorFactoryModule.)
  • What is the difference between tvm.contrib.graph_executor.GraphModule and build_module.GraphExecutor (which inherits from _interpreter.Executor, whose docstring says it is for debugging purposes)?

Hi, @ganler! For your first question, I believe the build process does trigger optimization, and which optimization passes are applied is controlled by the opt_level argument passed to PassContext (passes at opt_level 2 and below are applied by default). If you trace the relay.build function, you will see that eventually this line in the BuildRelay function is executed, and it applies a set of passes. You can find the design of the pass infrastructure here.
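To illustrate, here is a small sketch (assuming mod, params, and target as in the snippets above): besides opt_level, PassContext also accepts required_pass and disabled_pass, which override the default opt_level-based filtering:

# Passes requiring opt_level <= 3 run here; "FoldScaleAxis" is forced off,
# and "FastMath" (normally opt_level 4) is forced on via required_pass.
with tvm.transform.PassContext(
    opt_level=3,
    disabled_pass=["FoldScaleAxis"],
    required_pass=["FastMath"],
):
    lib = relay.build(mod, target=target, params=params)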

I also noticed the two styles you mentioned, and they behave differently even when the same optimization level is specified (at least the bytecode generated by the two is different when using the Relay VM executor), which makes me very curious about why that happens. :thinking: I will let you know when I find out the reason and understand the difference between the two.

Hi @yuchenj, thank you for your kind reply! I have some follow-up questions to confirm and some issues to report.

It seems that in the line you mentioned (the Optimize function, build_module.cc:303), a set of (default, I think) passes is given. Is this the case: in BuildRelay we collect some default passes (those in Optimize) into a Sequential, and each of them is applied only if its pass-level opt_level <= the context-level opt_level (filtered in SequentialNode::operator())?
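In other words, something like this Python paraphrase of the filtering logic (a sketch for illustration, not the actual C++ implementation):

def pass_enabled(pass_info, pass_ctx):
    # Explicitly disabled passes never run.
    if pass_info.name in pass_ctx.disabled_pass:
        return False
    # Explicitly required passes always run.
    if pass_info.name in pass_ctx.required_pass:
        return True
    # Otherwise a pass runs only if the context's opt_level is high enough.
    return pass_info.opt_level <= pass_ctx.opt_level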

I asked these questions because I think some tutorials do not demonstrate good practice with these APIs.

When using the relay.build_module.create_executor API, it seems the compilation (BuildRelay) is done in evaluate(): evaluate() calls interpreter.Executor._make_executor; GraphExecutor._make_executor calls build; and Interpreter._make_executor applies some extra optimization passes.

This means we should call evaluate() inside the PassContext; otherwise the passes enabled by our context (e.g., those requiring opt_level 3 or 4) will be filtered out:

import tvm
import tvm.relay as relay
from tvm.relay import testing

def example():
    data = relay.var("data", relay.TensorType((1, 3, 512, 512), "float32"))
    weight = relay.var("weight")
    bn_gamma = relay.var("bn_gamma")
    bn_beta = relay.var("bn_beta")
    bn_mmean = relay.var("bn_mean")
    bn_mvar = relay.var("bn_var")

    simple_net = relay.nn.conv2d(
        data=data, weight=weight, kernel_size=(5, 5), channels=32, padding=(1, 1)
    )
    simple_net = relay.nn.batch_norm(simple_net, bn_gamma, bn_beta, bn_mmean, bn_mvar)[0]
    simple_net = relay.nn.relu(simple_net)
    simple_net = relay.Function(relay.analysis.free_vars(simple_net), simple_net)

    return testing.create_workload(simple_net)

if __name__ == '__main__':
    mod, params = example()
    target = tvm.target.Target('llvm')
    dev = tvm.cpu()
    with tvm.transform.PassContext(opt_level=4): 
        executor = relay.build_module.create_executor("graph", mod, dev, target)

    # Here `evaluate()` is called outside `PassContext`, as the following tutorial does:
    # https://tvm.apache.org/docs/tutorials/frontend/from_onnx.html#compile-the-model-with-relay
    tvm_out = executor.evaluate()(
        tvm.nd.empty(shape=(1, 3, 512, 512), device=dev, dtype='float32'),
        **params)

With this ordering, the opt_level=4 passes will not be applied. But if we instead do:

    with tvm.transform.PassContext(opt_level=4): 
-        executor = relay.build_module.create_executor("graph", mod, dev, target)
+        executor = relay.build_module.create_executor("graph", mod, dev, target).evaluate()

-    tvm_out = executor.evaluate()(
+    tvm_out = executor(

We finally see some logging like “… tvm/src/relay/ir/transform.cc:133: Executing function pass : CombineParallelConv2d with opt level: 4”.
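For completeness, here is the corrected main block with evaluate() moved inside the PassContext (using the same example() and setup as above):

if __name__ == '__main__':
    mod, params = example()
    target = tvm.target.Target('llvm')
    dev = tvm.cpu()
    # Compile inside the PassContext so that the opt_level=4 passes apply.
    with tvm.transform.PassContext(opt_level=4):
        executor = relay.build_module.create_executor(
            "graph", mod, dev, target
        ).evaluate()

    # Running the compiled function no longer needs the PassContext.
    tvm_out = executor(
        tvm.nd.empty(shape=(1, 3, 512, 512), device=dev, dtype='float32'),
        **params)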

Such tutorials include the from_onnx tutorial linked in the code comment above.

I hope my report helps enhance the tutorials.

Another thing is that I am still confused about the executors… There are “graph”, “debug” (the interpreter), “vm”, “aot”, and so on.

Some executors cannot support dynamic shapes (e.g., the graph executor), and their default optimization passes are not the same. I think guidance on which executor users should pick would be very helpful for improving the usability of TVM :slight_smile:
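For example, here is a minimal sketch of one reason to pick the VM executor: it handles dynamic shapes via relay.Any(), which the graph executor rejects (the toy model below is purely illustrative):

import numpy as np
import tvm
from tvm import relay

# A relay.Any() dimension makes the input shape dynamic.
data = relay.var("data", shape=(relay.Any(), 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([data], relay.nn.relu(data)))

# The "vm" executor kind can compile and run this dynamic-shape module;
# asking for the "graph" executor here would fail.
vm_exec = relay.build_module.create_executor("vm", mod, tvm.cpu(), "llvm")
out = vm_exec.evaluate()(np.random.rand(4, 8).astype("float32"))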

Interesting findings! I think we should unify these two sets of APIs because they show different behaviors and can confuse users a lot. I did some measurements on the VM executor recently and found that the code generated by the second style (relay.build_module.create_executor) is much worse than the first one. I will dig deeper into it, and maybe we can write an RFC.

@ganler Following up on the multiple APIs you mentioned, this PR might provide more insights: [Relay] Modify create_executor to pass params by mikepapadim · Pull Request #8418 · apache/tvm · GitHub.