How to measure the time cost when running inference with TVM?

Hi, I am new to TVM and I want to ask a question:

When building a lib with relay.build we can use time_evaluator to measure the time cost; how do I measure the time when using create_executor?

I think

start = time.time()
tvm_output = intrp.evaluate()(tvm.nd.array(x.astype(dtype)), **params).asnumpy()
end = time.time()
print("Execute over! used time is : {} ".format(end-start))

is not a reliable method.
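A slightly less naive variant would be to call the evaluated function once as a warm-up and then average over several runs, so one-time setup cost is not included. This is only a sketch, assuming the same intrp, x, dtype and params as in the snippet above (import time and tvm assumed):

func = intrp.evaluate()
data = tvm.nd.array(x.astype(dtype))
func(data, **params)  # warm-up call, not timed
n_runs = 10
start = time.time()
for _ in range(n_runs):
    out = func(data, **params)
end = time.time()
print("mean time per run: {:.2f} ms".format((end - start) * 1000 / n_runs))

But even this only measures host-side wall-clock time, so I am not sure it is the right way.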

I have done some tests for this:

import time
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime as runtime

net, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# 1) GraphModule + time_evaluator
with tvm.transform.PassContext(opt_level=4):
    lib = relay.build(net, target, params=params)
ctx = tvm.context(target, 0)
module = runtime.GraphModule(lib["default"](ctx))
module.set_input("input.1", x)
module.set_input(**params)
ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1e3  # convert to milliseconds

# 2) wall-clock time around a single module.run()
# (the difference to np.mean(prof_res) is very small on the llvm CPU backend
#  but very large on the cuda GPU backend)
start = time.time()
module.run()
end = time.time()
print(model_name, use_openmp, target,
      "%.2f ms (std_dev %.2f ms) (direct-run %.2f ms)"
      % (np.mean(prof_res), np.std(prof_res), (end - start) * 1000))

# 3) create_executor, timed with wall clock
with tvm.transform.PassContext(opt_level=4):
    intrp = relay.build_module.create_executor("graph", net, tvm.cpu(0), target)
dtype = "float32"
start = time.time()
tvm_output = intrp.evaluate()(tvm.nd.array(x.astype(dtype)), **params).asnumpy()
end = time.time()
print("Execute over! used time is : {} ms".format((end - start) * 1000))

Finally I get three timing results:

model.onnx none llvm: 183.05 ms (time_evaluator mean), 188.18 ms (direct module.run()), 45428.90 ms (create_executor evaluate)

The gap between create_executor and the GraphModule is so large, but why? Thanks very much!


Hi @wang-y-z. Currently there is no good way to get timing results through the executor API. Instead, I recommend using time_evaluator with the GraphModule directly: tvm.runtime — tvm 0.8.dev0 documentation. That is the approach you take at the top of your second code block. The differences you are seeing between your other timing methods most likely have two causes: 1. TVM does lazy initialization, so the first run is slow. 2. Code running on the GPU is asynchronous, so wrapping module.run() in time.time() calls without a device sync does not measure when the computation actually finishes.
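For reference, here is a minimal sketch of that approach, assuming the lib, target and input x built as in the question, and TVM ~0.8 where tvm.context / ctx.sync() are used rather than the newer tvm.device API:

import numpy as np
import tvm
from tvm.contrib import graph_runtime as runtime

ctx = tvm.context(target, 0)
module = runtime.GraphModule(lib["default"](ctx))
module.set_input("input.1", x)

module.run()  # warm-up run so lazy initialization is not part of the measurement
ctx.sync()    # block until the (asynchronous) device work has actually finished

# time_evaluator does an internal warm-up call and synchronizes the device,
# so its results are meaningful on both CPU and GPU backends
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
prof_res = np.array(ftimer().results) * 1e3  # seconds -> milliseconds
print("mean %.2f ms (std_dev %.2f ms)" % (np.mean(prof_res), np.std(prof_res)))

With a device sync after module.run(), the gap between your wall-clock numbers and the time_evaluator numbers on the cuda backend should shrink considerably.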


Thank you so much for your precise description! :grinning: