Hi, I am new to TVM and I have a question:
when using relay.build to build a lib, we can use time_evaluator to measure the time cost; how do we measure the time when using create_executor?
I think

```python
start = time.time()
tvm_output = intrp.evaluate()(tvm.nd.array(x.astype(dtype)), **params).asnumpy()
end = time.time()
print("Execute over! used time is : {}".format(end - start))
```

is not a good method.
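One workaround I can think of (a minimal sketch, not an official API; `n_repeat` is my own name, and it assumes the same `intrp`, `x`, `dtype`, and `params` as in the code below) is to create the executor callable once outside the timed region, warm it up, and average over several calls:

```python
run = intrp.evaluate()            # create the executor callable once, outside the timer
inp = tvm.nd.array(x.astype(dtype))
run(inp, **params)                # warm-up call (may include one-time compilation)

n_repeat = 10
start = time.time()
for _ in range(n_repeat):
    out = run(inp, **params)
out.asnumpy()                     # materialize the result before stopping the clock
end = time.time()
print("mean per run: {:.2f} ms".format((end - start) / n_repeat * 1000))
```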
And I have done some tests for this:
```python
import time

import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime as runtime

net, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=4):
    lib = relay.build(net, target, params=params)

ctx = tvm.context(target, 0)
module = runtime.GraphModule(lib["default"](ctx))
module.set_input("input.1", x)
module.set_input(**params)

ftimer = module.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1e3  # convert to milliseconds

# I also measured module.run() directly; the difference from np.mean(prof_res)
# is very small on the llvm CPU backend but very large on the CUDA GPU backend.
start = time.time()
module.run()
end = time.time()

print(model_name, use_openmp, target,
      "%.2f ms (std_dev %.2f ms) (direct-run %.2f ms)"
      % (np.mean(prof_res), np.std(prof_res), (end - start) * 1000))
```
```python
with tvm.transform.PassContext(opt_level=4):
    intrp = relay.build_module.create_executor("graph", net, tvm.cpu(0), target)

dtype = "float32"
start = time.time()
tvm_output = intrp.evaluate()(tvm.nd.array(x.astype(dtype)), **params).asnumpy()
end = time.time()
print("Execute over! used time is : {} ms".format((end - start) * 1000))
```
And finally I get three timings (model.onnx, no OpenMP, llvm target):

- time_evaluator: 183.05 ms
- direct module.run(): 188.18 ms
- create_executor: 45428.90 ms
The gap between create_executor and GraphModule is so large, but why? Thanks very much!