I compiled ResNet-50 with opt_level = 0, 1, 2, and 3, but the inference times barely differ. What could be the problem? Can anyone help me?
You probably need to profile each operation in the TVM graph runtime.
Ref. https://github.com/dmlc/tvm/blob/5b5465b59e1d5f943a46db84d43076a93d8a7003/src/runtime/graph/graph_runtime.cc#L56
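To make the idea of per-operation profiling concrete, here is a minimal sketch in plain Python. It is not TVM's API: `profile_ops`, the `(name, callable)` list, and the `sync` callback are all hypothetical stand-ins for a runtime that executes its operators one after another. The point it illustrates is that on a GPU each operator has to be followed by a device synchronization, otherwise the timer only measures the kernel launch.

```python
import time


def profile_ops(ops, sync=None, repeat=10):
    """Time each operator separately and return [(name, mean_seconds)], slowest first.

    ops:    list of (name, callable) pairs with unique names. These are
            hypothetical stand-ins for a runtime's per-operator functions.
    sync:   optional callable that blocks until the device is idle
            (an assumption; needed for asynchronous devices such as GPUs).
    repeat: number of passes over the whole operator list.
    """
    totals = {name: 0.0 for name, _ in ops}
    for _ in range(repeat):
        for name, fn in ops:
            start = time.perf_counter()
            fn()
            if sync is not None:
                sync()  # wait for the device so the kernel time is included
            totals[name] += time.perf_counter() - start
    means = [(name, total / repeat) for name, total in totals.items()]
    return sorted(means, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Dummy operators: one that takes about 2 ms and one that is nearly free.
    demo_ops = [("conv", lambda: time.sleep(0.002)), ("relu", lambda: None)]
    for name, seconds in profile_ops(demo_ops, repeat=5):
        print("%-6s %.3f ms" % (name, seconds * 1e3))
```

Sorting by mean time shows which operators dominate the run, which is where a change in opt_level would have to make a difference.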
Does TVM provide a Python API for profiling each operation in the graph runtime?
Which device are you using?
A personal PC with a GTX 1060. The code is something like this:
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        sym, target, shape_dict, params=params, dtype=dtype)
where target = 'cuda'
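When comparing builds at different opt_level values, it helps to measure all of them the same way. Below is a minimal timing sketch in plain Python. The `benchmark` helper and its parameters are hypothetical, not part of TVM. It discards a few warm-up runs, synchronizes after every call, and reports the mean and standard deviation so that small differences can be told apart from noise.

```python
import statistics
import time


def benchmark(run, sync=None, warmup=5, repeat=30):
    """Return (mean_ms, stdev_ms) for calling `run`.

    run:    zero-argument callable that executes one inference.
    sync:   optional callable that blocks until the device is idle
            (an assumption; required when `run` returns before the GPU finishes).
    warmup: untimed-in-the-result calls that absorb one-time setup costs.
    repeat: number of timed calls; at least 2 so a deviation can be computed.
    """
    if repeat < 2:
        raise ValueError("repeat must be at least 2")

    def timed_call():
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # include the device work in the measurement
        return (time.perf_counter() - start) * 1e3  # milliseconds

    for _ in range(warmup):
        timed_call()  # result discarded
    samples = [timed_call() for _ in range(repeat)]
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Dummy workload standing in for one inference.
    mean_ms, stdev_ms = benchmark(lambda: time.sleep(0.001), warmup=1, repeat=5)
    print("%.3f ms +/- %.3f ms" % (mean_ms, stdev_ms))
```

With the graph runtime, one would presumably pass the module's run method as `run` and the context's sync method as `sync`, assuming a module created with graph_runtime.create and a context such as tvm.gpu(0), then repeat the measurement once per opt_level build.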
Ref. https://github.com/dmlc/tvm/pull/1378/
A TVM debugger that will provide some profiling information is on the way.