To let C++ users integrate the TVM runtime into their programs and run end-to-end inference, we need a C++ GraphModule similar to those in Python (graph_runtime.py) and Java (GraphRuntime.java, GraphModule.java).
Since we already have runtime/graph/graph_runtime.cc, which is exposed to the frontends through the PackedFunc API, shall we simply reuse the GraphRuntime class but move its definition to graph_runtime.h?
The only problem is that the current GraphRuntime::SetInput, GetInput, and GetOutput take DLTensor as arguments; they may need to support NDArray for a friendlier user experience.
DLTensor is used for compatibility with RPC. We can actually expose a version of GraphRuntime in tvm/runtime/ with proper implementation hiding. In addition, we can add an overloaded interface that takes NDArray.
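To make the overload idea concrete, here is a minimal sketch that does not use any TVM headers. Everything in it is hypothetical: `TensorView` stands in for DLTensor, `OwnedArray` stands in for NDArray, and `GraphRuntimeSketch` is a toy class whose `Run()` just doubles its single input. It only illustrates how an NDArray-style overload could forward to the existing view-based method, so the two paths share one implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for DLTensor: a non-owning view over float data.
struct TensorView {
  float* data;
  std::size_t size;
};

// Hypothetical stand-in for NDArray: owns its buffer and can expose a view.
class OwnedArray {
 public:
  explicit OwnedArray(std::size_t n) : buf_(n, 0.0f) {}
  explicit OwnedArray(std::vector<float> values) : buf_(std::move(values)) {}
  TensorView View() { return TensorView{buf_.data(), buf_.size()}; }
  const std::vector<float>& values() const { return buf_; }

 private:
  std::vector<float> buf_;
};

// Toy runtime (not TVM's GraphRuntime): one input, one output, Run() doubles.
class GraphRuntimeSketch {
 public:
  explicit GraphRuntimeSketch(std::size_t n) : input_(n, 0.0f), output_(n, 0.0f) {}

  // View-based entry point, in the spirit of the existing DLTensor signature.
  void SetInput(int index, const TensorView* data_in) {
    assert(index == 0 && "this sketch has a single input");
    (void)index;
    if (data_in->size != input_.size()) throw std::invalid_argument("input size mismatch");
    std::copy(data_in->data, data_in->data + data_in->size, input_.begin());
  }

  // Convenience overload: builds a view and forwards to the method above.
  void SetInput(int index, OwnedArray& data_in) {
    TensorView view = data_in.View();
    SetInput(index, &view);
  }

  void Run() {
    std::transform(input_.begin(), input_.end(), output_.begin(),
                   [](float x) { return 2.0f * x; });
  }

  // View-based output copy into caller-provided storage.
  void GetOutput(int index, TensorView* data_out) const {
    assert(index == 0 && "this sketch has a single output");
    (void)index;
    if (data_out->size != output_.size()) throw std::invalid_argument("output size mismatch");
    std::copy(output_.begin(), output_.end(), data_out->data);
  }

  // Convenience overload mirroring SetInput.
  void GetOutput(int index, OwnedArray& data_out) const {
    TensorView view = data_out.View();
    GetOutput(index, &view);
  }

 private:
  std::vector<float> input_;
  std::vector<float> output_;
};

// Helper that exercises only the owning-array overloads.
std::vector<float> DoubleViaOwnedArray(std::vector<float> values) {
  const std::size_t n = values.size();
  OwnedArray in(std::move(values));
  OwnedArray out(n);
  GraphRuntimeSketch rt(n);
  rt.SetInput(0, in);
  rt.Run();
  rt.GetOutput(0, out);
  return out.values();
}
```

With this shape, the view-based methods stay available for RPC-style callers, while C++ users can pass the owning array type directly. Whether the real interface should take NDArray by value, by reference, or return it from GetOutput is a separate design choice that this sketch does not settle.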