[DISCUSS] Module based Model Runtime Interface

I agree that we should create a new file codegen_blob.cc

Sorry for chiming in a bit late.

I personally prefer the API 0 or 2 in the proposal. The module wrapper design seems a bit too complicated.

I have one question though. Will we pack multiple models into one exported library? Even if we pack multiple models, will they use different runtime modules? If that is unlikely, should we further simplify the runtime creation API to

gmod = lib["runtime_create"](tvm.cpu(0))

I will try to answer it from my view. cc @tqchen

Yes, we will pack multiple models into one shared library.

Within the shared library, we don't restrict the graph runtime of each model, so different models could have different runtimes.

Your proposed API is the same as my implementation without multiple-model support, but once we support multiple models, we will need a model key.

Would be great if we can follow up on this thread to converge on a library proposal :slight_smile:

cc @FrozenGene @haichen @jwfromm @jonso

D1: I agree with @haichen that APIs 0 and 2 are the cleanest.

D2: I'm in favor of keeping the set/run/get interface, as it makes handling multiple inputs and outputs very straightforward. Set/run/get is easy enough to use and understand that I don't think adding a predict method is worth it, especially if it complicates I/O due to tuples.

D3: Ideally, for a heterogeneous runtime we could use the same API but provide a list of contexts, i.e. gmod = lib["resnet18"]([tvm.cpu(0), tvm.gpu(0)]).

D4: I definitely think this is an improvement to the packaging, thanks for keeping this discussion alive @tqchen!

OK, to summarize, here is a strawman:

  • Use API 2, possibly allow list of context in the future
    • gmod = lib["resnet18"](tvm.cpu(0))
    • gmod = lib["resnet18"](tvm.cpu(0), tvm.gpu(0))
  • Keep the set/run/get interface to minimize the runtime dependency on the data structure
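To make the strawman concrete, here is a minimal plain-Python mock of the proposed flow. Everything here is illustrative: the class names and the echo "inference" are stand-ins, and the real lib["resnet18"](ctx) would return a runtime.Module whose set_input/run/get_output are packed functions, not Python methods.

```python
class MockGraphModule:
    """Stands in for the graph runtime created by lib["resnet18"](ctx)."""

    def __init__(self, contexts):
        self.contexts = contexts  # one or more devices, per the strawman
        self.inputs = {}
        self.outputs = {}

    def set_input(self, name, value):
        self.inputs[name] = value

    def run(self):
        # placeholder "inference": just echo the input tensor
        self.outputs[0] = self.inputs.get("data")

    def get_output(self, index):
        return self.outputs[index]


class MockLib:
    """Stands in for the exported library; indexing by model key returns a factory."""

    def __getitem__(self, model_key):
        # API 2: lib["resnet18"](ctx0, ctx1, ...) creates the runtime
        def factory(*contexts):
            return MockGraphModule(list(contexts))
        return factory


lib = MockLib()
gmod = lib["resnet18"]("cpu(0)")  # single context; more could be passed later
gmod.set_input("data", [1, 2, 3])
gmod.run()
print(gmod.get_output(0))  # -> [1, 2, 3]
```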

If everyone is OK with the proposal, I propose we proceed with the implementation.

Apart from the low-level API convention, it would be good to discuss whether we would like to provide a thin wrapper on top, as follows.

Choice of Wrapper API

W0: no wrapping at all, directly use the raw API

W1: directly wrap the module

gmod = tvm.graph_runtime.create(lib["resnet18"](tvm.cpu(0)))

W2: pass the context argument separately

gmod = tvm.graph_runtime.create(lib["resnet18"], tvm.cpu(0))

Sorry for the late response. I want to discuss the API choice.

For API 2, how do we keep compatibility with the current mechanism? If we choose API 0, we could do something like this:

gmod = lib["runtime_create"]([tvm.cpu(0), tvm.gpu(0)], model_key="default"). That is to say, we could leave one default key in the runtime_create function. I think it is better than API 2 IMO (rather than needing to do lib["default"]([tvm.cpu(0), tvm.gpu(0)])). As for the context, I would also rather pass a tuple / list to the function than unpacked args, like in the example here.
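The default-key variant above can be sketched in plain Python. Only the runtime_create name and model_key="default" come from the proposal; the MockFactoryLib class, the stored model names, and the dict return value are purely illustrative, and the real implementation would return a runtime.Module.

```python
class MockFactoryLib:
    """Stand-in for an exported library that packed two models."""

    def __init__(self):
        self._models = {"default": "resnet18-graph", "mobilenet": "mobilenet-graph"}

    def __getitem__(self, name):
        if name != "runtime_create":
            raise KeyError(name)

        def runtime_create(contexts, model_key="default"):
            # contexts is passed as a list/tuple, as proposed above
            graph = self._models[model_key]
            return {"graph": graph, "contexts": list(contexts)}

        return runtime_create


lib = MockFactoryLib()
gmod = lib["runtime_create"](["cpu(0)", "gpu(0)"])               # default model
gmod2 = lib["runtime_create"](["cpu(0)"], model_key="mobilenet")  # explicit key
print(gmod["graph"], gmod2["graph"])  # -> resnet18-graph mobilenet-graph
```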

For the wrapper, I propose we do something like the following rather than continue to use tvm.graph_runtime.create.

  • We create one wrapper of GraphRuntimeFactory in the Python interface. It could look like this:
class GraphRuntimeFactoryModule(Module):
    """Graph runtime factory module.

    This is a module of graph runtime factory

    Parameters
    ----------
    module : Module
        The interal tvm module that holds the actual graph functions.

    Attributes
    ----------
    module : Module
        The interal tvm module that holds the actual graph functions.
    """

    def __init__(self, module):
        self.module = module
        self._runtime_create = module["runtime_create"]
        super(GraphRuntimeFactoryModule, self).__init__(self.module.handle)

    def __del__(self):
        pass

    def runtime_create(self, ctx):
        """Create the runtime using ctx

        Parameters
        ----------
        ctx : TVMContext or list of TVMContext
        """
        ctx, num_rpc_ctx, device_type_id = get_device_ctx(self.module, ctx)
        return self._runtime_create(*device_type_id)

    def __getitem__(self, key):
        """Get internal module function

        Parameters
        ----------
        key : str
            The key to the module.
        """
        return self.module[key] 

The usage is:

# use_new_graph_mechanism is one flag to indicate we are using the new mechanism
_, compiled_graph_lib, _ = relay.build_module.build(mod, "llvm", params=params, use_new_graph_mechanism=True)
assert isinstance(compiled_graph_lib, GraphRuntimeFactoryModule)
runtime_module = compiled_graph_lib.runtime_create(ctx)
assert isinstance(runtime_module, Module)
gmod = tvm.graph_runtime.GraphModule(runtime_module)

This way we keep the old way of using tvm.graph_runtime. That is to say, we don't break any API usage of the old tvm.graph_runtime.create. If users would like to export json / params / lib, they can still do so too.

Of course, we could do a little more inside graph_lib.runtime_create(ctx) and return a graph_runtime_module / vm_module / other runtime module directly, without tvm.graph_runtime.GraphModule.

It might be a bit too much to also create a wrapper for the factory function.

In particular, while it is OK to return the right factory module when calling relay.build, the module loaded by tvm.runtime.load("xyz.so") will still be the raw runtime.Module unless we do some automatic type lookup.

So we want to be able to support both the following ways of getting the library using a consistent API.

  • C0 lib = tvm.runtime.load("xyz.so")
  • C1 lib = relay.build(...)

While we can naturally return a GraphFactoryModule wrapper in C1, it can be complicated to do so in C0.

If the particular concern is about API backward compatibility, we can find alternative ways based on API2, by supporting the following API:

gmod = tvm.graph_runtime.create(lib, tvm.cpu(0))

Inside the wrapper creation, the wrapper queries whether the lib has a "default_model" function, and calls lib["default_model"] to get the corresponding model. If set/get/run functions exist, then we can directly do the wrapping.
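As a sketch of that lookup, here is an illustrative plain-Python dispatch. Only the "default_model" key comes from the post above; the create signature, the legacy graph_json fallback, and the returned values are assumptions for illustration, not the actual TVM API.

```python
def create(lib, ctx, graph_json=None):
    """Backward-compatible create(): prefer the new packed-library path,
    fall back to the legacy (json, lib, ctx) path otherwise."""
    try:
        factory = lib["default_model"]  # new-style library exposes a factory
    except (KeyError, TypeError):
        factory = None

    if factory is not None:
        # new path: the module already carries its graph, so no json is needed
        return factory(ctx)

    # legacy path: caller must supply the serialized graph json
    if graph_json is None:
        raise ValueError("legacy library requires graph_json")
    return ("legacy-runtime", graph_json, ctx)


# a dict stands in for the loaded library in this sketch
new_lib = {"default_model": lambda ctx: ("new-runtime", ctx)}
print(create(new_lib, "cpu(0)"))  # -> ('new-runtime', 'cpu(0)')
```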

While we can naturally return a GraphFactoryModule wrapper in C1, it can be complicated to do so in C0.

I agree with this point if we don't do automatic type lookup (i.e. graph_runtime_factory.GraphRuntimeFactoryModule(tvm.runtime.load(path_lib))).

gmod = tvm.graph_runtime.create(lib, tvm.cpu(0))

Maybe we should consider another API name, as Python doesn't support function overloading. We currently have def create(graph_json_str, libmod, ctx) if we decide to keep it.

We can discuss a potential API change if the concern is backward compatibility; for example, we can directly use the constructor in the new case. If the compatibility requirement is by positional argument, it is also likely that we can reuse the same function.

We can discuss a potential API change if the concern is backward compat

Yes. I also think it is worth discussing in this thread. Like the relay.build case I mentioned before, we should also discuss what flag we should set to indicate whether we are using the new or old behavior. After this, we could list a more complete summary IMO :slight_smile:

Given that the variation comes from the return values, it is a bit hard to maintain a backward-compatible API in this case.

While we should in general aim to keep a backward-compatible API, I feel in this case we should probably move on to the new API, particularly in light of all the other refactors in this cycle (w.r.t. the runtime).

B0: use a legacy flag

We can certainly have a legacy_mode flag in relay build that gives the old behavior.

B1: Support unpack to tuple

We still return GraphFactoryModule in relay.build, but overload __iter__ to allow the following code. Add a warning when __iter__ is called, then remove it in the next release cycle.

# NOTE: can be combined into one line
factorymod = relay.build(...)
json_str, lib, params = factorymod
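B1 could look roughly like the following pure-Python sketch. The attribute names and warning text are assumptions for illustration; the real relay.build would return the factory module with its actual internals.

```python
import warnings


class GraphFactoryModule:
    """Sketch of B1: support legacy 3-tuple unpacking with a deprecation warning."""

    def __init__(self, graph_json, lib, params):
        self.graph_json = graph_json
        self.lib = lib
        self.params = params

    def __iter__(self):
        # warn now; this unpacking support would be removed next release cycle
        warnings.warn(
            "unpacking the result of relay.build is deprecated; "
            "use the factory module directly",
            DeprecationWarning,
        )
        return iter((self.graph_json, self.lib, self.params))


factorymod = GraphFactoryModule("{...json...}", "lib.so", {"w": 1})
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the warning for this demo
    json_str, lib, params = factorymod  # legacy code keeps working
print(json_str, lib)  # -> {...json...} lib.so
```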

What API are you talking about? tvm.graph_runtime.create or relay.build?

Agree.

Agree.

I am referring to both APIs. We only need to support either B0 or B1, and it would be good to hear everyone's opinions.

How about changing the tvm.graph_runtime.create API? Like relay.build, do we give it a legacy_mode flag too?

I think we could take B1, which means we do not have to support the legacy-mode flag in relay.build. In terms of the graph runtime creation, we could change the new usage to a constructor and encourage users to switch over to the new way.

@FrozenGene given that we have mostly sorted out the APIs, can you try to take over and land this feature?

Sure, I will take it.

Awesome, it would be great if we start with an RFC issue that summarizes our API decisions and gives one example, given that this thread is a bit long for people to read.


Awesome, it would be great if we start with an RFC issue

Do you mean we open it on GitHub or on the Discuss forum?