[BYOC] After using BYOC, the saved shared library size doubles

I’m trying to compile an ONNX model with TVM for a CUDA GPU. When I enable the cublas BYOC flow, the size of the saved file doubles. Here is the script I use; for simplicity, I did not enable the auto/meta scheduler and simply used the default schedules.

import argparse
import os.path as osp

import onnx
import tvm
from tvm import relay


def run(prefix):
    # Import the ONNX model into Relay.
    onnx_model = onnx.load(osp.join(prefix, "model.onnx"))
    mod, params = relay.frontend.from_onnx(onnx_model)

    if args.cublas:
        # Partition the graph so that patterns supported by cublas are offloaded
        # to the "cublas" BYOC target. With bind_constants=False the constants are
        # not embedded into the external functions.
        from tvm.relay.op.contrib.cublas import pattern_table

        seq = tvm.transform.Sequential(
            [
                relay.transform.InferType(),
                relay.transform.MergeComposite(pattern_table()),
                relay.transform.AnnotateTarget("cublas"),
                relay.transform.PartitionGraph(bind_constants=False),
                relay.transform.InferType(),
            ]
        )
        mod = seq(mod)

    # Build with the default schedules (no auto/meta scheduler) and export the library.
    with tvm.transform.PassContext(opt_level=3):
        factory = relay.build(mod, tvm.target.cuda(arch="sm_70"), params=params)
    factory.export_library(osp.join(prefix, "model.so"))


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("prefix", type=str)
    parser.add_argument("--cublas", action="store_true")
    args = parser.parse_args()
    run(args.prefix)

After running, I get a shared lib that is twice the size of the original ONNX file.

I also tried to save the lib, params, and graph_json separately, but the total size does not change. If I don’t use the cublas BYOC, the saved .so file size matches the ONNX file size. What is the possible reason for this?

The problem is that the constants are saved both in the GraphExecutorFactory and in the cublas module. This problem was apparently solved a while ago, as you can see here:

But for this to work, the external module (in this case the cublas module) needs to implement a case in GetFunction where name == "get_const_vars". An example of this for the json_runtime module can be found here:

The last thing needed is a list of the names of all the ConstantNodes in the function handled by the cublas module; that list is what GetFunction("get_const_vars") should return. It is important that these names are created in the same way as they are for the GraphExecutorFactory. The naming scheme is symbol + "const" + const_id, where const_id is just an increment for every ConstantNode visited. An example of this can be found here:
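
To make that concrete, below is a hedged sketch of what such a module could look like, modeled on the JSONRuntimeBase implementation linked above. The class and member names are invented for illustration, and the exact GetFunction signature can differ between TVM versions, so treat this as the shape of the "get_const_vars" contract rather than a drop-in implementation.

// Illustrative sketch only; ExampleCublasModuleNode and its members are made up.
#include <tvm/runtime/container/array.h>
#include <tvm/runtime/container/string.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

#include <string>

namespace tvm {
namespace runtime {

class ExampleCublasModuleNode : public ModuleNode {
 public:
  const char* type_key() const override { return "example_cublas"; }

  PackedFunc GetFunction(const std::string& name, const ObjectPtr<Object>& sptr_to_self) override {
    if (name == "get_symbol") {
      // The symbol of the partitioned function this external module implements.
      return PackedFunc(
          [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->symbol_name_; });
    } else if (name == "get_const_vars") {
      // The names of every constant this module expects to receive at load time,
      // built with the same naming scheme the GraphExecutorFactory uses.
      return PackedFunc(
          [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->const_names_; });
    }
    // SaveToBinary/LoadFromBinary and the actual cublas execution path are elided.
    return PackedFunc(nullptr);
  }

 private:
  std::string symbol_name_;     // symbol of the external function
  Array<String> const_names_;   // one entry per ConstantNode visited during codegen
};

}  // namespace runtime
}  // namespace tvm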

Thank you for the answer. So you mean the constants are saved both in the GraphExecutorFactory and in the cublas module. But if I have set bind_constants=False, all params are passed into the BYOC module via DLPack tensors at runtime, so the cublas module itself should not contain any constants?

Everything that is saved in the library comes from the SaveToBinary function of some module. So yes, the cublas module itself doesn’t contain the constants. The duplication of the constants happens because both the GraphExecutorFactory and the ConstLoaderModule save them.

The result of this loop is that the constants common to the GraphExecutorFactory and the ConstLoaderModule are removed from the GraphExecutorFactory.
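
In case it helps, here is a rough illustration of what that deduplication amounts to. This is not the actual TVM source, just the idea, assuming factory_params holds the factory’s constants and const_loader_names are the names reported by the external modules via "get_const_vars":

// Conceptual sketch only, not the TVM implementation.
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for runtime::NDArray, just for the illustration.
using Tensor = std::vector<float>;

void RemoveCommonConstants(std::unordered_map<std::string, Tensor>* factory_params,
                           const std::vector<std::string>& const_loader_names) {
  // Every constant the ConstLoaderModule already owns is dropped from the
  // GraphExecutorFactory's params, so each weight ends up serialized only once.
  for (const auto& name : const_loader_names) {
    factory_params->erase(name);
  }
}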

I’m no expert on this at all. I just ran into the same problem you did, and this is what I think the fix is. But that doesn’t mean I know all the intricacies behind it :sweat_smile:

Thanks. It seems that the duplication comes from factory.params and the ConstLoader, and in my case forcing the ConstLoader to store nothing fixes the problem, by replacing build_module.cc#L465 with

ret_.mod = tvm::codegen::CreateMetadataModule({}, ret_.mod, ext_mods, host_target,

This makes the output size normal, and the inference results are not affected.
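
For anyone applying the same workaround, this is roughly what the edit amounts to. It assumes the original call at that line passed ret_.params as the first argument, and the remaining arguments are elided here, so verify against your own TVM checkout before patching:

// Before (assumed): the collected params are also handed to the ConstLoaderModule,
// so they get serialized a second time next to the GraphExecutorFactory's copy.
// ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, host_target, ...);

// After: an empty map keeps the ConstLoaderModule empty, and the
// GraphExecutorFactory stays the single owner of the constants.
ret_.mod = tvm::codegen::CreateMetadataModule({}, ret_.mod, ext_mods, host_target, ...);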
