[BYOC] After using BYOC, the saved shared library size doubles

nox-410 · August 26, 2023, 7:28am

I’m trying to compile an ONNX model with TVM on CUDA GPU. When I enable the cublas BYOC, the saved file’s size doubles. Here is the script I use, for simplicity, I did not enable auto/meta scheduler and simply use the default schedule.

import argparse
import os.path as osp

import onnx
import tvm
from tvm import relay

def run(prefix):
    onnx_model = onnx.load(osp.join(prefix, "model.onnx"))
    mod, params = relay.frontend.from_onnx(onnx_model)

    if args.cublas:
        from tvm.relay.op.contrib.cublas import pattern_table
        seq = tvm.transform.Sequential(
            [
                relay.transform.InferType(),
                relay.transform.MergeComposite(pattern_table()),
                relay.transform.AnnotateTarget("cublas"),
                relay.transform.PartitionGraph(bind_constants=False),
                relay.transform.InferType(),
            ]
        )
        mod = seq(mod)

    with tvm.transform.PassContext(3):
        factory = relay.build(mod, tvm.target.cuda(arch="sm_70"), params=params)
    factory.export_library(osp.join(prefix, "model.so"))

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('prefix', type=str)
    parser.add_argument('--cublas', action="store_true")
    args = parser.parse_args()
    run(args.prefix)

After running, I get a shared lib that is twice the size of the original onnx file.

I also tried to save the lib, params and graph_json seperately, but the total size does not change. But if I don’t use the cublas BYOC, the saved .so file size can match the onnx file size correctly. What is the possible reason for this?

shepard · September 14, 2023, 9:20am

The problem is that the constants are both saved in the GraphExecutorFactory and the cublas Module. This problem was solved apparently a while ago as you can see here:

But for this to work the external Module in this case the cublas module needs to implement a case for GetFunction where name == “get_const_vars”. An example of this for the json_runtime Module can be found here:

github.com

apache/tvm/blob/24847c55151825ebf4c655cb2e3c5c09c61b48c8/src/runtime/contrib/json/json_runtime.h#L82


/*!
 * \brief Get a packed function.
 * \param name The name/symbol of the function.
 * \param sptr_to_self The pointer to the module node.
 * \return The packed function.
 */
PackedFunc GetFunction(const String& name, const ObjectPtr<Object>& sptr_to_self) override {
  if (name == "get_symbol") {
    return PackedFunc(
        [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->symbol_name_; });
  } else if (name == "get_const_vars") {
    return PackedFunc(
        [sptr_to_self, this](TVMArgs args, TVMRetValue* rv) { *rv = this->const_names_; });
  } else if (this->symbol_name_ == name) {
    return PackedFunc([sptr_to_self, this](TVMArgs args, TVMRetValue* rv) {
      ICHECK(this->initialized_) << "The module has not been initialized";


      // Bind argument tensors to data entries.
      this->SetInputOutputBuffers(args);
      // Execute the subgraph.
      this->Run();

The last thing needed is a list of names of all the ConstantNodes in the function that is handled by the cublas module, that can be returned by the GetFunction(“get_const_vars”). It is important that these names are created in the same way as they are for the GraphExecutorFactory. The format of the naming scheme is as follows: symbol + “const” + const_id. The const_id is just an increment for every ConstantNode visited. An example of this can be found here:

github.com

apache/tvm/blob/24847c55151825ebf4c655cb2e3c5c09c61b48c8/src/relay/backend/contrib/dnnl/codegen.cc#L230


  // Generate the global variable for needed ndarrays
  if (const_array_name_.empty()) {
    const_array_name_ = CreateNDArrayPool(ext_func_id_);
    std::string checker = CreateInitChecker(ext_func_id_);
    ext_func_body_.insert(ext_func_body_.begin(), checker);
  }


  // Give the ndarray a unique name to ease the initialization of it at
  // runtime.
  std::string const_symbol = "dnnl_" + ext_func_id_;
  std::string const_var_name = CreateConstVar(const_symbol, const_idx_);
  const_vars_.push_back(const_var_name);
  const_idx_++;


  const auto* type_node = cn->checked_type().as<TensorTypeNode>();
  ICHECK(type_node);
  ICHECK_EQ(GetDtypeString(type_node), "float") << "Only float is supported for now.";


  return {output};
}

nox-410 · September 14, 2023, 12:37pm

Thank you for the answer. So you mean the constants are both saved in the GraphExecutorFactory and the cublas Module. But if I have set bind_constants=False, and that all params are passed into the BYOC module via a DLPACK tensor. The cublas module itself should not contain any constants?

shepard · September 14, 2023, 1:06pm

Everything that is saved in the library comes from a SaveToBinary function from a certain module. So yes the cublas module doesn’t itself contain the constants. The duplication of the constants happens because the GraphExecutorFactory and the ConstLoaderModule both save the constants.

The result of this loop is that the common constants between the GraphExecutorFactory and the ConstLoaderModule are removed from the GraphExecutorFactory

github.com

apache/tvm/blob/e2e1d44c7cb22f869275744b69865dc55b439313/src/relay/backend/build_module.cc#L468


    ret_.mod = tvm::codegen::CSourceModuleCreate(";", "", Array<String>{});
  }
} else {
  ret_.mod = tvm::TIRToRuntime(lowered_funcs, host_target);
}


auto ext_mods = executor_codegen_->GetExternalModules();
ret_.mod = tvm::codegen::CreateMetadataModule(ret_.params, ret_.mod, ext_mods, host_target,
                                              runtime_, executor_,
                                              executor_codegen_->GetExecutorCodegenMetadata());
// Remove external params which were stored in metadata module.
for (tvm::runtime::Module mod : ext_mods) {
  auto pf_var = mod.GetFunction("get_const_vars");
  if (pf_var != nullptr) {
    Array<String> variables = pf_var();
    for (size_t i = 0; i < variables.size(); i++) {
      auto it = ret_.params.find(variables[i].operator std::string());
      if (it != ret_.params.end()) {
        VLOG(1) << "constant '" << variables[i] << "' has been captured in external module";
        ret_.params.erase(it);
      }

I’m no expert about this at all. I just ran into the same problem as you did and this is what I think is the fix. But that doesn’t mean I know all the intricacies behind it

nox-410 · September 15, 2023, 4:09am

Thanks, It seem that the duplication come from factory.params and the ConstLoader. And in my case, forcing the ConstLoader to store nothing can fix the problem. By replacing the build_module.cc#L465 with

ret_.mod = tvm::codegen::CreateMetadataModule({}, ret_.mod, ext_mods, host_target,

This makes the output size normal and the inference result is not effected.