We are writing a codegen for our NN accelerator (called kendryte) by following the BYOC (Bring Your Own Codegen) flow.
The whole compilation flow looks like:
- `bind_params_by_name` to embed weights and other params
- `AnnotateTarget` to annotate the ops supported by the `kendryte` target
- `MergeCompilerRegions` & `PartitionGraph` (sketched right after this list)
- tvm calls our registered func `relay.ext.kendryte`
- In this func we compile the annotated subgraph into a kendryte module which contains a binary (called ‘kmodel’); the required weights and other params have already been serialized into the ‘kmodel’
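Concretely, the partitioning part of the flow looks roughly like the sketch below. The pass names are the standard BYOC ones; `partition_for_kendryte` is only an illustrative wrapper, not our real code:

```python
# Rough sketch of the partitioning pipeline described above.
# "kendryte" is our compiler name; partition_for_kendryte is only an
# illustrative wrapper, not our real code.
import tvm
from tvm import relay
from tvm.relay.build_module import bind_params_by_name

def partition_for_kendryte(mod, params):
    # Embed the weights and other params into the Relay module as constants
    mod["main"] = bind_params_by_name(mod["main"], params)
    seq = tvm.transform.Sequential([
        relay.transform.AnnotateTarget("kendryte"),  # mark ops our codegen supports
        relay.transform.MergeCompilerRegions(),      # merge adjacent annotated regions
        relay.transform.PartitionGraph(),            # lift regions into external functions
    ])
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```

After partitioning, `relay.build` looks up `relay.ext.kendryte` in the global registry for each external function and embeds the returned kendryte module (with the kmodel inside) into the exported library.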
But a problem shows up when I build and export the model:
```python
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)
lib.export_library('model.so')
```
The `model.so` seems to serialize the weights twice:

- one copy in our kendryte module (inside the kmodel)
- another copy in the graph module (a quick check is sketched below)
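This is just a sketch of how one could inspect the graph-module side; it assumes the factory module returned by `relay.build` exposes `get_params()`, as in recent TVM releases:

```python
# Sketch: inspect what the graph-module side serializes on its own.
# Assumes `mod` and `params` are the partitioned module and weights from above,
# and that relay.build returns a factory module exposing get_params()
# (true for recent TVM releases).
import tvm
from tvm import relay

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target='llvm', params=params)

# Entries named like "kendryte_0_const_*" would be the duplicated constants
# that the kendryte module has already baked into the kmodel.
for name, arr in lib.get_params().items():
    print(name, arr.shape, arr.dtype)
```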
When I run:
```python
import tvm
from tvm.contrib import graph_runtime

ctx = tvm.cpu(0)  # or whichever device the model should run on
lib = tvm.runtime.load_module('model.so')
m = graph_runtime.GraphModule(lib["default"](ctx))
m.set_input('input', <input data here>)
m.run()
```
tvm complains about:
..\..\src\runtime\graph\graph_runtime.cc:93: Warning: cannot find "kendryte_0_const_52" among input
..\..\src\runtime\graph\graph_runtime.cc:93: Warning: cannot find "kendryte_0_const_48" among input
...
The model runs correctly and produces expected results.
But `model.so` is about twice the size of the original model.
Does anybody know how to remove the unused params from the graph module?
Thank you.