Hi.
I’m currently working on splitting a model into several parts.
I wrote a program that analyzes graph_json_str and splits the graph so that it can be executed partially.
Then I call graph_executor.create() with the saved .so file and graph_json_str to create the graph_executor.
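For reference, each part is loaded roughly like this (a minimal sketch; the file names part0.so / part0.json are just placeholders for the artifacts my splitting program writes out):

```python
import tvm
from tvm.contrib import graph_executor

dev = tvm.cuda(0)

# Compiled operator library saved for one part of the split model.
lib = tvm.runtime.load_module("part0.so")

# graph_json_str produced by my splitting program for this part.
with open("part0.json") as f:
    graph_json_str = f.read()

# Create the executor for this part on the GPU.
module = graph_executor.create(graph_json_str, lib, dev)
```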
It works well, but I noticed something weird about the relationship between model size and GPU memory allocation.
I simply split the model into a 1 : 2 : 3 : 4 ratio.
When I monitor GPU memory usage with nvidia-smi, I find that the allocated GPU memory instead grows in a 4 : 2 : 3 : 4 ratio.
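This is roughly how I measure the per-part growth (a sketch; it just shells out to nvidia-smi, so it assumes nvidia-smi is on the PATH and reads GPU 0):

```python
import subprocess

def parse_used_mib(nvidia_smi_output: str) -> int:
    # nvidia-smi prints one "memory.used" value per GPU; take GPU 0.
    return int(nvidia_smi_output.strip().splitlines()[0])

def used_mib() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits"]
    )
    return parse_used_mib(out.decode())

if __name__ == "__main__":
    before = used_mib()
    # ... create the graph_executor for one part here ...
    after = used_mib()
    print(f"allocated by this part: {after - before} MiB")
```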
I tested several scenarios and found that there is some large initial memory allocation.
When I first create a dummy graph_executor and then load the models, GPU memory grows in a 1 : 2 : 3 : 4 ratio, as I expected.
So my question is: why does this happen? Is there some kind of initial memory allocation when creating a graph_executor?
I’m currently using TVM 0.8.0 and an NVIDIA 2080 Ti GPU.
Thanks in advance.