[microTVM] Graph partitioning and deployment on multiple CPUs

Hello community, I am currently trying to deploy a partitioned graph on multiple CPUs within our specialized hardware:

  • The board contains 152 processing elements, each with an Arm processor.
  • The software stack is built in C.

What I have already done is the following: I partitioned the graph using pipeline_executor.py and exported the submodules into lib.so and param files; from what I can see, the pipeline executor relies on the graph executor. On the other hand, I have created a template test project that was successfully deployed; it implements a dummy network on a single core and relies on the AOT executor and the CRT library. So now I am trying to find a connection between the two sides: microTVM support and pipelining. My current idea is to hard-code the process by doing the following:

  1. Create the network.
  2. Split it into submodules (e.g. 2 submodules) using the graph_split function.
  3. Export each submodule independently using export_model_library_format() (sketched below).
  4. Generate an independent project for each exported library (also sketched below).
  5. Manually add pipelining through interrupts.
  6. Compile and test.
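
To make step 3 a bit more concrete, here is a minimal sketch of the per-submodule build and export I have in mind, assuming `submodules` is the list of Relay IRModules produced by the graph_split step (each paired with its own params dict); the target and the output file names are placeholders, not my real setup:

```python
# Minimal sketch of step 3, assuming `submodules` is the list of Relay
# IRModules produced by graph_split (step 2), each paired with its own
# params dict. Target and file names are placeholders for illustration.
import tvm
from tvm import relay
from tvm.relay.backend import Executor, Runtime
from tvm.micro import export_model_library_format


def export_submodules(submodules, params_list):
    # Same combination as the working single-core template: AOT executor on CRT.
    executor = Executor("aot")
    runtime = Runtime("crt")
    target = tvm.target.target.micro("host")  # swap in the actual Arm PE target

    lowered_mods = []
    for i, (sub_mod, sub_params) in enumerate(zip(submodules, params_list)):
        with tvm.transform.PassContext(
            opt_level=3, config={"tir.disable_vectorize": True}
        ):
            lowered = relay.build(
                sub_mod,
                target=target,
                executor=executor,
                runtime=runtime,
                params=sub_params,
            )
        # One Model Library Format archive per submodule; each archive (or the
        # lowered module itself) then feeds the project generation of step 4.
        export_model_library_format(lowered, f"submodule_{i}.tar")
        lowered_mods.append(lowered)
    return lowered_mods
```

Each lowered submodule would then be fed into project generation, which is what I sketch next.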

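For step 4, my idea is to generate one project per submodule through the Project API, reusing the single-core template project that already deploys; the paths and options below are placeholders for my setup, so this is just a sketch of the intent:

```python
# Minimal sketch of step 4, assuming `lowered` is one of the lowered submodules
# returned by export_submodules() above and that my template project implements
# the microTVM Project API. The directory names are placeholders.
import pathlib
import tvm.micro


def generate_submodule_project(lowered, index):
    template_dir = pathlib.Path("my_template_project")        # placeholder path
    project_dir = pathlib.Path(f"generated_project_{index}")  # one project per PE

    project = tvm.micro.generate_project(
        str(template_dir),  # the single-core template that already deploys
        lowered,            # AOT + CRT build of this submodule
        project_dir,
        {},                 # project options, board-specific in practice
    )
    project.build()
    # project.flash()  # once the transport to the PE is wired up (steps 5-6)
```

The interrupt-based handshake of step 5 would then be added by hand on top of the generated projects.
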
One more thing I want to point out is the presence of these files: microtvm_runtime.cc, microtvm_graph_executor.cc, and threading_backend.cc. I would be thankful for your help or any idea that could be relevant to this issue.