[PIM][BYOC] How to make generated C modules aware of a global member in Custom Codegen

I am adding support for custom hardware in TVM using the codegen method of TVM BYOC.
The hardware currently requires calls to INIT and DEINIT methods. We need to emit these calls in the generated C code through the codegen, rather than embedding this logic in a custom runtime.
Since we can have n modules, one for each subgraph offloaded to the custom hardware, the current implementation inserts these init and deinit calls into the C code of every module, which turns out to be very inefficient.
So I need some help figuring out whether there is a way to declare a bool flag (init_done) that tells the custom codegen to insert the init call in the C code only if init_done is false, and to skip the insertion altogether otherwise.
The C modules look something like this:

module 1{
  CUSTOM_init()
  some calls
  CUSTOM_deinit()
}
..
..
..
module n{
  CUSTOM_init()
  some calls
  CUSTOM_deinit()
}

Whereas I want it to be something like this:

module 1{
  CUSTOM_init()
  some calls
}
..
..
..
module n{
  some calls
  CUSTOM_deinit()
}

Hi, which backend do you use?

Hi @dilan, I am using TVM BYOC with the codegen method in C++.
I'm not sure what exactly you mean by backend. Care to elaborate?

Hi @yogeesh, I meant to ask whether you are designing a custom hardware backend or using an existing backend like LLVM or OpenCL.

Hi @dilan,
We have a custom hardware backend.
Though I would like to know if I can call the init() and deinit() functions only once across all the modules generated by the BYOC codegen that are supposed to run on the custom hardware.
Thanks

We are not there yet. We are also looking to build a custom hardware backend, but have only just started looking into TVM. Could I ask you for some introduction to getting started with the backend? All we have found so far is the Vanilla example for adding a custom device. And another question: does the TVM runtime run entirely on the custom HW (i.e. is the CPU in the Vanilla example the CPU of the custom device)? Thank you!

@dilan could you point me to the vanilla example of adding a custom device?

@yogeesh are you familiar with: https://tvm.apache.org/docs/tutorial/uma.html

and with

I think you can write a wrapped_init and a wrapped_deinit that wrap the original functions with a cached counter. Then you compile these two wrapped functions into a shared library (which contains the global state). Finally, you can let all your BYOC modules link to this shared library, which contains the global member you need.
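
A minimal sketch of what such a shared library could look like, in case it helps. Everything here is an assumption for illustration: the CUSTOM_init and CUSTOM_deinit bodies are stubs standing in for your real hardware calls, and g_initialized, g_active and shutdown_once are hypothetical names. Note one deliberate deviation: a plain reference counter drops back to zero between modules that run one after another, so the hardware would be re-initialized for each module anyway. This sketch therefore initializes on the first call and defers the real deinit to process exit via atexit. It is not thread-safe as written.

```c
#include <assert.h>
#include <stdlib.h>

/* Stubs standing in for the real hardware calls (assumption: the real
 * CUSTOM_init/CUSTOM_deinit take no arguments). The counters only exist
 * so the behaviour can be observed. */
static int hw_init_calls = 0;
static int hw_deinit_calls = 0;
static void CUSTOM_init(void)   { hw_init_calls++; }
static void CUSTOM_deinit(void) { hw_deinit_calls++; }

/* Global state, shared by every BYOC module linked against this library. */
static int g_initialized = 0; /* 1 once the hardware has been brought up   */
static int g_active = 0;      /* modules currently between init and deinit */

/* Tears the hardware down at most once; registered with atexit(). */
static void shutdown_once(void) {
    if (g_initialized) {
        CUSTOM_deinit();
        g_initialized = 0;
    }
}

/* Emitted by the codegen in place of CUSTOM_init(). */
void wrapped_init(void) {
    if (!g_initialized) {
        CUSTOM_init();
        g_initialized = 1;
        atexit(shutdown_once); /* real deinit happens at process exit */
    }
    g_active++;
}

/* Emitted by the codegen in place of CUSTOM_deinit(). */
void wrapped_deinit(void) {
    assert(g_active > 0 && "wrapped_deinit without matching wrapped_init");
    g_active--;
    /* The hardware stays up so that the next module can reuse it. */
}
```

With this, the codegen can keep emitting the same init/deinit pair in every module (just with the wrapped names), and no module needs to know whether it is the first or the last one to run.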