Hi,
I want to deploy a model in a distributed environment and thought about using TVM, as it has a wide range of backends and its RPC infrastructure would be helpful for coordinating the inference.
I am currently working on the algorithm that decides which parts of the network are mapped to which device, but I am unsure how to actually split up my networks.
Since I assume I will be creating completely independent runtime modules, I guess a compiler pass in the current infrastructure will not be able to achieve this.
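To make the question concrete, this is roughly the splitting step I have in mind, written as a plain-Python sketch that does not call any TVM API. The layer names, the device labels, and the `split_by_device` helper are all made up for illustration. I assume the network is a linear chain of layers and that the mapping algorithm produces a per-layer device assignment.

```python
from itertools import groupby


def split_by_device(layers, assignment):
    """Group consecutive layers that share a device into one segment.

    layers:     layer names in execution order (hypothetical names)
    assignment: dict mapping each layer name to a device label
    Returns a list of (device, [layer names]) tuples in execution order.
    """
    segments = []
    # groupby only merges *adjacent* items with the same key, so a device
    # that appears twice with other devices in between yields two segments.
    for device, group in groupby(layers, key=lambda name: assignment[name]):
        segments.append((device, list(group)))
    return segments


# Hypothetical network and device mapping, for illustration only.
layers = ["conv1", "conv2", "pool", "fc1", "fc2"]
assignment = {
    "conv1": "gpu0",
    "conv2": "gpu0",
    "pool": "cpu0",
    "fc1": "cpu0",
    "fc2": "gpu0",
}

segments = split_by_device(layers, assignment)
for device, names in segments:
    print(device, names)
```

Each segment would then have to become its own independently compiled runtime module, with the output of one segment sent to the device hosting the next. That last part is what I do not know how to do within TVM.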
Can anybody suggest a simple approach that would fit into TVM’s current architecture?
Thanks in advance for your help.