TVM for novel heterogeneous computing architecture

Hi,

I am working on a novel accelerator architecture that uses heterogeneous computing and mainly targets deep learning workloads. I am trying to figure out how much work can be done at the compilation stage. The accelerator would comprise a few computing components, each supporting a different set of operations with its own properties (different latency costs, some may introduce numerical error, some may have SIMD support while others do not, etc.). The operation sets of the components differ, although their intersection is non-empty.

So optimising a program for this kind of hardware requires taking into account all these costs, memory movements, and so on. How far can I get with TVM in solving this optimisation problem?
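To make the problem concrete, here is a minimal, hypothetical sketch of the placement problem itself (this is not a TVM API): assigning each operator in a linear chain to a computing component so that compute cost plus data-movement cost is minimised. The device names, cost numbers, and the `assign_devices` helper are all illustrative assumptions; a real compiler would drive this from a profiled cost model and handle general dataflow graphs, not just chains.

```python
# Hypothetical illustration (not a TVM API): pick a device for each
# operator in a linear chain, minimising compute + transfer cost.
INF = float("inf")

def assign_devices(op_costs, transfer_cost):
    """op_costs: list of dicts {device: cost}; a device missing from an
    op's dict means that component cannot run the op. transfer_cost[a][b]
    is the cost of moving the intermediate tensor from device a to b.
    Returns (total_cost, list of chosen devices) via dynamic programming."""
    # best[d] = (cheapest cost with the current op placed on d, placement so far)
    best = {d: (c, [d]) for d, c in op_costs[0].items()}
    for costs in op_costs[1:]:
        nxt = {}
        for d, c in costs.items():
            nxt[d] = min(
                ((prev_cost + transfer_cost[p][d] + c, path + [d])
                 for p, (prev_cost, path) in best.items()),
                key=lambda t: t[0],
            )
        best = nxt
    return min(best.values(), key=lambda t: t[0])

# Two made-up components: "vec" (SIMD, fast but limited op set) and "scalar".
ops = [
    {"vec": 1.0, "scalar": 4.0},   # e.g. matmul: both components support it
    {"scalar": 2.0},               # exotic op: only the scalar unit supports it
    {"vec": 1.0, "scalar": 4.0},   # elementwise op: both support it
]
xfer = {"vec": {"vec": 0.0, "scalar": 0.5},
        "scalar": {"vec": 0.5, "scalar": 0.0}}

cost, placement = assign_devices(ops, xfer)
# The optimum hops to the scalar unit for the unsupported op and back.
```

This toy model captures the trade-off in the question: a fast component is only worth using if the data-movement cost of reaching it does not outweigh its compute advantage, and ops outside a component's operation set force a placement elsewhere.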

Hi @slai-nick,

I am also interested in this kind of heterogeneous architecture. Would you like to get in touch to discuss it?