Can an operator have more than one target?

I’m learning the architecture of TVM, and I noticed that during the scheduling and code generation phase for an operator, a target has to be selected, and there can only be one. I was wondering whether it is possible to implement a single operator across multiple heterogeneous devices. For example, when scheduling and generating code for a matrix multiplication, part of it could be executed on the CPU and part on the GPU (possibly by partitioning the matrix?) as a way to increase concurrency.
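
To make the idea concrete, here is a minimal sketch of the kind of partitioning I have in mind. It is plain Python, not TVM code: the two worker threads only stand in for a CPU and a GPU, and the names `matmul_rows`, `split_matmul`, and `cpu_fraction` are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_rows(a_rows, b):
    """Multiply a block of rows of A by the full matrix B (lists of lists)."""
    b_cols = list(zip(*b))  # columns of B
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols]
            for row in a_rows]


def split_matmul(a, b, cpu_fraction=0.5):
    """Compute A @ B by splitting the rows of A into two independent parts.

    Hypothetical setup: the first part would go to the CPU and the second
    to the GPU. Here both parts simply run in worker threads.
    """
    split = int(len(a) * cpu_fraction)
    with ThreadPoolExecutor(max_workers=2) as pool:
        cpu_part = pool.submit(matmul_rows, a[:split], b)  # "CPU" share
        gpu_part = pool.submit(matmul_rows, a[split:], b)  # "GPU" share
        # Row blocks are independent, so the results are just concatenated.
        return cpu_part.result() + gpu_part.result()


if __name__ == "__main__":
    a = [[1, 2], [3, 4], [5, 6]]
    b = [[7, 8], [9, 10]]
    print(split_matmul(a, b))
```

In a real heterogeneous setup, each part would presumably be a separately compiled kernel for its own target. B would have to be available on both devices, and the partial results would have to be copied back and joined, which is where I expect the extra cost to come from.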

I suspect this would incur relatively large overhead from data transfer and synchronization between the devices, so I’d like to ask whether this approach is feasible, and why TVM has not adopted it.