[pre-RFC] Additional Target Hooks

@Mousius thanks for this RFC and apologies for the long delay. I read this in conjunction with [RFC] Arm® Ethos™-U Integration to try to understand the initial application. I think that should be a sufficient example, but let me know if there are other use cases I should consider.

I discussed this at length with @jroesch and mbs-octoml a couple of days ago, and I'm documenting our discussion here.

Overall:

  • We agree there should be a way to leverage external codegen without recreating the entire compilation pipeline.

  • We want to ensure that this work is compatible with the ongoing TECompiler refactor. Specifically, that refactor is now moving towards unifying Relay → TIR lowering (and later unifying stages further down the pipeline) across the Graph, AOT, and VM executors.

  • To that end, the case for a relay_to_tir hook and a tir_to_runtime hook seems clear. We’d like to clarify the interface of the relay_to_tir hook, and propose:

    relay_to_tir(const IRModule& ir_module, const relay::Function& function) -> (IRModule, GlobalVar)
    

    The contract is that TVM calls this interface with a read-only view of the IRModule containing function, plus the function to lower. The hook implementation should return an IRModule containing one or more functions that implement the lowered Relay function, plus a GlobalVar naming the “top-level” function of that operator (in case multiple TIR functions are created to implement the operator).

    At present, TVM keeps the returned IRModule separate from the remaining lowered code. In the future, as part of the TECompiler refactor, TVM will merge the returned IRModule in with all other TIR functions, handling name conflicts.

  • For the tir_to_runtime hook, we presume this will follow the existing relay.ext. interface, except that it will be keyed on the target rather than on a compiler attribute marked on the Relay Function.

  • In terms of user interface: theoretically, it should be possible to hand TVM an unannotated Relay function plus a Target that specifies the available CPUs/accelerators, and TVM should leverage its knowledge of schedules to assign functions to devices. Currently, we either specify a mostly homogeneous target or manually mark functions to be run externally. In the future, we’re considering an interface where TVM assigns each function call to a target by default, and you can override this by marking a call manually with a per-call-site or per-function attribute. In the override case, the target contained in that attribute is not a composite Target, but a shorthand descriptor for one of the pieces of the overall Target. For example, Target could be specified as low_power_cpu: c -mcpu=cortex-m0; inference_cpu: c -mcpu=cortex-m7f, and call sites could be assigned to either low_power_cpu or inference_cpu. Does this sketch of a direction align with how you’d like to enable these target hooks?
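
To make the proposed relay_to_tir contract a bit more concrete, here is a minimal Python sketch of how I picture it. Everything in it is a stand-in for illustration: GlobalVar, Function, example_hook, lower_with_hook and merge_modules are hypothetical names, not TVM classes or APIs, and a "module" is modelled as a plain mapping. It shows the three parts of the contract: the hook gets a read-only view, it returns a module plus its top-level symbol, and a later merge step resolves name conflicts. The suffix-based renaming is only an assumption about how conflicts might be handled.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Tuple


# Stand-in types for illustration only; these are NOT TVM classes.
@dataclass(frozen=True)
class GlobalVar:
    name_hint: str


@dataclass(frozen=True)
class Function:
    body: str  # placeholder for a Relay or TIR function body


# A "module" in this sketch is just a mapping from GlobalVar to Function.
Module = Mapping[GlobalVar, Function]
RelayToTIRHook = Callable[[Module, Function], Tuple[Module, GlobalVar]]


def example_hook(ir_module: Module, function: Function) -> Tuple[Module, GlobalVar]:
    """Hypothetical hook that lowers one Relay function into two TIR functions."""
    entry = GlobalVar("accel_conv2d")
    helper = GlobalVar("accel_conv2d_helper")
    lowered = {
        entry: Function(f"tir_entry({function.body})"),
        helper: Function("tir_helper()"),
    }
    return lowered, entry


def lower_with_hook(
    ir_module: Module, function: Function, hook: RelayToTIRHook
) -> Tuple[Dict[GlobalVar, Function], GlobalVar]:
    """Call the hook with a read-only view of the module and check its result."""
    read_only_view = MappingProxyType(dict(ir_module))  # writes raise TypeError
    lowered, entry = hook(read_only_view, function)
    if entry not in lowered:
        raise ValueError("hook must return its top-level symbol inside the module")
    return dict(lowered), entry


def merge_modules(
    main: Module, lowered: Module, entry: GlobalVar
) -> Tuple[Dict[GlobalVar, Function], GlobalVar]:
    """Merge lowered functions into main, renaming symbols that collide.

    Assumption: conflicts are resolved by appending a numeric suffix. A real
    merge would also have to rewrite references to renamed symbols, which
    this sketch does not attempt.
    """
    merged = dict(main)
    taken = {gv.name_hint for gv in merged}
    new_entry = entry
    for gv, func in lowered.items():
        name, suffix = gv.name_hint, 0
        while name in taken:
            suffix += 1
            name = f"{gv.name_hint}_{suffix}"
        renamed = GlobalVar(name)
        taken.add(name)
        merged[renamed] = func
        if gv == entry:
            new_entry = renamed
    return merged, new_entry
```

The reason for returning the entry symbol separately is visible in merge_modules: once names can be rewritten during the merge, the caller needs a handle on which of the returned functions is the one to call.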
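
For the named-target idea, a rough Python sketch of what I have in mind follows. The string format is just the example from the bullet above, and parse_composite_target and assign_targets are hypothetical helpers, not existing TVM functions. The "automatic" assignment here is a placeholder that falls back to a default target; it does not model TVM's actual knowledge of schedules.

```python
from typing import Dict, Iterable, Optional


def parse_composite_target(spec: str) -> Dict[str, str]:
    """Split 'name: target; name: target' into a {name: target string} dict."""
    targets: Dict[str, str] = {}
    for piece in spec.split(";"):
        piece = piece.strip()
        if not piece:
            continue  # tolerate a trailing ';'
        name, sep, target_str = piece.partition(":")
        name, target_str = name.strip(), target_str.strip()
        if not sep or not name or not target_str:
            raise ValueError(f"expected 'name: target', got {piece!r}")
        if name in targets:
            raise ValueError(f"duplicate target name {name!r}")
        targets[name] = target_str
    return targets


def assign_targets(
    call_sites: Iterable[str],
    targets: Dict[str, str],
    overrides: Optional[Dict[str, str]] = None,
    default: Optional[str] = None,
) -> Dict[str, str]:
    """Map each call site to a target string.

    Manual overrides name one piece of the composite target by its shorthand.
    Call sites without an override get the default target (placeholder for
    whatever automatic assignment TVM would perform).
    """
    overrides = overrides or {}
    unknown = set(overrides.values()) - set(targets)
    if unknown:
        raise KeyError(f"unknown target shorthand(s): {sorted(unknown)}")
    if default is None:
        default = next(iter(targets))  # assumption: first listed target
    return {site: targets[overrides.get(site, default)] for site in call_sites}
```

For example, parsing low_power_cpu: c -mcpu=cortex-m0; inference_cpu: c -mcpu=cortex-m7f and overriding a single call site with inference_cpu leaves every other call site on the default piece, so the attribute only ever has to carry the short name.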