Motivation
Some architectures can use the LLVM backend with tensorize for code generation, but require extra operations before or after the generated loop program to ensure correct results. One example is the Gemmini matrix multiplication accelerator, whose execution flow can be embedded into normal LLVM code generation for RISC-V, but which requires explicit fence instructions to block execution until data has been fully flushed back to DRAM.
It would be helpful to be able to insert void(void) function calls before and after the generated nested loop program. The call pair should be able to surround any level of the nested loop program for fine-grained control.
Proposed change
A pair of pragmas, prologue and epilogue, is proposed to support this pattern. The use case would look like the following:
s[C].pragma(yo, "prologue", "test_prologue")
s[C].pragma(yo, "epilogue", "test_epilogue")
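with the corresponding C code:
#include <cstdio>

// Called once before the loop annotated with the "prologue" pragma.
extern "C" int test_prologue() {
    printf("%s invoked\n", __func__);
    return 0;
}

// Called once after the loop annotated with the "epilogue" pragma.
extern "C" int test_epilogue() {
    printf("%s invoked\n", __func__);
    return 0;
}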
A more complete example can be found in this gist.
A quick implementation is available at https://github.com/apache/incubator-tvm/pull/5050. It directly emits call nodes during LLVM codegen when the pragma is detected.
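To make the intended effect concrete, here is a toy model in plain Python. It is not TVM's codegen and does not reflect the code in the pull request; the Loop class, the emit function, and the compute() placeholder are hypothetical names introduced only for illustration. It assumes the pragma attached to a loop causes one call immediately before that loop and one immediately after it, at whatever nesting level the loop sits.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Loop:
    """Hypothetical stand-in for one level of a lowered loop nest."""
    var: str
    extent: int
    pragmas: Dict[str, str] = field(default_factory=dict)
    body: Optional["Loop"] = None  # None marks the innermost loop


def emit(loop: Optional[Loop], indent: int = 0) -> List[str]:
    """Emit C-like pseudocode, wrapping annotated loops in calls."""
    pad = "  " * indent
    if loop is None:
        # Placeholder for the tensorized loop body.
        return [pad + "compute();"]
    lines = []
    if "prologue" in loop.pragmas:
        lines.append(f"{pad}{loop.pragmas['prologue']}();")
    lines.append(
        f"{pad}for (int {loop.var} = 0; {loop.var} < {loop.extent}; ++{loop.var}) {{"
    )
    lines.extend(emit(loop.body, indent + 1))
    lines.append(pad + "}")
    if "epilogue" in loop.pragmas:
        lines.append(f"{pad}{loop.pragmas['epilogue']}();")
    return lines


# Annotate the middle loop, mirroring s[C].pragma(yo, ...) above.
nest = Loop("xo", 4, body=Loop(
    "yo", 8,
    pragmas={"prologue": "test_prologue", "epilogue": "test_epilogue"},
    body=Loop("yi", 16),
))
generated = emit(nest)
print("\n".join(generated))
```

Under this assumption, annotating yo places the calls inside the xo loop, so the pair runs once per xo iteration. Moving the pragma to xo would instead wrap the whole nest, which is the fine-grained control the proposal asks for.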
Discussion
- Is the form of pragma suitable for expressing this kind of operation?
- Are the names prologue and epilogue expressive?
- The call nodes are directly emitted in the LLVM codegen backend in the pull request. Is this the correct / preferred way to do this?