Hi, I'm working on a custom codegen for a custom accelerator in TVM. The accelerator uses integer arithmetic. It has a high-bit-width accumulator and can shift the value down as needed when writing back, to preserve maximum precision.
To use this feature I modified the quantization flow so that an annotation operator is inserted to indicate how many bits of the integer are fractional bits. The problem is that this operator is always inserted in front of other operators, but not all of those operators are compiled by the custom codegen; some have to be lowered by the Graph Executor Codegen instead. The Graph Executor Codegen then complains that the annotation operator is not lowered, even though other annotation operators (e.g. cast_hint, stop_fusion) are processed fine. Is there something these operators implement regarding lowering that I forgot in my custom operator?
EDIT: I found out that it is the TNonComputational attribute. However, the annotation operator must not be removed by FoldConstant, yet it still has to be processed by FuseOps. Is there an elegant way to combine both?