I have a custom accelerator which can compute Conv2D and Bias_Add in a single operation. How can I fuse these two operators into a single operator for better quantization and smoother code generation? As I see it, I would have to completely rewrite Conv2D and Bias_Add in a new Relay operator. Is there a simpler way?
Not something I’ve dealt with directly, but I believe this is what the BYOC (bring your own codegen) system is for.
It lets you mark a subgraph of operations to be combined into a single call to your external codegen.
Also note that after the FuseOps pass, Conv2D and Bias_Add end up in the same fused function.
I looked into using BYOC for operator fusion, but I don’t see how it works with quantization. The pattern matcher always wraps the matched operators in a composite function, which the quantization passes cannot recognize as something quantizable. Is there a way to make these two mechanisms work together?