[BYOC] What is the main difference between te.extern and BYOC?

Hi, I am currently learning how to use BYOC and have experimented with cases such as integrating the Arm Compute Library through BYOC. Before this, I also tried some "call external function" cases with NNPACK via te.extern.

My question is, I know BYOC is able to partition the graph and offload sub-graph level components completely to third-party runtime libraries, but for single operator cases, what is the difference between using BYOC and te.extern?

Although we lose the ability to merge several operators into a composite sub-graph, it seems to me that te.extern can also offload a single operator by calling an external function. Since it is registered as an implementation, it can also be used with AutoTVM.

Thanks in advance!

@comaniac @tqchen @junrushao @lhutton1

You are right: for single-op offloading, te.extern and BYOC are practically not that different. Of course, the compilation mechanisms are drastically different. te.extern based offloading requires more invasive changes (topi, op strategy etc).
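To make the single-op case concrete, here is a toy sketch in plain Python of what graph partitioning does on a linear chain of ops. It is not TVM code: `partition` and `SUPPORTED` are hypothetical names made up for illustration, and real BYOC works on Relay graphs rather than lists of strings. Consecutive supported ops are merged into one offloaded region; when only one op is supported, the region shrinks to a single op, which is the case where the two approaches end up doing practically the same job.

```python
from itertools import groupby

# Hypothetical set of ops the external library can run (illustration only).
SUPPORTED = {"conv2d", "bias_add", "relu"}


def partition(ops, supported=SUPPORTED):
    """Group consecutive ops into ("external" | "tvm", [ops]) regions."""
    regions = []
    for is_external, group in groupby(ops, key=lambda op: op in supported):
        regions.append(("external" if is_external else "tvm", list(group)))
    return regions


graph = ["conv2d", "bias_add", "relu", "softmax"]

# Sub-graph offloading: three consecutive ops go to the library as one region.
print(partition(graph))

# Single-op offloading: the external region contains just conv2d.
print(partition(graph, supported={"conv2d"}))
```

In the second call the only thing handed to the library is one operator, so the sub-graph merging that BYOC offers brings no extra benefit there.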

Thanks for the help~ This clarifies the whole picture a lot!

Btw, can you elaborate on what you mean by the invasive changes that te.extern brings? From my understanding, we can't change the inherent schedule of an external function called through te.extern either, so basically we just have to add it as a plain candidate implementation for a given operator in the op strategy, right?

Yes, this is exactly right.

By invasive, I meant that the te.extern based approach needs to modify things across the stack, like topi and op strategy, while BYOC is more localized. See and compare how cuDNN (te.extern) and CUTLASS (BYOC) support are implemented.
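As a rough picture of the "plain candidate implementation" idea, here is a minimal sketch in plain Python, loosely modeled on how an op strategy collects candidates. It is not TVM's API: `Implementation`, `OpStrategy`, `conv2d_strategy`, the implementation names and the priority values are all hypothetical, and the "highest priority wins" rule is a simplification of how a candidate is picked when nothing has been tuned.

```python
from dataclasses import dataclass, field


@dataclass
class Implementation:
    name: str
    plevel: int           # priority level; higher wins in this toy model
    tunable: bool = True  # an extern call exposes no schedule knobs


@dataclass
class OpStrategy:
    impls: list = field(default_factory=list)

    def add_implementation(self, name, plevel=10, tunable=True):
        self.impls.append(Implementation(name, plevel, tunable))

    def select(self):
        # Simplified selection: pick the candidate with the highest priority.
        return max(self.impls, key=lambda impl: impl.plevel)


def conv2d_strategy(use_external_lib):
    """Hypothetical strategy function for one operator on one target."""
    strategy = OpStrategy()
    strategy.add_implementation("conv2d.native", plevel=10)
    if use_external_lib:
        # The extern-backed version is just one more candidate; its schedule
        # is fixed, so it is marked as not tunable.
        strategy.add_implementation("conv2d.extern_lib", plevel=15, tunable=False)
    return strategy


print(conv2d_strategy(use_external_lib=True).select().name)
print(conv2d_strategy(use_external_lib=False).select().name)
```

The point of the sketch is where the change lives: the extern path means editing the operator's own strategy function (plus a compute definition next to the native ones), whereas a BYOC integration keeps its supported-op list and codegen in its own module.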

I'd also like to point to an additional resource, Target Hooks, which lets you customise the lowering pipeline for your integration: [pre-RFC] Additional Target Hooks. I believe this helps unlock the benefits of both mechanisms.
