New Relay Operator based on TensorIR

Hello everyone!

Does anyone know what the process is to implement a Relay Operator with an implementation based on TensorIR?

Following the tutorial on adding a new Relay operator (“Adding an Operator to Relay” in the TVM documentation), implementations can be added through the strategy interface.

However, our operator is implemented in TensorIR (see the “Blitz Course to TensorIR” in the TVM documentation), which means that we only have a “schedule”, but no “compute”.

Is there a way to attach the TensorIR implementation to “relay.Call” more directly?

“relay.call_lowered()” (https://github.com/apache/tvm/pull/9312/) seems to be an option, but how to link it to the relay.Call is currently unclear. If this is the way to go, could you provide a code snippet?

Thanks in advance for any suggestions!

CC @junrushao1994 @abel-bernabeu @electriclilies @wrongtest @Hzfengsy @aca88 @SebastianBoblestETAS


Relax allows direct function calls from Relax to TIR. Example: https://github.com/tlc-pack/relax/blob/relax/tests/python/relax/test_transform.py#L238-L259.

CC @yuchenj would you like to elaborate?

Hi @MJKlaiber,

What you want is exactly what we are implementing in Relax (Relay Next)! One key design decision we made in Relax is to allow the high-level IR to directly interact with and call into lower-level TensorIR and PackedFunc. To achieve this, we introduced two intrinsics to bridge the gap.

TensorIR functions and many external libraries adopt a destination-passing convention (we need to explicitly allocate the output and pass it in as an argument to the function), so we introduced call_tir, an intrinsic that allows users to call a TIR function or a packed function in an immutable way. The second intrinsic we introduced is call_packed, which indicates a call to a packed function.
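The destination-passing convention can be illustrated in plain Python, independent of TVM. This is a minimal sketch; the function names are purely illustrative, not TVM APIs:

```python
# Destination-passing style: the caller allocates the output buffer and
# passes it in; the callee mutates it in place instead of returning a value.
# This mirrors how TIR PrimFuncs and many BLAS-style libraries are invoked.

def add_dps(a, b, out):
    """Callee writes results into the caller-provided `out` buffer."""
    for i in range(len(out)):
        out[i] = a[i] + b[i]
    # No return value: the "result" is the mutated destination buffer.

def add_value_returning(a, b):
    """Ordinary value-returning style, for contrast."""
    return [x + y for x, y in zip(a, b)]

def call_dps(dps_func, args, out_len):
    """Wraps the first style so it *looks* like the second -- conceptually
    what call_tir does: allocate the output, invoke the destination-passing
    function, and yield the output as the call's value."""
    out = [0] * out_len  # explicit output allocation, hidden from the caller
    dps_func(*args, out)
    return out

a, b = [1, 2, 3], [10, 20, 30]
assert call_dps(add_dps, (a, b), 3) == add_value_returning(a, b) == [11, 22, 33]
```

The wrapper keeps the mutation local, which is why call_tir can expose a destination-passing callee to the high-level IR as an immutable call.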

In the code snippet below, we implemented a TensorIR PrimFunc in TVMScript and directly call it inside the relax function with call_tir. For more context and implementation details, please refer to our design doc.
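The snippet referenced above did not survive in this thread. As a rough reconstruction, here is a sketch of what such a module looks like, following the tlc-pack/relax fork's TVMScript syntax at the time; treat it as pseudocode, since the exact decorators and the call_tir signature may have changed:

```python
# Pseudocode sketch (Relax fork, circa this thread): a TIR PrimFunc written
# in TVMScript, called directly from a Relax function via call_tir.
import tvm
from tvm.script import tir as T, relax as R

@tvm.script.ir_module
class Module:
    @T.prim_func
    def tir_relu(x: T.handle, y: T.handle):
        # Destination-passing: `y` is the caller-allocated output buffer.
        X = T.match_buffer(x, (8,), "float32")
        Y = T.match_buffer(y, (8,), "float32")
        for i in range(8):
            with T.block("relu"):
                vi = T.axis.spatial(8, i)
                Y[vi] = T.max(X[vi], T.float32(0))

    @R.function
    def main(x: R.Tensor((8,), "float32")):
        # call_tir allocates the output, calls tir_relu(x, out), returns out.
        out = R.call_tir(tir_relu, (x,), (8,), dtype="float32")
        return out
```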

We will send out a forum discussion post to introduce Relax soon. It will talk about our high-level goals, highlights, development plans, and so on. Please feel free to check out our presentation at TVMCon. :slight_smile:

I think you don’t actually need a compute in all cases; you should just be able to register a strategy and be good to go. For example, the threefry random number generation ops (registered here: https://github.com/apache/tvm/blob/main/python/tvm/relay/op/random/_kernel.py) only register a strategy, not a compute. There are more implementation details in https://github.com/apache/tvm/blob/main/python/tvm/relay/op/random/kernel.py; I haven’t dug into it too deeply, but it looks like it only uses TIR.

I think @tkonolige wrote these ops, maybe he can elaborate a bit?

As far as I know, until Relax is officially released we have to implement a relay_to_tir plugin to “bring your own lowering”. This could be just a relay pass that rewrites the relay calls we are interested in into call_lowered.

Basically I think we need:

  • Provide your own relay op strategy for TensorIR. Currently, the relay strategy mechanism only covers TE compute and schedule.

  • Invoke your own op strategy in the relay_to_tir pass: obtain a PrimFunc via the strategy for each relay call you want to lower, then rewrite the original expr into a call_lowered of that PrimFunc.

  • Also, if you want to lower fused functions, you have to decide how to compose the PrimFuncs of the sub-ops into a single PrimFunc. That could involve a lot of work, but generally we can follow how TECompiler chains the TE expressions of fused sub-ops.
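The rewriting step above might look roughly like the following. This is a hedged pseudocode sketch: the `call_lowered` helper comes from apache/tvm PR #9312, but its exact import path and signature are assumptions here and may differ in your revision; `prim_func_gvars` is a hypothetical mapping you would build from your strategy:

```python
# Pseudocode sketch: a relay pass that rewrites selected relay.Call nodes
# into call_lowered calls to pre-built PrimFuncs already added to the IRModule.
from tvm import relay
from tvm.relay.expr_functor import ExprMutator

class LowerSelectedOps(ExprMutator):
    def __init__(self, prim_func_gvars):
        super().__init__()
        # Hypothetical map: relay op name -> GlobalVar of the PrimFunc
        # that your TensorIR op strategy produced for that op.
        self.prim_func_gvars = prim_func_gvars

    def visit_call(self, call):
        new_call = super().visit_call(call)
        op_name = getattr(new_call.op, "name", None)
        if op_name in self.prim_func_gvars:
            gvar = self.prim_func_gvars[op_name]
            # call_lowered packs the original arguments into a single Tuple
            # and points the call at the already-lowered function.
            return relay.op.call_lowered(gvar, relay.Tuple(new_call.args))
        return new_call
```

Calls to ops you did not register fall through unchanged, so the rest of the module still goes down the normal TECompiler path.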


Thanks everybody for the responses! They helped a lot. There is one detail that is still open in my understanding: I’d like to import the NN from an ONNX file, then use the Relax operator that calls the TensorIR PrimFunc, as proposed in the previous responses. How can I use an ONNX or TF file as input here? Is there a way using the current frontend flow, or do I have to describe the entire NN in Relax?

CC: @junrushao1994 @yuchenj @electriclilies @wrongtest @aca88

Hi @MJKlaiber, we are working on a relay-to-relax translator: https://github.com/tlc-pack/relax/pull/63, and plan to put up a ResNet demo next week, so that you can import your model to relay first, translate it to relax, and call your hand-written TensorIR PrimFunc in Relax. I will keep you updated on the progress.

Hi,

I was wondering how the BYOC flow will be affected by the introduction of Relax. But I guess I can wait until the forum discussion post to ask there :slight_smile:

Hi @aca88, we are actively discussing it in this thread: https://github.com/tlc-pack/relax/issues/46. Feel free to join the discussion!

@MJKlaiber, Thank you and Dennis for joining today’s Relax open dev meeting and showing great interest in Relax! The translator is implemented here: https://github.com/tlc-pack/relax/pull/75, and we will merge it later this week or early next week. Will keep you posted!