[Relay] Strategy Error when adding new operator

Here is the problem: I'm trying to add a fused operator to relay.nn, and I hit this error.

File "/home/pan/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
File "/home/pan/tvm/python/tvm/relay/backend/te_compiler.py", line 320, in lower_call
    best_impl, outputs = select_implementation(op, call.attrs, inputs, ret_type, target)
File "/home/pan/tvm/python/tvm/relay/backend/te_compiler.py", line 177, in select_implementation
    all_impls = get_valid_implementations(op, attrs, inputs, out_type, target)
File "/home/pan/tvm/python/tvm/relay/backend/te_compiler.py", line 113, in get_valid_implementations
    assert fstrategy is not None, (
AssertionError: nn.multihead_q doesn't have an FTVMStrategy registered. You can register one in
python with `tvm.relay.op.register_strategy`.

But I really did add the registration. I followed Adding an Operator to Relay — tvm 0.13.dev0 documentation: I wrote the strategy function in ~/tvm/python/tvm/relay/op/strategy/generic.py and the registration in ~/tvm/python/tvm/relay/op/nn/_nn.py. But the strategy is still None.
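
The registration follows the usual TVM pattern, roughly like this (a simplified sketch, not my exact code; wrap_compute_multihead_q is a placeholder for the compute wrapper):

# In python/tvm/relay/op/strategy/generic.py (wrap_topi_schedule is
# defined in that file):
from tvm import topi
from tvm.relay.op import op as _op
from tvm.target import override_native_generic_func

@override_native_generic_func("multihead_q_strategy")
def multihead_q_strategy(attrs, inputs, out_type, target):
    """multihead_q generic strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_multihead_q(topi.nn.multihead_q),  # placeholder compute wrapper
        wrap_topi_schedule(topi.generic.schedule_injective),
        name="multihead_q.generic",
    )
    return strategy

# In python/tvm/relay/op/nn/_nn.py (reg is tvm.relay.op.op); the name
# must be the full op name, including the "nn." prefix:
reg.register_strategy("nn.multihead_q", strategy.multihead_q_strategy)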

Does anyone know the reason? Has anyone successfully added a new operator to Relay? Could you please share your code?

This error is solved, but another one comes. zzz

This problem was solved by changing "multihead_q" to "nn.multihead_q" in the registration: the name passed to register_strategy must be the full operator name, including the "nn." prefix.

I'm going to make this post a record of adding a new operator to Relay, and friends are welcome to share their experiences.

Now the problem becomes:

Could you also post the code for the te.compute that you've written? It looks like the lanes field of the DataType is set to 0 somehow, and I'm not sure I can tell why without looking at the code. Based on just the trace, the error seems to be caused by the out_dtype.

Here it is!

# Imports assumed for a TOPI-style module (added here for completeness):
import logging

from tvm import te
from tvm.topi import cpp
from tvm.topi.utils import get_const_tuple

logger = logging.getLogger("topi")


def multihead_q(tensor_in, weight, bias, out_dtype=None):
    """Compute multihead query in a Transformer network.

    Parameters
    ----------
    tensor_in : tvm.te.Tensor
        3-D with shape [batch, M, K].

    weight : tvm.te.Tensor
        3-D with shape [batch, K, N].

    bias : tvm.te.Tensor
        1-D with shape [N].

    out_dtype : Optional[str]
        Specifies the output data type for mixed precision batch matmul.

    Returns
    -------
    output : tvm.te.Tensor
        3-D with shape [batch, M, N]
    """
    assert len(tensor_in.shape) == 3, "tensor_in only supports 3-dim"

    if out_dtype is None:
        out_dtype = tensor_in.dtype
        print(tensor_in.dtype)
        if tensor_in.dtype != weight.dtype:
            logger.warning(
                "tensor_in has a different data type than weight: %s, %s",
                tensor_in.dtype,
                weight.dtype,
            )

    XB, XI, XK = get_const_tuple(tensor_in.shape)
    YB, YK, YJ = get_const_tuple(weight.shape)

    k = te.reduce_axis((0, XK), name="k")

    # Batched matmul: q[b, i, j] = sum_k tensor_in[b, i, k] * weight[b, k, j]
    compute_lambda = lambda b, i, j: te.sum(
        tensor_in[b if XB != 1 else 0, i, k].astype(out_dtype)  # THE WRONG LINE (where the trace pointed)
        * weight[b if YB != 1 else 0, k, j].astype(out_dtype),
        axis=k,
    )
    q = te.compute(
        (XB, XI, YJ),
        compute_lambda,
        name="batch_matmul",
        tag="batch_matmul",
    )
    # Add bias ([N] -> [1, 1, N]), split the last axis into 12 heads,
    # and transpose to [batch, heads, M, head_dim].
    q_bias = cpp.expand_dims(bias, 0, 2)
    q = cpp.add(q, q_bias)
    q = cpp.reshape(q, [XB, XI, 12, YJ / 64])  # the real bug, found below: YJ / 64 is a float
    output = cpp.transpose(q, [0, 2, 1, 3])

    return output

But I wrote that part by directly copying the batch_matmul TOPI operator.

Thanks. You're right, I don't see anything wrong with this either. I see that you've printed the out_dtype that's causing the error; do you see anything unusual there?

If I understand the error correctly, the dtype used for the cast operation when you call .astype(...) has 0 lanes, which doesn't normally happen. I would suggest trying to recreate the issue in a smaller test case: get the dtype, write a small script that calls .astype() directly with it, and see if you can reproduce the error.
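
Something like this minimal sketch might help isolate it (assuming a plain float32 placeholder; substitute the dtype you actually printed inside multihead_q):

import tvm
from tvm import te

a = te.placeholder((2, 4, 8), name="a", dtype="float32")
out_dtype = a.dtype  # replace with the dtype printed in multihead_q
# Cast inside a compute, the same way multihead_q does:
b = te.compute(a.shape, lambda i, j, k: a[i, j, k].astype(out_dtype), name="cast")
# A well-formed scalar dtype should report lanes == 1, never 0:
print(tvm.runtime.DataType(out_dtype).lanes)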

If you’re able to reproduce the error and it seems like it should work (suggesting some bug in TVM), I would suggest you to file an issue and I can try to fix the problem.

Many thanks, but that’s not a bug.

q = cpp.reshape(q, [XB, XI, 12, YJ / 64])

This line has the problem: type(YJ / 64) is float, while type(YJ // 64) is int, and shape dimensions must be integers.
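
A quick check in plain Python shows the difference (768 is just an example value):

YJ = 768
print(type(YJ / 64), YJ / 64)    # <class 'float'> 12.0
print(type(YJ // 64), YJ // 64)  # <class 'int'> 12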

This is weird though; is this Python-intrinsic behavior?

Still many thanks!

Sorry, I did not understand the question here. Is this still about the same error that you posted above?

What you've mentioned seems like the proper workflow. Since Python 3, the result of / is a float (true division) and the result of // is an integer (floor division), but I don't think I understand the overall question.

BTW, you're welcome 🙂. Always happy to help.

It was the problem that caused that error, because now I can run the operator. zzz

It seems that the shape info must be of int type, and the result of / is a float, which triggers the DataType error? I'm not an expert at this; the error message mentioned some allocation error, so maybe that was just the hint.


After successfully adding an operator to my TVM build, I want to add CUDA support for it. I read Adding an Operator to Relay — tvm 0.13.dev0 documentation, Part 5, and it seems I need to write the compute/schedule under ~/tvm/python/tvm/topi/cuda and reference them in ~/tvm/python/tvm/relay/op/strategy/cuda.py, roughly as sketched below.
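
A minimal sketch of that pattern (hypothetical names: wrap_compute_multihead_q, topi.cuda.multihead_q, and topi.cuda.schedule_multihead_q stand in for whatever CUDA compute/schedule gets written):

# In python/tvm/relay/op/strategy/cuda.py, overriding the generic
# strategy for CUDA targets:
from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import (
    multihead_q_strategy,
    wrap_topi_schedule,
)

@multihead_q_strategy.register(["cuda", "gpu"])
def multihead_q_strategy_cuda(attrs, inputs, out_type, target):
    """multihead_q cuda strategy"""
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_multihead_q(topi.cuda.multihead_q),     # hypothetical
        wrap_topi_schedule(topi.cuda.schedule_multihead_q),  # hypothetical
        name="multihead_q.cuda",
    )
    return strategy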

But I found that some operators do not have such a strategy-registration step, for example layer_norm.

Then how does layer_norm support these targets without the Part 5 code?

Do you know the reason?

Many Thanks! @sanirudh

Hi @MasterJianxing,

You're right, a strategy needs to be implemented for every Relay operator; without one, Relay won't be able to figure out the compute and schedule to use. In the case of layer_norm, I'm not sure why no strategy is registered, but without one the relay.nn.layer_norm operator cannot be lowered directly.

However, when I create an IRModule with relay.nn.layer_norm and call relay.optimize on that mod, I see that layer_norm gets broken down into a bunch of simpler operators. This is probably done by some optimization pass, and is probably why no one has yet run into the missing strategy for layer_norm.

Below is the code example of what I did if you want to play around:

import tvm
from tvm import relay

data = relay.var("data", shape=(1, 224, 224, 3))
gamma = relay.var("gamma", shape=(3,))
beta = relay.var("beta", shape=(3,))
norm = relay.nn.layer_norm(data, gamma, beta)

target = tvm.target.Target("cuda")

f = relay.function.Function([data, gamma, beta], norm)
mod = tvm.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)
print(mod)
print("-----------------------------------")

# Asking for implementations directly fails: no strategy is registered
# for nn.layer_norm.
call = mod["main"].body
try:
    relay.backend.te_compiler.get_valid_implementations(
        call.op, call.attrs, call.args, call.checked_type, target
    )
except Exception as e:
    print(e)
    print("-----------------------------------")

# But relay.optimize decomposes layer_norm into simpler operators first,
# so the missing strategy is never hit.
optimized_mod = relay.optimize(mod, target)
print(optimized_mod)