Training in Relay

I’m trying to implement a proof of concept for using relay to compile a gradient function for a model that I could potentially deploy to, e.g., an Android platform for training on a mobile device.

I’m just starting to get familiarized with the TVM stack, so I apologize if I miss some obvious things. I’m working off of the latest TVM mainline repo.

I created a toy model in keras (a few affine layers with a sigmoid output for a binary target), and I can load that model into relay and verify that feed-forward prediction in relay matches the feed-forward output in keras. I understand that relay is still missing gradient implementations for most operators (https://github.com/dmlc/tvm/issues/2562), so I implemented my own for nn.dense and nn.bias_add, and I can verify numerically that the results of those gradients match what I get from keras. As a side note, it was mostly a trial-and-error process for me to understand what shapes/orientations the original operator arguments were in, and I’m still not entirely sure what role collapse_sum_like plays; it would be great to have a more in-depth tutorial on how to implement gradients in relay for a few more complex operators, for the cases where I know what the mathematical form of the gradient computation looks like but not necessarily how to translate that to relay.
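For reference, the shape of that check looks roughly like this (a sketch with placeholder names: keras_model is my toy model, data is a sample input batch, and the input name/shape in shape_dict are made up; newer versions of from_keras may return a module rather than a bare function):

import numpy as np
import tvm
from tvm import relay

shape_dict = {"input_1": (1, 4)}  # placeholder input name/shape for the toy model
func, params = relay.frontend.from_keras(keras_model, shape_dict)

ex = relay.create_executor(target="llvm")
relay_out = ex.evaluate(func)(tvm.nd.array(data.astype("float32")), **params)
np.testing.assert_allclose(relay_out.asnumpy(), keras_model.predict(data), rtol=1e-5)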

I am now getting stuck trying to figure out how to implement a loss function on top of my converted keras model function. I have something like this (but I’m mostly just stumbling around blindly at this point):

# func is my converted model that outputs a single sigmoid activation value, I want to add binary cross-entropy loss
shape = (1, 1)
dtype = 'float32'
t = relay.TensorType(shape, dtype)
y = relay.var("y", t)
loss_func = relay.Function(
    [y, *func.params],
    -(y * relay.op.log(func(*func.params)) + (relay.const(1.0) - y) * relay.op.log(relay.const(1.0) - func(*func.params))),
)

ex = relay.create_executor(target=target)
label = np.array([[1]]).astype('float32')
res = ex.evaluate(loss_func)(tvm.nd.array(label), tvm.nd.array(data.astype(dtype)), **params)

but I just get an exception: TVMError: Check failed: WellFormed(resolved_expr)

Is there an example I can look at somewhere for a simple loss function implementation in relay?

The other thing I was testing (without adding the loss function) is compiling the result of relay.ir_pass.gradient(func), to get a sense of what the compiled object might look like for shipping to Android. The original feed-forward function compiles without issue. However, even though I can successfully get a gradient function from relay (and evaluate it in a relay executor), it fails to compile:

Traceback (most recent call last):
  File "test_keras_toy.py", line 45, in <module>
    graph, lib, params = relay.build_module.build(gradient_func, target=target, params=params)
  File "/usr/tvm/python/tvm/relay/build_module.py", line 356, in build
    params)
  File "/usr/tvm/python/tvm/relay/build_module.py", line 183, in build
    self._build(func, target, target_host)
  File "/usr/tvm/python/tvm/_ffi/_ctypes/function.py", line 209, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /usr/tvm/build/libtvm.so(+0x6d72aa) [0x7efdc6b9d2aa]
  [bt] (7) /usr/tvm/build/libtvm.so(+0x4c7bd3) [0x7efdc698dbd3]
  [bt] (6) /usr/tvm/build/libtvm.so(+0x4d02e0) [0x7efdc69962e0]
  [bt] (5) /usr/tvm/build/libtvm.so(+0x6de851) [0x7efdc6ba4851]
  [bt] (4) /usr/tvm/build/libtvm.so(+0x4c31e6) [0x7efdc69891e6]
  [bt] (3) /usr/tvm/build/libtvm.so(+0x4c7bd3) [0x7efdc698dbd3]
  [bt] (2) /usr/tvm/build/libtvm.so(+0x4d02e0) [0x7efdc69962e0]
  [bt] (1) /usr/tvm/build/libtvm.so(+0x6dc8bf) [0x7efdc6ba28bf]
  [bt] (0) /usr/tvm/build/libtvm.so(+0x15c792) [0x7efdc6622792]
  File "/usr/tvm/src/relay/pass/fold_scale_axis.cc", line 241
TVMError: FoldScaleAxis only accept dataflow-form

I don’t know enough about TVM to know what this error means; what further pieces are currently missing in relay that I’d need before I’m able to compile a function to compute gradients on a target platform?


I’ve made some progress on implementing loss functions in Relay, and I can even verify now that gradients in Relay match what I get from Keras:

def np_to_relay(data, dtype="float32"):
    return tvm.nd.array(data.astype(dtype))

def add_loss_func(model_func):
    y = relay.var("y", shape=(1,1))
    y_hat = model_func.body
    loss = - (y * relay.log(y_hat) + (relay.const(1.0) - y) * relay.log(relay.const(1.0) - y_hat))
    return relay.Function([y, *model_func.params], loss)

def backprop_relay(model_func, label, data, params):
    loss_func = add_loss_func(model_func)
    gradient_func = relay.ir_pass.infer_type(relay.ir_pass.gradient(loss_func))
    return relay.create_executor(target=TARGET, ctx=CTX).evaluate(gradient_func)(
        np_to_relay(np.array([[label]])), np_to_relay(data), **params
    )

The issue I now run into is that I can run the gradient in the relay interpreter that I get from relay.create_executor, but I can’t compile the relay function:

>>> relay.build_module.build(gradient_func, TARGET, params=params)
Traceback (most recent call last):
[...]
  File "/usr/tvm/python/tvm/relay/build_module.py", line 284, in build
    params)
  File "/usr/tvm/python/tvm/relay/build_module.py", line 112, in build
    self._build(func, target, target_host)
  File "/usr/tvm/python/tvm/_ffi/_ctypes/function.py", line 209, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /usr/tvm/build/libtvm.so(+0x78d6ed) [0x7f751ca006ed]
  [bt] (7) /usr/tvm/build/libtvm.so(+0x787d2e) [0x7f751c9fad2e]
  [bt] (6) /usr/tvm/build/libtvm.so(+0x78c7ff) [0x7f751c9ff7ff]
  [bt] (5) /usr/tvm/build/libtvm.so(+0x789478) [0x7f751c9fc478]
  [bt] (4) /usr/tvm/build/libtvm.so(+0x4c0433) [0x7f751c733433]
  [bt] (3) /usr/tvm/build/libtvm.so(+0x4c8c70) [0x7f751c73bc70]
  [bt] (2) /usr/tvm/build/libtvm.so(+0x78ae79) [0x7f751c9fde79]
  [bt] (1) /usr/tvm/build/libtvm.so(+0x789514) [0x7f751c9fc514]
  [bt] (0) /usr/tvm/build/libtvm.so(+0x163da2) [0x7f751c3d6da2]
  File "/usr/tvm/src/relay/backend/graph_plan_memory.cc", line 121
TVMError: Check failed: it != token_map_.end(): 

This is using the latest TVM mainline as of 5/28, targeting llvm (and using llvm-9.0 from the cpu_ci docker build).

Thank you for your work! I am working on training for Relay right now.

so I implemented my own for nn.dense and nn.bias_add

Can you make a PR for both of them? I am more than happy to have gradients that I don’t need to write!

I’m still not entirely sure what role collapse_sum_like plays

Lots of binary operators in Relay implicitly broadcast; for example, x + y will implicitly broadcast. The result tensor (and the resulting gradient) may have more elements than either x or y. collapse_sum_like was created to solve this problem: by doing collapse_sum_like(grad, x), the broadcast dimensions get summed back down to the shape of x.
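A small sketch to illustrate the shapes (my own toy example, assuming collapse_sum_like is exposed at the relay top level):

import tvm
from tvm import relay

x = relay.var("x", shape=(3, 4))
y = relay.var("y", shape=(1, 4))
z = x + y  # z broadcasts to shape (3, 4)

# the gradient flowing into z has z's shape (3, 4); the gradient w.r.t. y must
# have y's shape (1, 4), so the broadcast axis is summed back down
grad = relay.ones(shape=(3, 4), dtype="float32")
dy = relay.collapse_sum_like(grad, y)  # shape (1, 4)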

but I just get an exception: TVMError: Check failed: WellFormed(resolved_expr)

One constraint Relay has is that all relay.Var bindings (function arguments, let bindings, pattern match bindings, etc.) must be distinct. Judging from the code, you are reusing func.params as the parameters for loss_func. You have to make a set of different relay.Vars, which is easy to do with a duplicate() function.
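A rough sketch of that idea with hand-rolled fresh variables (not any particular duplicate() helper; it assumes func's params carry type annotations, as they do for a converted Keras model):

import tvm
from tvm import relay

def add_loss(func):
    # fresh vars with the same names/types, so no relay.Var binding is reused
    new_params = [relay.var(p.name_hint, p.type_annotation) for p in func.params]
    y = relay.var("y", shape=(1, 1))
    y_hat = relay.Call(func, new_params)  # call the original function on the fresh vars
    one = relay.const(1.0)
    loss = -(y * relay.log(y_hat) + (one - y) * relay.log(one - y_hat))
    return relay.Function([y] + new_params, loss)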

TVMError: FoldScaleAxis only accept dataflow-form

There is a FoldScaleAxis pass that does some optimization. I recommend you turn it off until you can make it work.
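If I remember the pass levels right, FoldScaleAxis only kicks in at opt_level 3, so one way to avoid it (an assumption on my part, not something I have re-checked) is to build at a lower opt level:

with relay.build_config(opt_level=2):
    graph, lib, params = relay.build_module.build(func, target=target, params=params)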

The issue I now run into is that I can run the gradient in the relay interpreter that I get from relay.create_executor, but I can’t compile the relay function:

Things have changed a lot; maybe you can try the latest pipeline?

Sorry for the late reply, I didn’t see your post.


Thanks for your response!

I’ve made some progress since I last posted, and through discussion with some of the relay VM devs my understanding is that the relay VM executor is still very much in progress and incomplete. I am able to execute forward/backward in the relay VM (using first_order gradients; the VM is still missing pieces to support higher_order), but serialization of relay modules (for compilation) does not yet exist. The relay.build_module.build() compilation currently only targets the graph runtime, which won’t work for derived gradient functions, since those generate ref reads/writes that aren’t supported there.
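Concretely, this is roughly how I’m requesting first-order gradients now (a sketch; add_loss_func is the helper from my earlier post, and the mode argument is how I understand the current relay.ir_pass.gradient signature):

loss_func = relay.ir_pass.infer_type(add_loss_func(func))
# first-order AD avoids the ref reads/writes generated by the higher-order path
gradient_func = relay.ir_pass.gradient(loss_func, mode="first_order")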

I need to better understand the right way to test gradient op implementations, but here is what I have for now which at least numerically matches what I get from keras on a toy model that only uses affine layers:

@register_gradient("nn.dense")
def dense_grad(orig, grad):
    data, weight = orig.args
    return [collapse_sum_like(transpose(transpose(weight) * grad), data),
            collapse_sum_like(transpose(grad * transpose(data)), weight)]


@register_gradient("nn.bias_add")
def bias_grad(orig, grad):
    data, bias = orig.args
    return [collapse_sum_like(grad, data),
            collapse_sum_like(grad, bias)]


@register_gradient("negative")
def negative_grad(orig, grad):
    """Returns [-grad]"""
    return [-grad]
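In case it helps others, the crude check I’m doing is a finite-difference comparison on a tiny dense layer (a sketch, assuming the nn.dense gradient registered above and the first_order mode mentioned earlier; the gradient-transformed function returns the forward value plus a tuple of per-input gradients):

import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3))
weight = relay.var("weight", shape=(2, 3))
func = relay.ir_pass.infer_type(relay.Function([data, weight], relay.nn.dense(data, weight)))
bwd = relay.ir_pass.gradient(func, mode="first_order")

d_np = np.random.rand(1, 3).astype("float32")
w_np = np.random.rand(2, 3).astype("float32")
res = relay.create_executor().evaluate(bwd)(tvm.nd.array(d_np), tvm.nd.array(w_np))
w_grad = res[1][1]  # res[0] is the forward output, res[1] the input gradients

# nudge one weight entry and compare against the Relay gradient
eps = 1e-3
wp, wm = w_np.copy(), w_np.copy()
wp[0, 0] += eps
wm[0, 0] -= eps
num = (np.dot(d_np, wp.T).sum() - np.dot(d_np, wm.T).sum()) / (2 * eps)
np.testing.assert_allclose(w_grad.asnumpy()[0, 0], num, rtol=1e-2, atol=1e-2)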

I’m currently looking into implementing gradients to support a simple conv2d model, as well as extending the loss function to softmax cross-entropy.


For simple cases (without If), you should be able to remove ref reads/writes with PartialEvaluate and DeadCodeElimination.
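Something like this, if the Python wrappers are what I think they are (a sketch; the exact names and their placement in ir_pass may differ on current mainline):

gradient_func = relay.ir_pass.gradient(loss_func)  # higher-order AD introduces refs
gradient_func = relay.ir_pass.partial_evaluate(gradient_func)
gradient_func = relay.ir_pass.dead_code_elimination(gradient_func)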


Have you tried the aot compiler? It supports all features.

The default relay interpreter is also good for hacking.

DeadCodeElimination is broken right now, so I won’t recommend using it; it might result in dropping gradient updates. @weberlo or I will fix it once we have free cycles.


I briefly tried the aot compiler but couldn’t get it working. Using the reference debug executor works well for testing out gradient ops (but doesn’t get me any closer to a compiled function that I can ultimately run on an ARM device). I’m getting a bit stuck right now figuring out how to implement the gradient for e.g. nn.max_pool2d; I don’t think I understand the orig args well enough to work out how to determine the position of the max input to project the gradient onto.

The other thing I was wondering is: what is the correct way to handle softmax + cross-entropy loss in Relay? In most frameworks, for training you replace the two with a fused softmax+cross-entropy op because of the convenient form of the gradient, but in Relay would we still do that, or defer to some optimization pass that understands how to fuse the two? Is there an example of how that would look?

I have implemented gradients for max and avg pool2d and will upstream them soon.
For max pool2d, you need to either output the index in the forward pass or calculate the argmax in the gradient.
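To illustrate the "calculate argmax in gradient" option in plain numpy (not Relay, just the concept; 2x2 non-overlapping windows and NCHW layout assumed):

import numpy as np

def max_pool2d_grad(x, grad, k=2):
    # route each output-cell gradient to the argmax position of its pooling window
    dx = np.zeros_like(x)
    n, c, h, w = x.shape
    for i in range(0, h, k):
        for j in range(0, w, k):
            window = x[:, :, i:i+k, j:j+k].reshape(n, c, -1)
            idx = window.argmax(axis=2)
            for b in range(n):
                for ch in range(c):
                    di, dj = divmod(idx[b, ch], k)
                    dx[b, ch, i + di, j + dj] += grad[b, ch, i // k, j // k]
    return dx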


The other thing I was wondering is what is the correct way to handle softmax + crossentropy loss in Relay?

I just implemented that Friday

[Relay] Add grads by MarisaKirisame · Pull Request #3506 · apache/tvm · GitHub.

What’s the error you hit getting aot to work? I am the maintainer of aot and we use it all the time; it works perfectly on a nightly build that passed yesterday.

Awesome to see so much progress on the gradients! It’s been a while since I tried out the aot package; let me bring in the latest changes and give it another shot.


OK, this is super exciting: I tried the AOT compiler again and it’s working for my relay gradient function! (The error I had before was that I was trying to pass function arguments as kwargs, but it seems like AOT only takes positional args.)

I need to understand how the AOT compiler works in more detail, but: is there a C++ API for calling into an AOT compiled function? Right now the library returns a wrapped python binding to call the compiled function, but I’m interested in executing this on a platform completely natively without any python at all.

It is possible. The AOT compiler generates a C++ file, calls a C++ compiler on the command line, and loads the resulting library. All of this is doable in C++, and it is mostly just calling the API.

However, the relay->C++ part is in Python as well. Maybe the best way for you to do this is: compile the Relay function into a C++ function on a build server, cross-compile the C++ for your device, and then just link the resulting library.


This is the driver of all the aot code. The aot project is small and easy to understand; it can be read from top to bottom in one evening, and you can even skip the "generate C++ file" part as you don’t have to touch it.

I am happy to merge a PR if you get anything working!
@jroesch @tqchen any thoughts?

@SWu another thing: there are quite a few people trying to make training happen who are writing gradients and fixing AD. It will be in all of our interest if we each open small training PRs and get them merged; this way we can stand on each other’s shoulders.