I’m trying to implement a proof of concept that uses Relay to compile a gradient function for a model, which I could eventually deploy to e.g. an Android platform for training on a mobile device.
I’m just getting familiar with the TVM stack, so I apologize if I miss some obvious things. I’m working off the latest TVM mainline repo.
I created a toy model in Keras (a few affine layers with a binary-target sigmoid output), and I can load that model into Relay and verify that feed-forward prediction in Relay matches the feed-forward output in Keras. I understand that Relay is still missing gradient implementations for most operators (https://github.com/dmlc/tvm/issues/2562), so I implemented my own for nn.dense and nn.bias_add, and I can verify numerically that the results of those gradients match what I get from Keras. (As a side note, working out what shapes/orientations the original operator arguments come in was mostly a trial-and-error process for me, and I’m still not entirely sure what role collapse_sum_like plays. It would be great to have a more in-depth tutorial on implementing gradients in Relay for some of the more complex operators, for the case where I know the mathematical form of the gradient computation but not necessarily how to translate it to Relay.)
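For reference, here’s roughly what my two gradient implementations look like. This is a sketch: the exact import paths may differ on current mainline, and the comments reflect my current (possibly wrong) understanding of collapse_sum_like, namely that it sums the gradient over any axes that were broadcast in the forward pass so the result matches the shape of the corresponding input.

```python
from tvm.relay.op import register_gradient, nn
from tvm.relay.op import transpose, collapse_sum_like

# nn.dense computes data @ weight^T, with data: (batch, units_in)
# and weight: (units_out, units_in)
@register_gradient("nn.dense")
def dense_grad(orig, grad):
    data, weight = orig.args
    # d(loss)/d(data) = grad @ weight; d(loss)/d(weight) = grad^T @ data
    return [collapse_sum_like(nn.dense(grad, transpose(weight)), data),
            collapse_sum_like(nn.dense(transpose(grad), transpose(data)), weight)]

@register_gradient("nn.bias_add")
def bias_add_grad(orig, grad):
    data, bias = orig.args
    # the bias was broadcast over the batch axis in the forward pass,
    # so (as I understand it) collapse_sum_like sums it back down
    return [collapse_sum_like(grad, data),
            collapse_sum_like(grad, bias)]
```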
I am now stuck trying to figure out how to implement a loss function on top of my converted Keras model function. I have something like this (though I’m mostly just stumbling around blindly at this point):
```python
# func is my converted model that outputs a single sigmoid activation
# value; I want to add binary cross-entropy loss on top of it
shape = (1, 1)
dtype = 'float32'
t = relay.TensorType(shape, dtype)
y = relay.var("y", t)
loss_func = relay.Function(
    [y, *func.params],
    -(y * relay.op.log(func(*func.params))
      + (relay.const(1.0) - y) * relay.op.log(relay.const(1.0) - func(*func.params))),
)

ex = relay.create_executor(target=target)
label = np.array([[1]]).astype('float32')
res = ex.evaluate(loss_func)(tvm.nd.array(label), tvm.nd.array(data.astype(dtype)), **params)
```
But I just get an exception: `TVMError: Check failed: WellFormed(resolved_expr)`

Is there an example I can look at somewhere for a simple loss function implementation in Relay?
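If I had to guess, the WellFormed check fails because I reuse func.params as the parameters of the wrapper function, so each variable ends up bound by two different function nodes. Here’s the variant I’m planning to try next, which binds fresh variables in the wrapper and calls func with those (just a sketch, not verified yet):

```python
# create fresh wrapper variables mirroring func's params, so that
# func's own params are only bound once (inside func itself)
fresh_params = [relay.var(p.name_hint, p.type_annotation) for p in func.params]
pred = relay.Call(func, fresh_params)  # call the model on the fresh vars

one = relay.const(1.0)
bce = -(y * relay.op.log(pred) + (one - y) * relay.op.log(one - pred))
loss_func = relay.Function([y, *fresh_params], bce)
```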
The other thing I was testing (without adding the loss function) is compiling the result of relay.ir_pass.gradient(func), to get a sense of what the compiled object might look like for shipping to Android. The original feed-forward function compiles without issue. However, even though I can successfully get a gradient function out of Relay (and evaluate it in a Relay executor), compiling it fails.
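For reference, the failing part of my script looks roughly like this (running infer_type first is my own addition; I’m not sure whether it’s strictly required):

```python
func = relay.ir_pass.infer_type(func)         # type-check the converted model
gradient_func = relay.ir_pass.gradient(func)  # evaluates fine in the interpreter
# but compiling it fails:
graph, lib, params = relay.build_module.build(gradient_func, target=target, params=params)
```

The full traceback: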
```
Traceback (most recent call last):
  File "test_keras_toy.py", line 45, in <module>
    graph, lib, params = relay.build_module.build(gradient_func, target=target, params=params)
  File "/usr/tvm/python/tvm/relay/build_module.py", line 356, in build
    params)
  File "/usr/tvm/python/tvm/relay/build_module.py", line 183, in build
    self._build(func, target, target_host)
  File "/usr/tvm/python/tvm/_ffi/_ctypes/function.py", line 209, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /usr/tvm/build/libtvm.so(+0x6d72aa) [0x7efdc6b9d2aa]
  [bt] (7) /usr/tvm/build/libtvm.so(+0x4c7bd3) [0x7efdc698dbd3]
  [bt] (6) /usr/tvm/build/libtvm.so(+0x4d02e0) [0x7efdc69962e0]
  [bt] (5) /usr/tvm/build/libtvm.so(+0x6de851) [0x7efdc6ba4851]
  [bt] (4) /usr/tvm/build/libtvm.so(+0x4c31e6) [0x7efdc69891e6]
  [bt] (3) /usr/tvm/build/libtvm.so(+0x4c7bd3) [0x7efdc698dbd3]
  [bt] (2) /usr/tvm/build/libtvm.so(+0x4d02e0) [0x7efdc69962e0]
  [bt] (1) /usr/tvm/build/libtvm.so(+0x6dc8bf) [0x7efdc6ba28bf]
  [bt] (0) /usr/tvm/build/libtvm.so(+0x15c792) [0x7efdc6622792]
  File "/usr/tvm/src/relay/pass/fold_scale_axis.cc", line 241
TVMError: FoldScaleAxis only accept dataflow-form
```
I don’t know enough about TVM internals to understand what this error means. What pieces are currently still missing in Relay before I’d be able to compile a function that computes gradients on a target platform?
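In the meantime, one workaround I plan to try, based on skimming build_module.py: FoldScaleAxis appears to be registered at opt_level 3, so building at a lower opt level should skip the pass that rejects the let-bound (non-dataflow) form the gradient pass produces. Whether the rest of the pipeline accepts that form, I don’t know yet.

```python
# untested: opt_level=2 should disable FoldScaleAxis (and the other
# level-3 passes), sidestepping the dataflow-form check
with relay.build_config(opt_level=2):
    graph, lib, params = relay.build_module.build(
        gradient_func, target=target, params=params)
```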