BYOC how-to guide does not have an example of using a custom codegen

The BYOC how-to guide contains this passage:

After you finish the codegen and runtime, you can then let your customers annotate their models with your customized tag to make use of them. The tutorial for end-users to annotate and launch a specific codegen is here (TBA).

Will this “here (TBA)” ever become a link to a tutorial that shows how to use a custom codegen?

I tried to use the ccompiler codegen like this:

import numpy
import tvm
import tvm.relay

dshape = (64, 1, 32, 32)
eltype = 'uint16'
kern = numpy.random.uniform(-1, 1, size=(1, 1, 1, 1)).astype(eltype)

w = tvm.relay.const(kern)
x = tvm.relay.var('x', shape=dshape, dtype=eltype)
y = tvm.relay.nn.conv2d(x, w, padding=(1, 1), dilation=(1, 1), groups=1, channels=1, kernel_size=(1, 1))

mod = tvm.IRModule()
mod['main'] = tvm.relay.Function([x], y)
# mark the function for the ccompiler external codegen
func = mod['main'].with_attr('Compiler', 'ccompiler')

with tvm.transform.PassContext(opt_level=3):
    graph, module, params = tvm.relay.build(func, target='llvm')

and got this error:

TVMError: Operators should be transformed away; try applying the fuse_ops transformation to the expression.

How do I use a custom codegen?

Sorry we haven’t updated the document to reflect the latest changes. Meanwhile, you may refer to this blog post:

https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm

Thank you, Cody. Is it possible to see the entire file that contains these lines from the blog post?

mod = create_relay_module_from_model() # Output: Figure 1
mod = transform.MergeComposite(pattern_table)(mod)
mod = transform.AnnotateTarget(["dnnl"])(mod) # Output: Figure 2
mod = transform.MergeCompilerRegions()(mod) # Output: Figure 3
mod = transform.PartitionGraph()(mod) # Output: Figure 4

I just need a single but complete example that works.

For example, I run this script

    import numpy
    import tvm
    import tvm.relay

    dshape = (64, 1, 32, 32)
    kshape = ( 1, 1,  1,  1)
    eltype = 'uint16'
    scale = 1

    data = numpy.random.uniform(-scale, scale, size=dshape).astype(eltype)
    kern = numpy.random.uniform(-scale, scale, size=kshape).astype(eltype)

    w = tvm.relay.const(kern)
    x = tvm.relay.var('x', shape=dshape, dtype=eltype)
    y = tvm.relay.nn.conv2d(x, w, padding=(1, 1), dilation=(1, 1), groups=1, channels=1, kernel_size=(1, 1))

    mod = tvm.IRModule()
    mod['main'] = tvm.relay.Function([x], y)
    mod = tvm.relay.transform.AnnotateTarget(["ccompiler"])(mod)

    with tvm.transform.PassContext(opt_level=3):
        graph, module, params = tvm.relay.build(mod, target='llvm')

and I get

Traceback (most recent call last):
  File "y.py", line 3, in <module>
    import tvm
  File "/Users/dmakarov/work/git/tvm/python/tvm/__init__.py", line 41, in <module>
    from .ir import IRModule
  File "/Users/dmakarov/work/git/tvm/python/tvm/ir/__init__.py", line 34, in <module>
    from . import diagnostics
  File "/Users/dmakarov/work/git/tvm/python/tvm/ir/diagnostics/__init__.py", line 73, in <module>
    class Diagnostic(Object):
  File "/Users/dmakarov/work/git/tvm/python/tvm/_ffi/registry.py", line 69, in register
    check_call(_LIB.TVMObjectTypeKey2Index(c_str(object_name), ctypes.byref(tidx)))
  File "/Users/dmakarov/work/git/tvm/python/tvm/_ffi/base.py", line 344, in check_call
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (7) 8   ???                                 0x00007ffee5ac3f00 0x0 + 140732751691520
  [bt] (6) 7   _ctypes.cpython-38-darwin.so        0x0000000114bb6177 ffi_call_unix64 + 79
  [bt] (5) 6   libtvm.dylib                        0x000000011a5fde6b TVMObjectTypeKey2Index + 43
  [bt] (4) 5   libtvm.dylib                        0x000000011a5fdf45 tvm::runtime::ObjectInternal::ObjectTypeKey2Index(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 21
  [bt] (3) 4   libtvm.dylib                        0x000000011a5fd67d tvm::runtime::Object::TypeKey2Index(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 29
  [bt] (2) 3   libtvm.dylib                        0x000000011a5fd7a4 tvm::runtime::TypeContext::TypeKey2Index(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&) + 276
  [bt] (1) 2   libtvm.dylib                        0x000000011829ef75 dmlc::LogMessageFatal::~LogMessageFatal() + 21
  [bt] (0) 1   libtvm.dylib                        0x00000001182a2431 dmlc::LogMessageFatal::~LogMessageFatal() + 65
  File "../src/runtime/object.cc", line 155
TVMError: Check failed: it != type_key2index_.end(): Cannot find type Diagnostic. Did you forget to register the node by TVM_REGISTER_NODE_TYPE ?

All I want is to understand how I can use a custom codegen from the user’s code.

Your example looks good. The only problem is that you should not use ccompiler, because ccompiler is only for demonstration purposes and doesn’t support conv2d. Your example should work if you use dnnl or another custom codegen.

Cody, I realized using conv2d is wrong with ccompiler, so I changed the example slightly:

import numpy
import tvm
import tvm.relay

x = tvm.relay.var("x", shape=(3,), dtype="float32")
y = tvm.relay.var("y", shape=(3,), dtype="float32")
z = x + y
mod = tvm.IRModule.from_expr(tvm.relay.Function([x, y], z))
mod = tvm.relay.transform.AnnotateTarget(["ccompiler"])(mod)

with tvm.transform.PassContext(opt_level=3):
    graph, module, params = tvm.relay.build(mod, target='c')
module.save('y.cc', fmt='cc')

and now it breaks with this error

Traceback (most recent call last):
  File "y.py", line 12, in <module>
    graph, module, params = tvm.relay.build(mod, target='c')
  File "/Users/dmakarov/work/git/tvm/python/tvm/relay/build_module.py", line 275, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/Users/dmakarov/work/git/tvm/python/tvm/relay/build_module.py", line 138, in build
    self._build(mod, target, target_host)
  File "/Users/dmakarov/work/git/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) 9   libtvm.dylib                        0x0000000116590cdd tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*) + 2781
  [bt] (7) 8   libtvm.dylib                        0x000000011659bca0 tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::relay::Call, tvm::runtime::Array<tvm::te::Tensor, void>&, tvm::Target&>(tvm::relay::Call&&, tvm::runtime::Array<tvm::te::Tensor, void>&, tvm::Target&) const + 304
  [bt] (6) 7   libtvm.dylib                        0x00000001142c2561 std::__1::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const + 65
  [bt] (5) 6   libtvm.dylib                        0x00000001142c279a std::__1::__function::__value_func<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) const + 106
  [bt] (4) 5   libtvm.dylib                        0x0000000116b64b98 std::__1::__function::__func<TVMFuncCreateFromCFunc::$_2, std::__1::allocator<TVMFuncCreateFromCFunc::$_2>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 72
  [bt] (3) 4   libtvm.dylib                        0x0000000116b660c7 std::__1::__function::__alloc_func<TVMFuncCreateFromCFunc::$_2, std::__1::allocator<TVMFuncCreateFromCFunc::$_2>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 71
  [bt] (2) 3   libtvm.dylib                        0x0000000116b66117 void std::__1::__invoke_void_return_wrapper<void>::__call<TVMFuncCreateFromCFunc::$_2&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>(TVMFuncCreateFromCFunc::$_2&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 71
  [bt] (1) 2   libtvm.dylib                        0x0000000116b661b3 decltype(std::__1::forward<TVMFuncCreateFromCFunc::$_2&>(fp)(std::__1::forward<tvm::runtime::TVMArgs>(fp0), std::__1::forward<tvm::runtime::TVMRetValue*>(fp0))) std::__1::__invoke<TVMFuncCreateFromCFunc::$_2&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*>(TVMFuncCreateFromCFunc::$_2&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 115
  [bt] (0) 1   libtvm.dylib                        0x0000000116b6628e TVMFuncCreateFromCFunc::$_2::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const + 190
  File "/Users/dmakarov/work/git/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/Users/dmakarov/work/git/tvm/python/tvm/relay/backend/compile_engine.py", line 297, in lower_call
    best_impl, outputs = select_implementation(op, call.attrs, inputs, ret_type, target)
  File "/Users/dmakarov/work/git/tvm/python/tvm/relay/backend/compile_engine.py", line 186, in select_implementation
    all_impls = get_valid_implementations(op, attrs, inputs, out_type, target)
  File "/Users/dmakarov/work/git/tvm/python/tvm/relay/backend/compile_engine.py", line 125, in get_valid_implementations
    assert fstrategy is not None, "%s doesn't have FTVMStrategy registered" % op.name
AssertionError: annotation.compiler_end doesn't have FTVMStrategy registered

I’d like to see the custom codegen invoked on an entire graph

Sorry, I missed that. You also need to invoke the MergeCompilerRegions and PartitionGraph passes before build. You may refer to this test case (L462) for the whole example.
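In the meantime, here is a minimal sketch of your add example with those two passes added (I haven’t run this exact snippet, but it follows the same flow as the test case):

    import tvm
    import tvm.relay

    x = tvm.relay.var("x", shape=(3,), dtype="float32")
    y = tvm.relay.var("y", shape=(3,), dtype="float32")
    mod = tvm.IRModule.from_expr(tvm.relay.Function([x, y], x + y))

    # Annotate supported ops for the external codegen, merge adjacent
    # annotated regions, and partition them into separate functions
    # carrying the Compiler="ccompiler" attribute.
    mod = tvm.relay.transform.AnnotateTarget(["ccompiler"])(mod)
    mod = tvm.relay.transform.MergeCompilerRegions()(mod)
    mod = tvm.relay.transform.PartitionGraph()(mod)

    with tvm.transform.PassContext(opt_level=3):
        graph, module, params = tvm.relay.build(mod, target='c')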

Thank you. This seems to work. Is it possible to pass options/flags to a custom code generator? I see that currently it only receives an ObjectRef&, which is expected to be a pointer to a FunctionNode instance. Maybe a Function attribute could be used to pass a string of options/flags?

You can register an option/flag with the PassContext, so that you can set it when building a Relay program. Here is an example (L130).
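The gist: the codegen registers a config key on the C++ side with TVM_REGISTER_PASS_CONFIG_OPTION and reads it back through PassContext::Current()->GetConfig, and the user sets the key when building. A sketch of the user side, where relay.ext.mycodegen.options is a made-up key name for illustration:

    import tvm

    # "relay.ext.mycodegen.options" is hypothetical; the key must first be
    # registered on the C++ side, e.g.
    # TVM_REGISTER_PASS_CONFIG_OPTION("relay.ext.mycodegen.options", String);
    # `mod` is the partitioned module from the previous example.
    with tvm.transform.PassContext(
            opt_level=3,
            config={"relay.ext.mycodegen.options": "--my-flag"}):
        graph, module, params = tvm.relay.build(mod, target='c')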

Thank you very much. This is very helpful. Now I have a problem with export_library. It looks like the assumption is that the custom codegen generates a source file, which is compiled by an external compiler when export_library is invoked on the module that relay.build returns. What if my codegen returns a runtime::Module object that already contains data in binary format, which only needs to be saved to a file? Previously I used module.save(filename), but this method is no longer supported by GraphRuntimeFactoryModule. I need this data to be in a file, because I use tvm.rpc to upload it to a remote target, and it seems that I can only load_module on the remote by name. So previously my sequence of steps was something like this:

    module = tvm.relay.build(mod, target='llvm')
    binary = 'model.dnn'
    module.save(binary)
    remote = tvm.rpc.connect('localhost', 9090)
    remote.upload(binary)
    handle = remote.load_module(binary)
    module = tvm.contrib.graph_runtime.GraphModule(handle["default"](remote.ext_dev(0)))
    module.set_input('x', tvm.nd.array(data.astype(eltype)))
    module.run()

but now this doesn’t work, because I can’t save the compiled module. What’s the right way to do this?

It’s actually more straightforward to save the binary format you generated, because we don’t need to involve a compiler when exporting the library. You only need to implement SaveToBinary and LoadFromBinary in your runtime module, and they will be invoked when exporting/loading the runtime.

For an example, see the JSON runtime module.

Yes, I implemented SaveToBinary:

  void SaveToBinary(dmlc::Stream* stream) override final {
    std::cout << "DnnContainerModuleNode savetobinary" << std::endl;
    // code_ holds the raw bytes of the binary file to be saved.
    stream->Write(code_);
  }

where code_ is a string of bytes that is essentially the contents of the binary file I need to save. The method is invoked, but it seems TVM wraps this data in a file format native to the system:

file model.dnn
model.dnn: Mach-O 64-bit dynamically linked shared library x86_64

Isn’t there a way to simply save a binary file whose content I fully control? The Python code that generated this file:

    module = tvm.relay.build(mod, target='llvm')
    binary = 'model.dnn'
    module.export_library(binary)

I actually don’t need the target to be ‘llvm’, but I don’t know which target I should use. If I understand correctly, adding a custom codegen doesn’t define a new target.

Or, where is the Stream passed to SaveToBinary actually written to?

You should export the library with .so as the file extension. Although you fully control how your runtime will be serialized, like you said, TVM wraps this runtime in other runtime modules, which are 1) a metadata module to maintain metadata, and 2) a host LLVM runtime module to execute the unoffloaded part of the model. As a result, you actually need the target to be LLVM. I understand this is a bit confusing, so we are trying to figure out a better API.

I see… The way we want to use TVM is to generate a binary in our own custom format, upload it via RPC to the target, and run it. We control the remote target via TVM RPC, and we have our own RPC server running on the remote target. We actually run the entire model on the remote target, so there’s no unoffloaded part of the model in our case. If I understand correctly, and if I use the TVM API without modifications, it sounds like we now need to be able to extract the binary module that we generated from the .so file on the RPC server – is this correct? Like I mentioned, previously, when there was no GraphRuntimeFactoryModule wrapper, I just invoked the .save(filename) method on the module returned by the code generator, and that dumped the binary file in our format to disk, from where it was uploaded to the remote runtime system via TVM RPC.

Not sure if I fully caught your point, but let me try.

From the previous JSON runtime example I posted, you can see that the serialization mechanism can be fully customized, as long as your LoadFromBinary can construct an identical runtime module from the binary generated by SaveToBinary. In other words, although the file extension is .so, you can actually use any format of byte array.

We recommend the above approach, because users will only get a single .so file that includes everything – the TVM host module and your custom runtime module. Please note that we recommend this even when you are offloading an entire model (in that case the TVM host module has only one giant node), because it is general enough to cover both full and partial offloading.

Besides, you can still upload the .so file via RPC to a remote target. As long as your remote target has the TVM runtime deployed, it should also include the implementation of your custom runtime, meaning that it can deserialize the .so file and build the runtime module.
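For example, the client-side flow would look something like this (a sketch; it assumes the remote runs the TVM runtime with your custom runtime module compiled in, and the file name and address are placeholders):

    import tvm
    import tvm.relay
    import tvm.rpc
    import tvm.contrib.graph_runtime

    # `mod` is the partitioned Relay module; everything, including the
    # blob written by your SaveToBinary, ends up inside the single .so.
    module = tvm.relay.build(mod, target='llvm')
    module.export_library('model.so')

    # The remote TVM runtime deserializes the library; your
    # LoadFromBinary is invoked on the remote side.
    remote = tvm.rpc.connect('localhost', 9090)
    remote.upload('model.so')
    lib = remote.load_module('model.so')
    gmod = tvm.contrib.graph_runtime.GraphModule(lib["default"](remote.cpu(0)))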

Cody, it makes sense that there’s save/load module symmetry between the TVM client and the TVM RPC remote server if the TVM RPC remote server indeed runs the TVM runtime. In our case the symmetry is broken, because our RPC server does not run the TVM runtime, but implements enough of the TVM RPC protocol to receive a binary module from the TVM client, load it, set inputs, run the module, and then send back the output. Now it seems we need to implement parsing of the .so file in our TVM RPC server implementation, so that it can extract the graph JSON and the compiled model’s binary from the uploaded .so file. We use TVM to compile models to be executable on our hardware, and to test the hardware and the generated code. In actual deployment the compiled models will run without the TVM runtime.