[BYOC] details about BYOC (newbie)

Hi,

I am following the BYOC example for the C codegen in the documentation, I have a few questions:

  1. During testing, do I need to recompile the whole of TVM every time I make a modification to the codegen?
  2. Once I have compiled TVM with my C codegen, how can I make sure it is available? What target name should I use when building a model?
  3. Do I need to write annotations for my codegen using the python API in order for it to be used when building a model?
  4. Is it possible to get the generated C without building the model? If so, how? (Edit: answer is calling Module.get_source() from the python API)
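To make the answer to question 4 concrete, here is a sketch of how one might dump the generated source after a build (assuming the standard relay.build flow; the tiny identity module and the use of imported_modules for external code are my assumptions, and this needs a built TVM to run):

```python
import tvm
from tvm import relay

# A trivial Relay module just to have something to build.
x = relay.var("x", shape=(2, 2))
mod = tvm.IRModule.from_expr(relay.Function([x], x + x))

lib = relay.build(mod, target="llvm")

# Source of the host module (LLVM IR here):
print(lib.get_lib().get_source())

# BYOC-generated code, if any, lives in imported modules:
for ext_mod in lib.get_lib().imported_modules:
    print(ext_mod.get_source())
```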

Cheers

Edit: I have another question to add:

  1. For learning/testing purposes, can the generated functions ignore the output? That is, is it possible not to bother copying any result into the output tensor (and, for example, just print a message instead)?

Edit 2: This is where I am at:

  • I compiled TVM with the C codegen example (adding the CMake file in contrib and enabling the corresponding flag in the main CMake configuration to activate it during compilation)

  • I registered the operators in python/tvm/relay/op/contrib/ccompiler.py. I used the name ccompiler as it is the one used in codegen.cc in the call TVM_REGISTER_GLOBAL("relay.ext.ccompiler").set_body_typed(CCompiler).

  • For the registration, I basically copied the dnnl.py one, renaming occurrences of dnnl to ccompiler and commenting out:

    • some of the registered operators which are not present in the codegen
    • make_conv_pattern
    • make_dense_pattern
    • make_dnnl_pattern
    • the pattern table registration
  • I tried compiling an IR module using the ccompiler target; it fails with the following error:

ValueError: Traceback (most recent call last):
  2: TVMFuncCall
  1: tvm::TargetInternal::ConstructorDispatcher(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: tvm::Target::Target(tvm::runtime::String const&) [clone .cold]
  File "/home/nicolas/work/tvm_codegen/tvm/src/target/target.cc", line 473
ValueError: Target kind "ccompiler" is not defined. Target creation from string failed: ccompiler

  1. Yes, because the codegen is now part of the TVM shared library. However, cmake with ccache should save you time on unchanged files.
  2. As the document indicates, you should apply a series of passes from AnnotateTarget to PartitionGraph. After that, you could print the IR to see if there’s any Relay function with the attribute kCompiler="your-codegen-name". If so, your codegen will be used when calling relay.build.
  3. Yes.
  4. As you already answered.
  5. I don’t understand this question, but a Relay program must have an output, and you cannot print a message inside a Relay function. The error you got is a misuse: you should still use llvm as the target, which is the target for the “non-offloaded” parts of the model. Again, as long as your IR includes a function with kCompiler="your-codegen-name", it will be offloaded to your codegen whatever target you specify.

Thank you for answering, this helps quite a bit.

I don’t understand this question, but a Relay program must have an output, and you cannot print a message inside a Relay function.

Let’s say that for a relay.nn.conv2d function, we produce a C function that prints some information. The motivation for this is to gain confidence with the BYOC framework before getting into more serious work.

So we can’t generate functions that print a message (or have side effects)? Can we call an external library that has side effects?

I guess it is always possible to fill the output tensor with dummy values, but if it isn’t possible to print a message then my question loses its purpose.

In the BYOC case you control everything in your generated code. The runtime is only in charge of invoking the partitioned function on the CPU by its name (i.e., symbol), so you can do anything you want. In short, it’s doable to generate/compile a C function with side effects and run it on a supported device.

Right, that was my initial thought.

Although my initial question was: can the generated code not bother filling the output tensor with values? But I realise the question is moot, as out is already allocated; whether its content is relevant or not shouldn’t cause any major bug when playing around.

To illustrate what I meant, let’s say in the documentation example I replace the macros to:

#define CSOURCE_BINARY_OP_1D(p_ID_, p_OP_, p_DIM1_)         \
    extern "C" void p_ID_(float* a, float* b, float* out) { \
        printf("I am calling a 1D op\n");                   \
    }

As the document indicates, you should apply a series of passes from AnnotateTarget to PartitionGraph.

I think everything starts to connect now, to recap (correct me if I am wrong):

I need to write python/tvm/relay/op/contrib/{CODEGEN_NAME}.py to tell tvm how to partition my graph for my codegen. The bare minimum this file should contain is:

  • registering external op used by my codegen
  • defining a function with the name partition_for_{CODEGEN_NAME}(mod, params=None) that applies transformations to mod, including at least:
    1. AnnotateTarget("{CODEGEN_NAME}")
    2. PartitionGraph()

If I have correctly built TVM with my codegen and have provided the mentioned Python file, then when building for the llvm target TVM will automatically use the partition_for_{CODEGEN_NAME} function.

If I am all correct, does the python file need to be there during the tvm building? (I assume no)
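To make the recap concrete, here is a minimal sketch of what such a registration file might contain, modelled on the docs’ DNNL example (the op names, the helper, and the MergeCompilerRegions pass are my assumptions; this only runs against a TVM build that includes the ccompiler codegen):

```python
# Hypothetical minimal python/tvm/relay/op/contrib/ccompiler.py
import tvm
import tvm.ir
from tvm.relay import transform

def _register_external_op_helper(op_name, supported=True):
    # Mark op_name as offloadable to the "ccompiler" codegen.
    @tvm.ir.register_op_attr(op_name, "target.ccompiler")
    def _func_wrapper(expr):
        return supported
    return _func_wrapper

_register_external_op_helper("add")
_register_external_op_helper("subtract")
_register_external_op_helper("multiply")

def partition_for_ccompiler(mod, params=None):
    """Annotate supported ops and partition the graph for ccompiler."""
    seq = tvm.transform.Sequential([
        transform.AnnotateTarget("ccompiler"),
        transform.MergeCompilerRegions(),
        transform.PartitionGraph(),
    ])
    return seq(mod)
```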

It seems like I am missing some bits in what I said above: when running the module after a build I don’t see anything getting printed, so it looks like it didn’t automatically call partition_for_myccompiler from tvm/python/tvm/relay/op/contrib/myccompiler.py:

DEBUG:autotvm:Finish loading 35 records
WARNING:autotvm:One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
DEBUG:autotvm:Cannot find tuning records for:
    target=llvm -keys=cpu -link-params=0
    key=('conv2d_NCHWc.x86', ('TENSOR', (1, 1, 32, 32), 'float32'), ('TENSOR', (16, 1, 5, 5), 'float32'), (1, 1), (2, 2, 2, 2), (1, 1), 'NCHW', 'NCHW', 'float32')
TVM will apply a default schedule which may negatively impact performance.
INFO:te_compiler:Using conv2d_nchw.x86 for nn.conv2d based on highest priority (10)

And if I call it manually on the relay ir and call build on the result, this is what I get:

DEBUG:autotvm:Finish loading 35 records
Traceback (most recent call last):
  File "pt_relay_conv2d.py", line 69, in <module>
    mod = graph_executor.GraphModule(lib['default'](dev))
  File "/home/.../tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  5: TVMFuncCall
  4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::GraphExecutorFactory::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  3: tvm::runtime::GraphExecutorFactory::ExecutorCreate(std::vector<DLDevice, std::allocator<DLDevice> > const&)
  2: tvm::runtime::GraphExecutor::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  1: tvm::runtime::GraphExecutor::SetupOpExecs()
  0: tvm::runtime::GraphExecutor::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor, std::allocator<DLTensor> > const&)
  File "/home/.../tvm/src/runtime/graph_executor/graph_executor.cc", line 529
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (pf != nullptr) is false: no such function in module: tvmgen_default_myccompiler_main_0

After starting fresh it seems that the last error is coming from some previously unsynced modifications so I think you can forget about it.

Now I am getting this error: Failed to find the codegen tool for relay.ext.ccompiler (full traceback below). The codegen is registered with TVM_REGISTER_GLOBAL("relay.ext.ccompiler").set_body_typed(CCompiler) and the ops are registered in the file at python/tvm/relay/op/contrib/ccompiler.py.

I also made sure to delete CODEGENC.cmake so that the original C codegen example doesn’t get compiled into the library, as mine is a straight copy of it that uses the same names.
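One thing worth checking: deleting CODEGENC.cmake removes the original codegen’s sources from the build, so a renamed copy needs its own fragment that compiles your sources into libtvm, otherwise the relay.ext.* global is never registered. A hypothetical sketch mirroring the style of TVM’s cmake/modules/contrib files (file, flag, and directory names are assumptions):

```cmake
# cmake/modules/contrib/MyCCompiler.cmake (hypothetical)
if(USE_MYCCOMPILER)
  file(GLOB MYCCOMPILER_RELAY_CONTRIB_SRC src/relay/backend/contrib/myccompiler/*.cc)
  list(APPEND COMPILER_SRCS ${MYCCOMPILER_RELAY_CONTRIB_SRC})
  message(STATUS "Build with MyCCompiler codegen")
endif()
```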

The python code that resulted in this is the following:

import tvm
from tvm import relay
from tvm.contrib import graph_executor

mod = create_relay_mod()
from tvm.relay.op.contrib.ccompiler import partition_for_ccompiler
pmod = partition_for_ccompiler(mod)  # do "ccompiler" annotation, plus graph partitioning
lib = relay.build(pmod, target)

# Generate graph executor
dev = tvm.device(target, 0)
m = graph_executor.GraphModule(lib["default"](dev))

dtype = 'float32'
set_module_inputs(m)
m.run()
output = m.get_output(0)
Traceback (most recent call last):
  File "/Users/.../tvm/tests/slai/relay_multiply.py", line 29, in <module>
    lib = R.build(pmod, target)
  File "/Users/.../tvm/python/tvm/relay/build_module.py", line 471, in build
    graph_json, runtime_mod, params = bld_mod.build(
  File "/Users/.../tvm/python/tvm/relay/build_module.py", line 199, in build
    self._build(mod, target, target_host, executor, runtime, workspace_memory_pools, mod_name)
  File "/Users/.../tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) 9   libtvm.dylib                        0x00000001182ed574 tvm::transform::Pass::operator()(tvm::IRModule) const + 184
  [bt] (7) 8   libtvm.dylib                        0x00000001182ed7c0 tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 244
  [bt] (6) 7   libtvm.dylib                        0x00000001182f02ec tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 872
  [bt] (5) 6   libtvm.dylib                        0x00000001182ed7c0 tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 244
  [bt] (4) 5   libtvm.dylib                        0x00000001182ee504 tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const + 1956
  [bt] (3) 4   libtvm.dylib                        0x00000001195dd350 tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<void tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::IRModule, tvm::transform::PassContext)>::AssignTypedLambda<tvm::relay::tec::LowerTEPass(tvm::runtime::String const&, std::__1::function<void (tvm::BaseFunc)>, tvm::VirtualDevice)::$_8>(tvm::relay::tec::LowerTEPass(tvm::runtime::String const&, std::__1::function<void (tvm::BaseFunc)>, tvm::VirtualDevice)::$_8)::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) + 888
  [bt] (2) 3   libtvm.dylib                        0x00000001195bd0cc tvm::relay::tec::LowerTE(tvm::IRModule const&, tvm::runtime::String const&, std::__1::function<void (tvm::BaseFunc)>, tvm::VirtualDevice) + 2480
  [bt] (1) 2   libtvm.dylib                        0x00000001195c4874 tvm::relay::tec::TECompilerImpl::LowerExternalFunctions() + 2388
  [bt] (0) 1   libtvm.dylib                        0x0000000117f9b6a0 tvm::runtime::detail::LogFatal::Entry::Finalize() + 84
  File "/Users/.../tvm/src/relay/backend/te_compiler.cc", line 206
TVMError:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (pf) is false: Failed to find the codegen tool for relay.ext.ccompiler