Interface between Relay Vars and external tensors/te.placeholders?

Hi,

I am wondering if anyone can explain how Relay Vars are linked to te.placeholders, both in the internal lowering process and in the external compiler flow.

External compilers

In the case of external compilers (especially the DNNL), I think that the link is done by either

Q1: What exactly does this GetRef function do?

  • I think this is the GetRef implementation, but I cannot figure it out.
  • If that is not how Relay variables are linked to the arrays of an external compiler, how is it done in the DNNL example?
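To make Q1 concrete, this is my current mental model as a simplified, self-contained C++ sketch. Object, ObjectRef, VarNode, Var, GetRef and CodegenSketch below are hypothetical stand-ins written for illustration, not TVM's real classes. The idea: a visitor receives a raw const VarNode*, and GetRef wraps that same node back into a reference-counted Var handle (without copying the node), so the codegen can record it as an external-function argument in visit order and refer to its buffer by name_hint. Please correct me if this is not what actually happens.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for TVM's Object: the node carries its own refcount.
struct Object {
  int ref_count = 0;
  virtual ~Object() = default;
};

// Hypothetical handle type: shares ownership of a node by bumping its intrusive count.
class ObjectRef {
 public:
  ObjectRef() = default;
  explicit ObjectRef(Object* p) : ptr_(p) { if (ptr_) ++ptr_->ref_count; }
  ObjectRef(const ObjectRef& other) : ptr_(other.ptr_) { if (ptr_) ++ptr_->ref_count; }
  ObjectRef& operator=(ObjectRef other) { std::swap(ptr_, other.ptr_); return *this; }
  ~ObjectRef() { if (ptr_ && --ptr_->ref_count == 0) delete ptr_; }
  const Object* get() const { return ptr_; }

 protected:
  Object* ptr_ = nullptr;
};

struct VarNode : Object {
  std::string name_hint;
  explicit VarNode(std::string n) : name_hint(std::move(n)) {}
};

class Var : public ObjectRef {
 public:
  explicit Var(VarNode* node) : ObjectRef(node) {}
  const VarNode* operator->() const { return static_cast<const VarNode*>(ptr_); }
};

// Sketch of GetRef: turn a raw node pointer back into a counted reference to the SAME node.
template <typename RefType, typename NodeType>
RefType GetRef(const NodeType* node) {
  return RefType(const_cast<NodeType*>(node));
}

// Hypothetical codegen visitor: each visited VarNode becomes an external-function
// argument, recorded in visit order; generated code names the buffer by name_hint.
struct CodegenSketch {
  std::vector<Var> ext_func_args;
  std::string VisitVar(const VarNode* node) {
    ext_func_args.push_back(GetRef<Var>(node));
    return node->name_hint;
  }
};
```

In this sketch the recorded argument and the original x point to the same node, which is what I assume lets the codegen map a Relay Var to a buffer name.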

The other part that I can't quite understand is the TVM_DLL_EXPORT_TYPED_FUNC macro and the wrapper function. From the blog post:

// The wrapper function with all arguments in DLTensor type.
extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
        DLTensor* arg1,
        DLTensor* arg2,
        DLTensor* out0) {

  // Cast all DLTensor to primitive type buffers and invoke the above
  // execution function.
  dnnl_0_(static_cast<float*>(arg0->data),
  static_cast<float*>(arg1->data),
  static_cast<float*>(arg2->data),
  static_cast<float*>(out0->data));
  return 0;
}

// The TVM macro to generate TVM runtime compatible function "dnnl_0"
// from our generated "dnnl_0_wrapper_".
TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_);

dnnl_0_wrapper_ is expected to be called with 4 arguments.

Q2: How exactly does the TVM_DLL_EXPORT_TYPED_FUNC macro lead to dnnl_0_wrapper_ being called with those 4 arguments?
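Here is how I currently imagine the mechanism, as a toy C++ sketch with hypothetical names (MiniTensor, MiniValue, kMiniTensorHandle, PackedEntry, toy_wrapper_, toy_0) rather than TVM's real macro expansion. The exported symbol has one uniform signature (an array of type-erased values, their type codes and a count), and a template adapter checks the count and type codes, then forwards args[i] to parameter i of the typed wrapper. If that is right, the runtime only needs to find the symbol by name and pass the tensors in order.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-ins for DLTensor, TVMValue and a "tensor handle" type code.
struct MiniTensor { float* data; };
union MiniValue { void* v_handle; long long v_int64; double v_float64; };
constexpr int kMiniTensorHandle = 7;

// Forward args[I] to parameter I of the typed function, for every I at once.
template <typename... Ts, std::size_t... I>
int CallUnpacked(int (*f)(Ts...), const MiniValue* args, std::index_sequence<I...>) {
  return f(static_cast<Ts>(args[I].v_handle)...);
}

// Uniform, type-erased entry point around a typed function (the role I assume
// the export macro plays): validate arity and type codes, then unpack by position.
template <typename... Ts>
int PackedEntry(int (*f)(Ts...), const MiniValue* args, const int* type_codes, int num_args) {
  if (num_args != static_cast<int>(sizeof...(Ts))) return -1;  // arity mismatch
  for (int i = 0; i < num_args; ++i) {
    if (type_codes[i] != kMiniTensorHandle) return -1;           // not a tensor handle
  }
  return CallUnpacked(f, args, std::index_sequence_for<Ts...>{});
}

// Toy analogue of dnnl_0_wrapper_: out = a * b + c over 4 floats.
int toy_wrapper_(MiniTensor* a, MiniTensor* b, MiniTensor* c, MiniTensor* out) {
  for (int i = 0; i < 4; ++i) out->data[i] = a->data[i] * b->data[i] + c->data[i];
  return 0;
}

// Hypothetical exported symbol with the uniform signature (what "dnnl_0" would be).
extern "C" int toy_0(const MiniValue* args, const int* type_codes, int num_args) {
  return PackedEntry(toy_wrapper_, args, type_codes, num_args);
}
```

The caller of toy_0 never sees the 4-argument signature; the adapter reconstructs it from positions, which is my guess at why the wrapper ends up receiving the tensors in the same order the runtime packs them.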

Internal lowering

I know less about how the internal lowering process works.

import tvm
from tvm import relay

def min_relay_prog():

    x = relay.var('x', shape=(1, 3, 224, 224))
    w = relay.var('w', shape=(16, 3, 3, 3))
    b = relay.var('b', shape=(16,))

    conv2d = relay.op.nn.conv2d(x, w, data_layout="NCHW")
    bias = relay.op.nn.bias_add(conv2d, b)
    act = relay.op.nn.relu(bias)
    rfunc = relay.Function([x, b, w], act)  # NOTE1
    mod = tvm.IRModule()
    mod["main"] = rfunc
    
    return mod


mod = min_relay_prog()
graph, lfunc, params = relay.build(mod, 'c')

If I then do print(lfunc.get_source()), I get the following output:

#include "tvm/runtime/c_runtime_api.h"
#include "tvm/runtime/c_backend_api.h"
void* __tvm_module_ctx = NULL;
#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t fused_nn_conv2d_nn_bias_add_nn_relu_1(void* args, void* arg_type_ids, int32_t num_args, void* out_ret_value, void* out_ret_tcode, void* resource_handle) {
  void* arg0 = (((TVMValue*)args)[0].v_handle);
  int32_t arg0_code = ((int32_t*)arg_type_ids)[(0)];
  void* arg1 = (((TVMValue*)args)[1].v_handle);
  int32_t arg1_code = ((int32_t*)arg_type_ids)[(1)];
  void* arg2 = (((TVMValue*)args)[2].v_handle);
  int32_t arg2_code = ((int32_t*)arg_type_ids)[(2)];
  void* arg3 = (((TVMValue*)args)[3].v_handle);
  int32_t arg3_code = ((int32_t*)arg_type_ids)[(3)];
  void* placeholder = (((DLTensor*)arg0)[0].data);
  void* arg0_shape = (((DLTensor*)arg0)[0].shape);
  void* arg0_strides = (((DLTensor*)arg0)[0].strides);
  int32_t dev_id = (((DLTensor*)arg0)[0].ctx.device_id);
  void* placeholder1 = (((DLTensor*)arg1)[0].data);
  void* arg1_shape = (((DLTensor*)arg1)[0].shape);
  void* arg1_strides = (((DLTensor*)arg1)[0].strides);
  void* placeholder2 = (((DLTensor*)arg2)[0].data);
  void* arg2_shape = (((DLTensor*)arg2)[0].shape);
  void* arg2_strides = (((DLTensor*)arg2)[0].strides);
  void* T_relu = (((DLTensor*)arg3)[0].data);
  void* arg3_shape = (((DLTensor*)arg3)[0].shape);
  void* arg3_strides = (((DLTensor*)arg3)[0].strides);
//rest of output is omitted due to space

The args variable holds all arguments of the fused function. From the rest of the code (omitted), it seems that placeholder is the input feature map, placeholder1 is the kernel and placeholder2 is the bias. This order (x, w, b) differs from the order in which I declared the parameters of the Relay function ([x, b, w], see NOTE1 in the Python code).

I guess this makes sense, since this Relay function just encapsulates the graph of Relay operators. But it means that somewhere in the lowering/codegen process, args is filled either from the root towards the outputs (like a FIFO) or from the output back to the root (like a stack). I also guess that at each operator (such as conv2d) the arguments follow the order in which the TOPI operator is defined (i.e., the input feature map is the first argument and the kernel is the second).
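To test my FIFO/stack guess, here is a toy C++ sketch (the Expr type and the MakeVar, MakeCall and CollectParams helpers are hypothetical, not TVM code) of one traversal rule that reproduces the observed order: walk the fused expression depth first, visiting each call's arguments left to right, and give each free variable the next parameter slot the first time it is reached. For relu(bias_add(conv2d(x, w), b)) this yields x, w, b, matching placeholder, placeholder1, placeholder2, regardless of the [x, b, w] order passed to relay.Function. I have not confirmed that this is the exact rule TVM's fusion and lowering passes use.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression: a variable (op empty) or a call of `op` on args.
struct Expr {
  std::string op;    // operator name for calls, empty for variables
  std::string name;  // variable name when op is empty
  std::vector<std::shared_ptr<Expr>> args;
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr MakeVar(const std::string& n) { return std::make_shared<Expr>(Expr{"", n, {}}); }
ExprPtr MakeCall(const std::string& op, std::vector<ExprPtr> args) {
  return std::make_shared<Expr>(Expr{op, "", std::move(args)});
}

// Depth first, arguments left to right; a variable gets a parameter slot the
// first time it is seen, and a variable used twice is not duplicated.
void CollectParams(const ExprPtr& e, std::vector<std::string>& params) {
  if (e->op.empty()) {
    for (const auto& p : params) {
      if (p == e->name) return;
    }
    params.push_back(e->name);
    return;
  }
  for (const auto& a : e->args) CollectParams(a, params);
}
```

Under this rule the order depends only on where each variable first appears in the dataflow, not on the order of the function's parameter list.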

Q3: Is this, in general, how args is constructed? If not, please explain.

  • Where in the code can I actually see how this is done? I have been looking here, but I feel I'm missing something.
    • EDIT: I think I am a little closer to finding where the PrimFunc arguments are packed into args. It seems that _build_for_device creates a new TIR representation (the PackedFunc variant), which I can see by printing the module afterwards.
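For the EDIT above, my reading of the generated C is the following calling convention, written as a self-contained C++ sketch. MiniDLTensor, MiniValue, kMiniTensorCode, fused_add_relu_sketch and CallPacked are hypothetical stand-ins (the real code uses DLTensor and TVMValue, and the real type-code values may differ). The caller puts one tensor handle per parameter into args, inputs first and the output last, and the lowered function recovers each buffer purely by position, just as placeholder, placeholder1, placeholder2 and T_relu are read from arg0..arg3 in the output above.

```cpp
#include <cassert>
#include <cstdint>

using std::int32_t;
using std::int64_t;

// Hypothetical stand-ins for DLTensor and TVMValue.
struct MiniDLTensor { void* data; int64_t* shape; int ndim; };
union MiniValue { void* v_handle; int64_t v_int64; double v_float64; };
constexpr int32_t kMiniTensorCode = 7;  // hypothetical type code for a tensor handle

// argI = args[I].v_handle, reinterpreted as a tensor.
static MiniDLTensor* TensorOf(void* args, int i) {
  return static_cast<MiniDLTensor*>(static_cast<MiniValue*>(args)[i].v_handle);
}

// Same signature shape as the generated fused_nn_conv2d_nn_bias_add_nn_relu_1,
// but computing a toy relu(x + bias) over a 1-D buffer.
int32_t fused_add_relu_sketch(void* args, void* arg_type_ids, int32_t num_args,
                              void* out_ret_value, void* out_ret_tcode, void* resource_handle) {
  (void)out_ret_value; (void)out_ret_tcode; (void)resource_handle;  // unused in this toy
  if (num_args != 3) return -1;
  const int32_t* codes = static_cast<const int32_t*>(arg_type_ids);
  for (int32_t i = 0; i < num_args; ++i) {
    if (codes[i] != kMiniTensorCode) return -1;
  }
  MiniDLTensor* arg0 = TensorOf(args, 0);
  float* placeholder = static_cast<float*>(arg0->data);                // first input
  float* placeholder1 = static_cast<float*>(TensorOf(args, 1)->data);  // second input
  float* T_relu = static_cast<float*>(TensorOf(args, 2)->data);        // output is last
  for (int64_t i = 0; i < arg0->shape[0]; ++i) {
    float v = placeholder[i] + placeholder1[i];
    T_relu[i] = v > 0.0f ? v : 0.0f;
  }
  return 0;
}

// Caller side: pack the tensors positionally (inputs, then output) and invoke.
int32_t CallPacked(MiniDLTensor* x, MiniDLTensor* bias, MiniDLTensor* out) {
  MiniValue values[3];
  values[0].v_handle = x;
  values[1].v_handle = bias;
  values[2].v_handle = out;
  int32_t type_ids[3] = {kMiniTensorCode, kMiniTensorCode, kMiniTensorCode};
  return fused_add_relu_sketch(values, type_ids, 3, nullptr, nullptr, nullptr);
}
```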

Q4: How do the GraphRuntime and the lowered funcs actually communicate?

  • Where in the code can I see this?
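My current guess for Q4, as a toy C++ sketch rather than TVM's actual GraphRuntime (ToyGraphRuntime, OpNode and Kernel are hypothetical names, and I have not verified this against the real source). The graph JSON lists, for each operator node, the function name plus the data-entry ids of its inputs and outputs. At load time the runtime looks each name up in the compiled module and pre-builds the argument list (inputs, then outputs) over a shared pool of buffers, so Run() only has to call the resulting closures in topological order.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A packed-style kernel: receives buffers positionally (inputs first, outputs last).
using Kernel = std::function<void(std::vector<std::vector<float>*>&)>;

struct OpNode {
  std::string func_name;     // symbol name looked up in the compiled module
  std::vector<int> inputs;   // data-entry ids read by this op
  std::vector<int> outputs;  // data-entry ids written by this op
};

class ToyGraphRuntime {
 public:
  ToyGraphRuntime(std::map<std::string, Kernel> module, std::vector<OpNode> nodes, int num_entries)
      : pool_(num_entries) {
    for (const OpNode& n : nodes) {
      Kernel f = module.at(n.func_name);  // lookup by name; throws if the symbol is missing
      std::vector<std::vector<float>*> args;
      for (int id : n.inputs) args.push_back(&pool_[id]);
      for (int id : n.outputs) args.push_back(&pool_[id]);
      op_execs_.push_back([f, args]() mutable { f(args); });
    }
  }
  void SetInput(int entry, std::vector<float> v) { pool_[entry] = std::move(v); }
  void Run() {
    for (auto& exec : op_execs_) exec();  // nodes are assumed to be in topological order
  }
  const std::vector<float>& GetOutput(int entry) const { return pool_[entry]; }

 private:
  std::vector<std::vector<float>> pool_;  // one buffer per data entry, never resized
  std::vector<std::function<void()>> op_execs_;
};
```

Pre-binding the arguments once means the only link between the runtime and a lowered function is the symbol name and the positional order of the entries, which would be consistent with the args layout in the generated C above.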

Thanks

a friendly cc @comaniac @zhiics