Hi,
I am wondering if anyone can explain how Relay Vars are linked to te.placeholders
in the internal lowering process and in the external compiler process.
External compilers
In the case of external compilers (especially DNNL), I think the link is made by either
- visiting the Var nodes of the Relay graph, or
- visiting the arguments of the CallNodes. Here, however, I only see this being used to extract known properties of the tensors, so the actual tensor pointer does not seem to be extracted this way.
Q1: What exactly is the GetRef function doing?
- I think this is the GetRef implementation, but I cannot figure it out.
- If that is not how Relay variables are linked to the arrays of an external compiler, how is it done in the DNNL example?
The other part that I can't quite understand is the TVM_DLL_EXPORT_TYPED_FUNC macro and the wrapper function. From the blog post:
// The wrapper function with all arguments in DLTensor type.
extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
                               DLTensor* arg1,
                               DLTensor* arg2,
                               DLTensor* out0) {
  // Cast all DLTensor to primitive type buffers and invoke the above
  // execution function.
  dnnl_0_(static_cast<float*>(arg0->data),
          static_cast<float*>(arg1->data),
          static_cast<float*>(arg2->data),
          static_cast<float*>(out0->data));
  return 0;
}
// The TVM macro to generate TVM runtime compatible function "dnnl_0"
// from our generated "dnnl_0_wrapper_".
TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_);
The dnnl_0_wrapper_ function is expected to be called with 4 arguments.
Q2: How exactly does using the TVM_DLL_EXPORT_TYPED_FUNC macro lead to dnnl_0_wrapper_ being called with the 4 arguments?
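To make Q2 concrete, here is a minimal Python sketch of the pattern I assume the macro follows: a function with a fixed parameter list is wrapped in a function with a generic "array of values, array of type codes, argument count" signature, and the wrapper unpacks the array before forwarding. This is not TVM's implementation. `export_typed_func`, `FakeDLTensor`, and `TENSOR_HANDLE` are hypothetical names invented for the illustration, and the body of `dnnl_0_wrapper_` is a placeholder.

```python
import inspect

# Hypothetical type-code tag, used only in this sketch to mark "tensor handle".
TENSOR_HANDLE = 7


class FakeDLTensor:
    """Stand-in for DLTensor: only the `data` field is modelled."""

    def __init__(self, data):
        self.data = data


def export_typed_func(typed_func):
    """Wrap a function with a fixed parameter list into a packed-style function."""
    arity = len(inspect.signature(typed_func).parameters)

    def packed(args, type_codes, num_args):
        # The packed caller only hands over an array of values plus a count,
        # so the arity and type checks have to happen here, at call time.
        if num_args != arity:
            raise TypeError(f"expected {arity} arguments, got {num_args}")
        if any(code != TENSOR_HANDLE for code in type_codes[:num_args]):
            raise TypeError("every argument must be a tensor handle")
        # Unpack the array into positional arguments of the typed function.
        return typed_func(*args[:num_args])

    return packed


def dnnl_0_wrapper_(arg0, arg1, arg2, out0):
    # Placeholder body: the real wrapper forwards the data pointers to dnnl_0_.
    out0.data[:] = [a + b + c for a, b, c in zip(arg0.data, arg1.data, arg2.data)]
    return 0


# Plays the role of TVM_DLL_EXPORT_TYPED_FUNC(dnnl_0, dnnl_0_wrapper_).
dnnl_0 = export_typed_func(dnnl_0_wrapper_)

tensors = [FakeDLTensor([1.0, 2.0]), FakeDLTensor([3.0, 4.0]),
           FakeDLTensor([5.0, 6.0]), FakeDLTensor([0.0, 0.0])]
status = dnnl_0(tensors, [TENSOR_HANDLE] * 4, 4)
```

If this model is right, the "4 arguments" are never passed positionally by the runtime. The caller always passes one array, and the exported symbol is what splits it into `arg0`, `arg1`, `arg2`, and `out0`.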
Internal lowering
For the internal lowering process, I know less about how it is done. Take this minimal example:
import tvm
from tvm import relay

def min_relay_prog():
    x = relay.var('x', shape=(1, 3, 224, 224))
    w = relay.var('w', shape=(16, 3, 3, 3))
    b = relay.var('b', shape=(16,))
    conv2d = relay.op.nn.conv2d(x, w, data_layout="NCHW")
    bias = relay.op.nn.bias_add(conv2d, b)
    act = relay.op.nn.relu(bias)
    rfunc = relay.Function([x, b, w], act)  # NOTE1
    mod = tvm.IRModule()
    mod["main"] = rfunc
    return mod

mod = min_relay_prog()
graph, lfunc, params = relay.build(mod, 'c')
If I then do print(lfunc.get_source()), I get the following output:
#include "tvm/runtime/c_runtime_api.h"
#include "tvm/runtime/c_backend_api.h"
void* __tvm_module_ctx = NULL;
#ifdef __cplusplus
extern "C"
#endif
TVM_DLL int32_t fused_nn_conv2d_nn_bias_add_nn_relu_1(void* args, void* arg_type_ids, int32_t num_args, void* out_ret_value, void* out_ret_tcode, void* resource_handle) {
  void* arg0 = (((TVMValue*)args)[0].v_handle);
  int32_t arg0_code = ((int32_t*)arg_type_ids)[(0)];
  void* arg1 = (((TVMValue*)args)[1].v_handle);
  int32_t arg1_code = ((int32_t*)arg_type_ids)[(1)];
  void* arg2 = (((TVMValue*)args)[2].v_handle);
  int32_t arg2_code = ((int32_t*)arg_type_ids)[(2)];
  void* arg3 = (((TVMValue*)args)[3].v_handle);
  int32_t arg3_code = ((int32_t*)arg_type_ids)[(3)];
  void* placeholder = (((DLTensor*)arg0)[0].data);
  void* arg0_shape = (((DLTensor*)arg0)[0].shape);
  void* arg0_strides = (((DLTensor*)arg0)[0].strides);
  int32_t dev_id = (((DLTensor*)arg0)[0].ctx.device_id);
  void* placeholder1 = (((DLTensor*)arg1)[0].data);
  void* arg1_shape = (((DLTensor*)arg1)[0].shape);
  void* arg1_strides = (((DLTensor*)arg1)[0].strides);
  void* placeholder2 = (((DLTensor*)arg2)[0].data);
  void* arg2_shape = (((DLTensor*)arg2)[0].shape);
  void* arg2_strides = (((DLTensor*)arg2)[0].strides);
  void* T_relu = (((DLTensor*)arg3)[0].data);
  void* arg3_shape = (((DLTensor*)arg3)[0].shape);
  void* arg3_strides = (((DLTensor*)arg3)[0].strides);
  // rest of output is omitted due to space
The args variable holds all arguments of the fused Relay graph. From the rest of the code (omitted here), it seems that placeholder is the input feature map, placeholder1 is the kernel, and placeholder2 holds the biases. This seems to contradict the order in which I declared the inputs to the Relay function (NOTE1 in the Python code).
I guess this makes sense, since the Relay function just encapsulates the Relay graph of operators. But it means that at some point in the lowering/codegen process args is filled either from root to outputs (as a FIFO) or from outputs to root (as a stack). I guess the ordering at each operator (like conv2d) follows the order in which the topi operator is defined (meaning the ifm is the first argument and the kernel is the second).
Q3: Is this in general how args is constructed? If not, please explain.
- Where in the code can I actually see how this is done? I have been looking here, but I feel I'm missing something.
- EDIT: I think I am a little closer to finding where the PrimFunc arguments are packed into args. It seems that printing the module after _build_for_device creates a new TIR representation with a PackedFunc variant.
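To illustrate what I mean in Q3, the sketch below shows one ordering rule that is consistent with the generated C code above: walk the fused expression depth-first, record each variable the first time it is used, then append the output buffer. It is only a guess at the rule, not TVM's code. `Var`, `Call`, and `collect_inputs` are hypothetical helpers written for this post, not Relay classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


def collect_inputs(expr, seen=None):
    """Return the variables of `expr` in order of first use (depth-first)."""
    if seen is None:
        seen = []
    if isinstance(expr, Var):
        if expr not in seen:
            seen.append(expr)
    else:
        # Visit operator arguments left to right, so a producer's inputs
        # are recorded before the extra inputs of its consumer.
        for arg in expr.args:
            collect_inputs(arg, seen)
    return seen


x, w, b = Var("x"), Var("w"), Var("b")
conv2d = Call("nn.conv2d", (x, w))
bias = Call("nn.bias_add", (conv2d, b))
act = Call("nn.relu", (bias,))

declared_order = [v.name for v in (x, b, w)]  # order used at NOTE1
# Inputs in first-use order, followed by the output buffer.
packed_order = [v.name for v in collect_inputs(act)] + ["T_relu"]
```

Under this rule `packed_order` is x, w, b, T_relu, which matches placeholder, placeholder1, placeholder2, and T_relu in the generated code, while `declared_order` (x, b, w) plays no role.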
Q4: How do the GraphRuntime and the lowered funcs actually communicate?
- Where in the code can I see this?
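For Q4, my working assumption is something like the loop below: the runtime owns a table of buffers, looks up each lowered function by name, and calls it with the input buffers followed by the output buffers. This is a heavily simplified sketch and not the GraphRuntime source. `run_graph`, the dictionary layout of `nodes`, and `fused_bias_relu` are hypothetical, and buffers are plain Python lists instead of DLTensors.

```python
def run_graph(nodes, module, storage):
    """Execute nodes in order; each op gets its inputs followed by its outputs."""
    for node in nodes:
        if node["op"] == "null":
            # Input or parameter node: its buffer is filled by the user,
            # there is nothing to execute.
            continue
        func = module[node["func_name"]]  # look up the lowered function by name
        args = [storage[i] for i in node["inputs"]]
        args += [storage[i] for i in node["outputs"]]
        func(*args)


def fused_bias_relu(x, b, out):
    # Stand-in for a lowered function: writes its result into the output buffer.
    out[:] = [max(v + b[0], 0.0) for v in x]


nodes = [
    {"op": "null", "name": "x"},
    {"op": "null", "name": "b"},
    {"op": "tvm_op", "func_name": "fused_bias_relu", "inputs": [0, 1], "outputs": [2]},
]
module = {"fused_bias_relu": fused_bias_relu}
storage = [[-2.0, 0.5, 3.0], [1.0], [0.0, 0.0, 0.0]]
run_graph(nodes, module, storage)
```

If the real runtime works along these lines, the only contract between it and the lowered funcs would be the function name plus the "inputs first, outputs last" argument order. That is the part I would like to confirm in the code.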
Thanks