I am trying to understand how TVM uses hardware attributes and where exactly. From what I have seen in the code, the graph-level optimizations don't use any hardware knowledge to make target-specific optimizations. This means that the exported graph for a model will be the same for all targets, am I right?
Now, the next step after nnvm.build is the TVM runtime, and there, as I have seen in the code, the target is used for target-specific optimization and scheduling. I see the different directories and functions under the topi folder, but I can't find where it is decided which one to call. Basically, I can't understand how the TVM runtime part works. Can anyone guide me where to look?
Thank you in advance
Good question. Everything you said is right.
I suggest taking a look at this PR.
From NNVM, the target dispatch system kicks in here.
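To make the dispatch idea concrete, here is a minimal sketch of how a generic schedule can be overridden per target, which is the pattern TOPI uses (generic fallbacks in topi/generic, target-specific versions in topi/cuda, topi/x86, etc.). It assumes the tvm.target.generic_func decorator and tvm.target.create as they exist in the version I am looking at; exact names may differ between releases.

    import tvm

    # A generic fallback schedule, analogous to topi/python/topi/generic/.
    @tvm.target.generic_func
    def schedule_conv2d(outs):
        print("using the generic (fallback) schedule")
        return tvm.create_schedule([x.op for x in outs])

    # A target-specific override, analogous to topi/python/topi/cuda/.
    @schedule_conv2d.register(["cuda"])
    def schedule_conv2d_cuda(outs):
        print("using the CUDA schedule")
        return tvm.create_schedule([x.op for x in outs])

    # The target currently in scope decides which implementation runs.
    A = tvm.placeholder((16, 16), name="A")
    B = tvm.compute((16, 16), lambda i, j: A[i, j] * 2, name="B")

    with tvm.target.create("cuda"):
        schedule_conv2d([B])   # dispatches to the CUDA override

    with tvm.target.create("llvm"):
        schedule_conv2d([B])   # no llvm override registered -> generic fallback

So the graph itself stays target-independent, and the target only selects which compute/schedule implementation is used when the fused functions are lowered.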
I am also trying to find out how the graph annotations are transformed into tvm operations. To be more specific, here is an example:
squeezenet0_conv0_weight
is translated to these tvm operations:
%3 = tvm_op(%data, %squeezenet0_conv0_weight, %squeezenet0_conv0_bias, num_outputs="1", num_inputs="3", func_name="fuse_conv2d", flatten_data="0")
%4 = tvm_op(%3, num_outputs="1", num_inputs="1", func_name="fuse_relu", flatten_data="1")
%5 = tvm_op(%4, num_outputs="1", num_inputs="1", func_name="fuse_max_pool2d", flatten_data="0")
My question is where in the code this task is performed, because I am trying to understand how the different functions that you can use in a neural network are handled by the API.
squeezenet0_conv0_weight stands for the weight of the first conv layer of SqueezeNet. So it is not correct to say that "squeezenet0_conv0_weight is translated to three tvm ops (conv, relu, max_pool)".
Graph(%data,
      %squeezenet0_conv0_weight, %squeezenet0_conv0_bias,
      %squeezenet0_conv1_weight, %squeezenet0_conv1_bias,
      %squeezenet0_conv2_weight, %squeezenet0_conv2_bias,
      %squeezenet0_conv3_weight, %squeezenet0_conv3_bias,
      %squeezenet0_conv4_weight, %squeezenet0_conv4_bias,
      %squeezenet0_conv5_weight, %squeezenet0_conv5_bias,
      %squeezenet0_conv6_weight, %squeezenet0_conv6_bias,
      %squeezenet0_conv7_weight, %squeezenet0_conv7_bias,
      %squeezenet0_conv8_weight, %squeezenet0_conv8_bias,
      %squeezenet0_conv9_weight, %squeezenet0_conv9_bias,
      %squeezenet0_conv10_weight, %squeezenet0_conv10_bias,
      %squeezenet0_conv11_weight, %squeezenet0_conv11_bias,
      %squeezenet0_conv12_weight, %squeezenet0_conv12_bias,
      %squeezenet0_conv13_weight, %squeezenet0_conv13_bias,
      %squeezenet0_conv14_weight,
  %3 = tvm_op(%data, %squeezenet0_conv0_weight, %squeezenet0_conv0_bias, num_outputs="1", num_inputs="3", func_name="fuse_conv2d", flatten_data="0")
  %4 = tvm_op(%3, num_outputs="1", num_inputs="1", func_name="fuse_relu", flatten_data="1")
  %5 = tvm_op(%4, num_outputs="1", num_inputs="1", func_name="fuse_max_pool2d", flatten_data="0")
  %8 = tvm_op(%5, %squeezenet0_conv1_weight, %squeezenet0_conv1_bias, num_outputs="1", num_inputs="3", func_name="fuse_conv2d_1", flatten_data="0")
  %9 = tvm_op(%8, num_outputs="1", num_inputs="1", func_name="fuse_relu_1", flatten_data="1")
  %12 = tvm_op(%9, %squeezenet0_conv2_weight, %squeezenet0_conv2_bias, num_outputs="1", num_inputs="3", func_name="fuse_conv2d_2", flatten_data="0")
  %13 = tvm_op(%12, num_outputs="1", num_inputs="1", func_name="fuse_relu_2", flatten_data="1")
  %16 = tvm_op(%9, %squeezenet0_conv3_weight, %squeezenet0_conv3_bias, num_outputs="1", num_inputs="3", func_name="fuse_conv2d_3", flatten_data="0")
  %118 = tvm_op(%117, num_outputs="1", num_inputs="1", func_name="fuse_softmax", flatten_data="0")
  ret %118
}
graph_attr_keys = [storage_id, shape, dltype, dtype]
So, if I understand what you are saying, the lines at the beginning describe the parameters of each layer, and the tvm_op lines describe the different functions that are applied at each layer?
In NNVM, input data, learnable parameters, and operators are all represented as nodes in a Graph. So what you have above is just a list of the nodes contained in SqueezeNet: the first line is the node for the input data, then come nodes for parameters, followed by nodes for operators.
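As a side note, a dump like the one you pasted can be produced from Python after compilation. This is a minimal sketch assuming the NNVM testing workload helpers and the Graph.ir() method; the exact helper names may vary across versions.

    import nnvm.compiler
    import nnvm.testing

    # Get a SqueezeNet workload (symbol + random parameters).
    sym, params = nnvm.testing.squeezenet.get_workload(batch_size=1)
    shape = {"data": (1, 3, 224, 224)}

    # Compile it; the resulting graph contains only fused tvm_op nodes.
    graph, lib, params = nnvm.compiler.build(
        sym, target="llvm", shape=shape, params=params)

    # Print the textual IR, similar to the dump above.
    print(graph.ir())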
I see. My next question is: how can I see which operations are supported in NNVM?
Thank you very much. And one last thing: does each tvm operation correspond to a single CUDA or OpenCL kernel?
Yes, but a single kernel is typically a fused one, e.g. conv2d + batch norm + relu. Fusion is done automatically by NNVM when opt-level > 0.
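For reference, the fusion level is controlled through the build config. A minimal sketch, assuming the nnvm.compiler.build_config API and reusing a SqueezeNet workload:

    import nnvm.compiler
    import nnvm.testing

    sym, params = nnvm.testing.squeezenet.get_workload(batch_size=1)

    # opt_level=0 keeps one kernel per operator; opt_level >= 1 lets NNVM
    # fuse patterns such as conv2d + batch_norm + relu into a single kernel.
    with nnvm.compiler.build_config(opt_level=2):
        graph, lib, params = nnvm.compiler.build(
            sym, target="cuda", shape={"data": (1, 3, 224, 224)}, params=params)

Comparing graph.ir() at opt_level=0 and opt_level=2 makes the difference in the generated tvm_op nodes easy to see.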