Difference in profiler outputs

Hi. I tried to get an operator-wise breakdown of inference time for ResNet-50 using two profiling mechanisms: (1) tvm.runtime.profiler_vm.profile() and (2) tvm.contrib.debugger.debug_executor.profile(). Even though I passed identical parameters to both, the reports show very different times. How do these two profiling functions differ, and when should one use the former versus the latter?

Code for (1):

# Tuning done already.
with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        exe = relay.vm.compile(mod, target, params=params)
        vm = profiler_vm.VirtualMachineProfiler(exe, dev)
        report = vm.profile([data], func_name="main", number=100, repeat=3, end_to_end=True)
        print(report)

Code for (2):

# Tuning done already.
with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        exe = relay.build(mod, target, params=params)
        gr = debug_executor.create(exe.get_graph_json(), exe.lib, dev)
        report = gr.profile(data=data)
        print(report)

Can you post the printouts you get from each of these? That will make it easier to debug what you are seeing.

In general, the difference between (1) and (2) is that you are running on a different executor. The debug_executor is a simple graph executor that cannot handle recursive programs. The profiler_vm is a virtual machine that can handle recursive programs, but may be slower than a graph executor. You should use the profiler that corresponds with the executor you plan to use.

@tkonolige Thank you for responding. I just want to find out the amount of time spent on data layout transformations while running inference on ResNet-50. profiler_vm seems to report a much lower inference cost (1) than debug_executor (2). Does this not contradict your statement that profiler_vm may be slower than graph executor? Also I ran benchmarking via tvm.contrib.graph_executor:

with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build_module.build(mod, target=target, params=params)
        # runtime is tvm.contrib.graph_executor
        module = runtime.GraphModule(lib["default"](dev))
        module.set_input("data", data)
        print("Evaluate inference time cost...")
        print(module.benchmark(dev, func_name="main", number=100, repeat=3, end_to_end=True))

The inference cost I get this way (3) is always close to, but lower than, (1). Do you have any idea why?

The outputs:

(1): [profiler_vm]

Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                    Duration (us)  Percent   layout  Count  out_layout  Device  data_layout  kernel_layout              Hash                                                                                                                                                       Argument Shapes  src_layout  dst_layout  weight_layout  
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                38,648.93    14.19               5     NCHW16c    cpu0      NCHW64c     OIHW64i16o  5c16c122a657ba21                                                         float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                 31,069.39    11.41               4      NCHW8c    cpu0      NCHW16c      OIHW16i8o  f2c6de1cbe5c0ddb                                                            float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                23,726.42     8.71               3      NCHW8c    cpu0       NCHW2c       OIHW2i8o  cb108aaf00eff9e2                                                              float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                18,153.16     6.66               5      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  e4cba4831bd46d2c                                                        float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 15,697.88     5.76               2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b2d690588ecaac96                                                            float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_3                         14,098.72     5.18               4     NCHW16c    cpu0      NCHW16c     OIHW16i16o  84bec82add215ebe                                                     float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 10,840.88     3.98               3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  d930aa7bf46c34e1                                                          float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_1                         10,638.57     3.91               3     NCHW16c    cpu0       NCHW8c      OIHW8i16o  6beba43d92784786                                                       float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    8,112.57     2.98               1      NCHW8c    cpu0       NCHW3c       OIHW3i8o  2f8575d36cac57f0                                                             float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  7,847.28     2.88               1      NCHW8c    cpu0      NCHW16c      OIHW16i8o  7baee5c8a4d8e4ab                                                          float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  7,684.11     2.82               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  25fd1c3d9d4e561e                                                            float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add                            7,625.64     2.80               2     NCHW32c    cpu0      NCHW16c     OIHW16i32o  667036afd5deee1b                                                          float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                  7,622.32     2.80               2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6e49d3c836077ac7                                                            float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_2                              7,530.83     2.76               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b6e66601adaeb1e3                                                                                 float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_4                          7,305.51     2.68               2     NCHW16c    cpu0       NCHW4c      OIHW4i16o  d0d1536228842867                                                        float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_3                              7,303.69     2.68               1      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  493c374dd5e37c2b                                                                                 float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14                 7,199.44     2.64               2      NCHW8c    cpu0    NCHW2048c    OIHW2048i8o  af5e7bf563de2757                                                            float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_1                              7,185.16     2.64               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  5e7a95757d65e24e                                                                                   float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                3,905.42     1.43               1     NCHW32c    cpu0      NCHW16c     OIHW16i32o  18ea4e7c768c292e                                 float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc                                3,776.76     1.39               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  7ff40af88acd710e                                                                                       float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       3,693.25     1.36               1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              3,616.06     1.33               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  faa415ce8e443d42                             float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_2                          3,601.05     1.32               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  c3c48546ccd1c8e4                                                       float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              3,509.62     1.29               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  237b36f60eadc660                           float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 2,119.10     0.78               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                         float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                  1,969.95     0.72               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8ec1781e87f7f62e                                                       float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  1,869.16     0.69               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  39975a03990f0ed6                                                            float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                    920.43     0.34               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  ce29dd2da9289ac4                                                              float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
fused_add_nn_relu_layout_transform                             814.00     0.30               5                cpu0                              7590737f314ee1d9                                                                                     float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_nn_relu                                              751.40     0.28               2                cpu0                              f6724216088f2bf7                                                                                         float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_dense_pack_add                                658.90     0.24               1                cpu0                              ced18cccebfa2ada                                                                                           float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]                                   NC8n  
fused_add_nn_relu_1                                            624.30     0.23               3                cpu0                              848825acfc73218b                                                                                      float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_max_pool2d_add_nn_relu                                378.72     0.14   NCHW8c      1                cpu0                              4883943910905d24                                                                                          float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]                                         
fused_layout_transform                                         173.49     0.06               5                cpu0                              0693edb3d97dc77f                                                                                                                  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
fused_add_nn_relu_layout_transform_1                           172.54     0.06               2                cpu0                              468080b095af509a                                                                                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
fused_layout_transform_3                                       138.92     0.05               1                cpu0                              6dda5720a553f260                                                                                                               float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_layout_transform                                      90.72     0.03               1                cpu0                              69355d3cc810f874                                                                                                 float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
fused_nn_global_avg_pool2d                                      83.09     0.03  NCHW16c      1                cpu0                              f18307e2786f4cb3                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
fused_layout_transform_4                                        79.73     0.03               1                cpu0                              aad3e266e27c5054                                                                                                                   float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
fused_layout_transform_2                                        50.75     0.02               3                cpu0                              bd0b0c2ae84f7e09                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
fused_layout_transform_5                                        39.88     0.01               2                cpu0                              69f132fa7e1d6749                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
fused_layout_transform_1                                        14.62     0.01               1                cpu0                              9bd937910d443787                                                                                                                    float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
fused_nn_softmax                                                 7.80     0.00               1                cpu0                              ca61e79ea24e53f0                                                                                                                                    float32[1, 1000], float32[1, 1000]                                         
fused_layout_transform_nn_batch_flatten                          1.41     0.00               1                cpu0                              2db99463d18696a4                                                                                                                           float32[1, 128, 1, 1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------                                                                                                                                                                                                                                                                                                                                                                     
Sum                                                       2,71,351.60    99.61              84                                                                                                                                                                                                                                                                                 
Total                                                     2,72,418.15                        1                cpu0                                                                                                                                                                                                                                                             
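Since the goal here is the time spent on layout transformations, one way to read it off report (1) is to sum the rows whose name contains layout_transform. The snippet below is only a sketch of that idea in plain Python, with no TVM calls: parse_durations and REPORT_ROWS are hypothetical names, and the rows are copied from the report above (name, duration, percent only). It assumes every data row starts with the operator name followed by the duration in microseconds. Note that fused kernels such as fused_add_nn_relu_layout_transform also include the add and relu work, so the result is an upper bound on pure layout-transform time.

```python
# Sketch: tally layout-transform time from a printed profiler report.
# Assumption: each data row is "<name> <duration in us> ...", as printed above.

REPORT_ROWS = """\
Name                                                    Duration (us)  Percent
fused_add_nn_relu_layout_transform                             814.00     0.30
fused_nn_contrib_dense_pack_add                                658.90     0.24
fused_layout_transform                                         173.49     0.06
fused_add_nn_relu_layout_transform_1                           172.54     0.06
fused_layout_transform_3                                       138.92     0.05
fused_add_layout_transform                                      90.72     0.03
fused_layout_transform_4                                        79.73     0.03
fused_layout_transform_2                                        50.75     0.02
fused_layout_transform_5                                        39.88     0.01
fused_layout_transform_1                                        14.62     0.01
fused_nn_softmax                                                 7.80     0.00
fused_layout_transform_nn_batch_flatten                          1.41     0.00
----------
Total                                                     2,72,418.15
"""


def parse_durations(text):
    """Map each row name to its duration in microseconds."""
    durations = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank line or separator row
        name, raw = fields[0], fields[1]
        try:
            # Dropping commas handles any digit grouping (e.g. 2,72,418.15).
            durations[name] = float(raw.replace(",", ""))
        except ValueError:
            continue  # header row: second field is not a number
    return durations


durations = parse_durations(REPORT_ROWS)
total_us = durations["Total"]
# Substring match: counts standalone and fused layout_transform kernels alike.
layout_us = sum(us for name, us in durations.items() if "layout_transform" in name)
share = 100.0 * layout_us / total_us
print(f"layout transforms: {layout_us:.2f} us ({share:.2f}% of total)")
```

For the rows copied here this comes to roughly 1,576 us, about 0.58% of the reported total. The same function can be pointed at the debug_executor report, since both printouts use the same leading columns.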

(2): [debug_executor]

Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                                   Duration (us)  Percent   layout  Count  out_layout  Device  data_layout  kernel_layout              Hash                                                                                                                                                       Argument Shapes  src_layout  dst_layout  weight_layout  
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14              5,68,263.92    48.76               1      NCHW8c    cpu0       NCHW3c       OIHW3i8o  2f8575d36cac57f0                                                             float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2           1,75,988.78    15.10               1     NCHW32c    cpu0      NCHW16c     OIHW16i32o  18ea4e7c768c292e                                 float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                82,241.79     7.06               2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b2d690588ecaac96                                                            float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc                               67,905.70     5.83               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  7ff40af88acd710e                                                                                       float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                 39,639.91     3.40               5     NCHW16c    cpu0      NCHW64c     OIHW64i16o  5c16c122a657ba21                                                         float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 31,242.14     2.68               4      NCHW8c    cpu0      NCHW16c      OIHW16i8o  f2c6de1cbe5c0ddb                                                            float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_4                         29,317.11     2.52               2     NCHW32c    cpu0      NCHW16c     OIHW16i32o  667036afd5deee1b                                                          float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu                   23,174.76     1.99               3      NCHW8c    cpu0       NCHW2c       OIHW2i8o  cb108aaf00eff9e2                                                              float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 18,815.10     1.61               5      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  e4cba4831bd46d2c                                                        float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1                         14,143.22     1.21               4     NCHW16c    cpu0      NCHW16c     OIHW16i16o  84bec82add215ebe                                                     float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                 10,807.49     0.93               3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  d930aa7bf46c34e1                                                          float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3                         10,635.87     0.91               3     NCHW16c    cpu0       NCHW8c      OIHW8i16o  6beba43d92784786                                                       float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                 8,887.56     0.76               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  ce29dd2da9289ac4                                                              float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                 8,865.53     0.76               2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6e49d3c836077ac7                                                            float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  8,017.28     0.69               1      NCHW8c    cpu0      NCHW16c      OIHW16i8o  7baee5c8a4d8e4ab                                                          float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 7,585.56     0.65               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  25fd1c3d9d4e561e                                                            float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2                              7,442.40     0.64               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b6e66601adaeb1e3                                                                                 float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                  7,293.48     0.63               2      NCHW8c    cpu0    NCHW2048c    OIHW2048i8o  af5e7bf563de2757                                                            float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3                              7,140.03     0.61               1      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  493c374dd5e37c2b                                                                                 float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1                              7,041.60     0.60               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  5e7a95757d65e24e                                                                                   float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add                            6,836.18     0.59               2     NCHW16c    cpu0       NCHW4c      OIHW4i16o  d0d1536228842867                                                        float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2                          3,727.41     0.32               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  c3c48546ccd1c8e4                                                       float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              3,596.31     0.31               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  faa415ce8e443d42                             float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       3,468.59     0.30               1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                3,440.23     0.30               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  237b36f60eadc660                           float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  3,144.19     0.27               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  39975a03990f0ed6                                                            float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  1,997.84     0.17               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                         float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                  1,783.56     0.15               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8ec1781e87f7f62e                                                       float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_add_nn_relu_1                                            473.00     0.04               2                cpu0                              f6724216088f2bf7                                                                                         float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_add_nn_relu                                              338.92     0.03               3                cpu0                              848825acfc73218b                                                                                      float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_add_nn_relu_layout_transform_1                           286.62     0.02               5                cpu0                              7590737f314ee1d9                                                                                     float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_nn_contrib_dense_pack_add                                265.74     0.02               1                cpu0                              ced18cccebfa2ada                                                                                           float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]                                   NC8n  
tvmgen_default_fused_nn_max_pool2d_add_nn_relu                                251.56     0.02   NCHW8c      1                cpu0                              4883943910905d24                                                                                          float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]                                         
tvmgen_default_fused_layout_transform_3                                       132.62     0.01               5                cpu0                              0693edb3d97dc77f                                                                                                                  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
tvmgen_default_fused_nn_global_avg_pool2d                                      69.42     0.01  NCHW16c      1                cpu0                              f18307e2786f4cb3                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
tvmgen_default_fused_layout_transform_4                                        60.92     0.01               1                cpu0                              aad3e266e27c5054                                                                                                                   float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
tvmgen_default_fused_layout_transform_5                                        58.94     0.01               1                cpu0                              6dda5720a553f260                                                                                                               float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_add_layout_transform                                      56.01     0.00               1                cpu0                              69355d3cc810f874                                                                                                 float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
tvmgen_default_fused_add_nn_relu_layout_transform                              54.40     0.00               2                cpu0                              468080b095af509a                                                                                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
tvmgen_default_fused_layout_transform_1                                        42.67     0.00               2                cpu0                              69f132fa7e1d6749                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
tvmgen_default_fused_layout_transform                                          33.90     0.00               3                cpu0                              bd0b0c2ae84f7e09                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
tvmgen_default_fused_layout_transform_2                                        19.34     0.00               1                cpu0                              9bd937910d443787                                                                                                                    float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
tvmgen_default_fused_nn_softmax                                                 7.03     0.00               1                cpu0                              ca61e79ea24e53f0                                                                                                                                    float32[1, 1000], float32[1, 1000]                                         
tvmgen_default_fused_layout_transform_nn_batch_flatten                          0.96     0.00               1                cpu0                              2db99463d18696a4                                                                                                                           float32[1, 128, 1, 1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------                                                                                                                                                                                                                                                                                                                                                                                    
Sum                                                                     11,64,595.59    99.94              84                                                                                                                                                                                                                                                                                 
Total                                                                   11,65,326.68                        1                cpu0                                                                                  

(3) [benchmark]

Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  269.9458     270.0297     270.0697     269.7381      0.1478   

I’m surprised you are seeing such a large difference. Can you try running the profiler multiple times (in the same script) and see if the results are consistent?

I have run both profilers multiple times. The profiler_vm inference times are consistently 270-272 ms, while the debug_executor’s range from 800 ms to 1.2 s.
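For reference, a minimal sketch of how such a repeated-run consistency check could be scripted. The helper name `time_repeated` and the zero-argument callable `run_once` are hypothetical; `run_once` is assumed to wrap whichever profiler or executor call is under test. It measures wall-clock time, so any profiler overhead is included in the numbers.

```python
import statistics
import time


def time_repeated(run_once, repeats=5, warmup=1):
    """Call run_once several times and summarize wall-clock time in ms."""
    # Warm-up calls are executed but not recorded.
    for _ in range(warmup):
        run_once()

    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1e3)

    return {
        "mean_ms": statistics.mean(samples_ms),
        # stdev needs at least two samples.
        "std_ms": statistics.stdev(samples_ms) if len(samples_ms) > 1 else 0.0,
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }
```

In the script above, `run_once` could be something like `lambda: gr.profile(data=data)` or `lambda: vm.profile(data=data)`, so that both profilers are timed the same way within one process.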

Here is the whole code just in case:

import numpy as np
import pytest
from io import StringIO
import csv
import os
import json
import sys

import tvm
import tvm.testing
from tvm.runtime import profiler_vm
from tvm import relay, autotvm
from tvm.relay.testing import mlp
from tvm.contrib.debugger import debug_executor
import tvm.contrib.graph_executor as runtime
from tvm import rpc
from tvm.contrib import utils
from tvm.runtime.profiling import Report
from tvm.autotvm.tuner import RandomTuner
from tvm.autotvm.graph_tuner import DPTuner

@tvm.testing.parametrize_targets
def test_resnet(target, dev, mode, n_layer):
    batch_size = 1
    input_shape = (batch_size, 3, 224, 224)
    dtype = "float32"
    modelname = "resnet-" + str(n_layer)

    log = "results/%s/%s.log" % (modelname, modelname)
    opt = "results/%s/%s_graph_opt.log" % (modelname, modelname)
    mod, params = relay.testing.resnet.get_workload(
        num_layers=n_layer, batch_size=batch_size, dtype=dtype
    )
    data = np.random.uniform(size=input_shape).astype(dtype)

    with autotvm.apply_graph_best(opt):
        with tvm.transform.PassContext(opt_level=3):
            if mode == "vmprofile":
                exe = relay.vm.compile(mod, target, params=params)
                vm = profiler_vm.VirtualMachineProfiler(exe, dev)
                # report = vm.profile([data], func_name="main", number=100, repeat=3, end_to_end=True)
                report = vm.profile(data=data)
                print(report)
            elif mode == "benchmark":
                lib = relay.build_module.build(mod, target=target, params=params)
                module = runtime.GraphModule(lib["default"](dev))
                module.set_input("data", data)
                print("Evaluate inference time cost...")
                print(module.benchmark(dev, number=100, repeat=3)) #, end_to_end=True))
            elif mode == "grprofile":
                exe = relay.build(mod, target, params=params)
                gr = debug_executor.create(exe.get_graph_json(), exe.lib, dev)
                report = gr.profile(data=data)
                print(report)

if __name__ == "__main__":
    layers = 0
    mode = "none"

    if len(sys.argv) != 3:
        print("Usage: python test_runtime_profiling.py [mode: vmprofile|benchmark|grprofile] [resnet_layers: 18|50]")
    else:
        mode = sys.argv[1]
        layers = int(sys.argv[2])

    if (layers == 18 or layers == 50):
        print("Mode: " + mode)
        print("Model: ResNet-" + str(layers))
        test_resnet("llvm", tvm.cpu(), mode, layers)

I cannot reproduce the results you are getting. For me, the graph runtime and the VM are within 10% of each other in profiling. And they are pretty close to the benchmark results too.

Here are some questions that might help you debug this:

  • Have you tried running on a different machine?
  • Have you tried using a target that is specific to your machine? (Something like llvm -mcpu=core-avx2 -model=epyc-7452)
  • Have you tried running without graph tuning?
  • Have you tried different networks?

Hi @tkonolige, sorry for the delayed response. I changed the target to “llvm -mcpu=cascadelake” to match my machine and redid the tuning. Now I get a much better inference time of < 100 ms with both benchmark and VirtualMachineProfiler, but a roughly 4x discrepancy still remains between the outputs of the two profilers. The outputs are attached below ([1]). I also tried ResNet-18 and observe the same discrepancy there.
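One way to narrow down where the discrepancy comes from is to match operators across the two printed reports by the Hash column, since the operator names differ (`fused_...` vs `tvmgen_default_fused_...`) but the hashes repeat. The sketch below is an assumption-laden text parser, not a TVM API: `parse_report` and `compare` are hypothetical helpers that expect the operator name in the first column, the duration in the second, and a 16-character hex hash somewhere later in the row. The two sample rows per report are copied (with columns trimmed) from the outputs in this thread.

```python
import re

HASH_RE = re.compile(r"^[0-9a-f]{16}$")


def parse_report(text):
    """Map operator hash -> (name, duration_us) from a printed report."""
    rows = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        try:
            # Drop thousands separators, whatever the grouping style.
            duration = float(tokens[1].replace(",", ""))
        except ValueError:
            continue  # header, warning, or separator line
        op_hash = next((t for t in tokens[2:] if HASH_RE.match(t)), None)
        if op_hash is not None:  # Sum/Total rows carry no hash
            rows[op_hash] = (tokens[0], duration)
    return rows


def compare(vm_text, graph_text):
    """Return (ratio, vm_name, vm_us, graph_us) tuples, largest ratio first."""
    vm, gr = parse_report(vm_text), parse_report(graph_text)
    shared = [
        (gr[h][1] / vm[h][1], vm[h][0], vm[h][1], gr[h][1])
        for h in vm.keys() & gr.keys()
        if vm[h][1] > 0
    ]
    return sorted(shared, reverse=True)


VM_SAMPLE = """
fused_nn_contrib_conv2d_NCHWc_add  3,069.46  3.09  2  NCHW16c  cpu0  NCHW32c  OIHW32i16o  6fb734c77ed64bde
fused_nn_contrib_conv2d_NCHWc      1,382.10  1.39  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  7661eb48c0b8a7e6
"""

GRAPH_SAMPLE = """
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3  1,39,559.24  36.43  2  NCHW16c  cpu0  NCHW32c  OIHW32i16o  6fb734c77ed64bde
tvmgen_default_fused_nn_contrib_conv2d_NCHWc          23,051.66   6.02  1  NCHW16c  cpu0  NCHW16c  OIHW16i16o  7661eb48c0b8a7e6
"""

for ratio, name, vm_us, graph_us in compare(VM_SAMPLE, GRAPH_SAMPLE):
    print(f"{name}: vm={vm_us:.2f} us, debug_executor={graph_us:.2f} us, ratio={ratio:.1f}x")
```

Sorting by ratio puts the operators that differ most at the top, which helps separate a few outlier kernels from a uniform slowdown.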

When running without graph tuning, I observe almost no discrepancy. Interestingly, the debug_executor’s total inference time worsens when I enable graph tuning, while that of the other two improves. The outputs are attached below ([2]). I haven’t yet been able to get hold of another system to install and run these experiments on; I will update this thread as soon as that happens.
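Since graph tuning changes which layouts are chosen, it can help to total the layout transformation rows in each printed report before comparing runs. The sketch below is a plain text parser, not part of TVM; `layout_transform_share` is a hypothetical helper that assumes the operator name is the first column and the duration the second. It counts any row whose name contains `layout_transform`, so fused rows such as `fused_add_nn_relu_layout_transform` are included even though they also do other work. The sample rows are copied (with columns trimmed) from the profiler_vm output in this thread.

```python
def layout_transform_share(report_text):
    """Return (layout_us, total_us) summed over the operator rows of a report."""
    layout_us = 0.0
    total_us = 0.0
    for line in report_text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue
        name = tokens[0]
        if name in ("Sum", "Total"):
            continue  # summary rows would double-count the operators
        try:
            duration = float(tokens[1].replace(",", ""))
        except ValueError:
            continue  # header, warning, or separator line
        total_us += duration
        if "layout_transform" in name:
            layout_us += duration
    return layout_us, total_us


SAMPLE = """
fused_layout_transform_1  188.19  0.19  3  cpu0  b8cbb72b4035894d
fused_layout_transform_2  172.13  0.17  6  cpu0  f5e631fb93d23d4d
fused_nn_softmax            9.76  0.01  1  cpu0  ca61e79ea24e53f0
Sum                    98,275.48 98.83 84
"""

layout_us, total_us = layout_transform_share(SAMPLE)
print(f"layout transforms: {layout_us:.2f} us of {total_us:.2f} us parsed")
```

Running this on the full report from each mode (with and without graph tuning) gives a like-for-like number for the layout transformation cost.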


Outputs:

[1] With Graph Tuning

(a) profiler_vm

Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                    Duration (us)  Percent  Count  out_layout  Device  data_layout  kernel_layout              Hash   layout                                                                                                                                                       Argument Shapes  dst_layout  weight_layout  src_layout  
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                 15,909.52    16.00      6     NCHW16c    cpu0      NCHW16c     OIHW16i16o  efb9044cdd43e0b8                                                                float32[1, 16, 14, 14, 16], float32[16, 16, 3, 3, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                 10,522.82    10.58      4     NCHW32c    cpu0       NCHW8c      OIHW8i32o  0d551fd3800939e1                                                                     float32[1, 16, 28, 28, 8], float32[4, 16, 3, 3, 8, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 28, 28, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                 9,095.54     9.15      3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  68695c5cd347ce57                                                                    float32[1, 32, 7, 7, 16], float32[32, 32, 3, 3, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  8,034.25     8.08      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  83e0f5d1673ff2ae                                                                     float32[1, 1, 56, 56, 64], float32[2, 1, 3, 3, 64, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  6,451.60     6.49      5     NCHW16c    cpu0      NCHW16c     OIHW16i16o  c8d2fb74508242fa                                                                float32[1, 64, 14, 14, 16], float32[16, 64, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_2                          6,219.45     6.25      5     NCHW16c    cpu0       NCHW4c      OIHW4i16o  991e77362efe315d                                                                float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_1                          4,069.38     4.09      3     NCHW64c    cpu0      NCHW32c     OIHW32i64o  b8f45dade76ef8ee                                                                   float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 28, 28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                  3,627.03     3.65      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  435cfe42fcb8d0b0                                                                     float32[1, 8, 28, 28, 64], float32[4, 8, 1, 1, 64, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 28, 28, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add                            3,069.46     3.09      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6fb734c77ed64bde                                                                float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    2,898.89     2.92      1     NCHW16c    cpu0       NCHW3c      OIHW3i16o  10a40e9231ff15a6                                                                   float32[1, 1, 224, 224, 3], float32[4, 1, 7, 7, 3, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 112, 112, 16]                                         
fused_nn_contrib_conv2d_NCHWc_3                              2,659.84     2.67      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  9c3ea371f8ec4054                                                                                          float32[1, 64, 14, 14, 16], float32[128, 64, 1, 1, 16, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 2,592.41     2.61      2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  1cc8a4dccc794a64                                                                  float32[1, 128, 7, 7, 16], float32[32, 128, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_3                          2,587.97     2.60      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  528b9cb523882d7e                                                                 float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_1                              2,568.47     2.58      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  9b9c1d5fc56b0353                                                                                            float32[1, 16, 56, 56, 16], float32[8, 16, 1, 1, 16, 64], float32[1, 8, 28, 28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_2                              2,560.30     2.57      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  371a9e61ecaeecce                                                                                            float32[1, 8, 28, 28, 64], float32[64, 8, 1, 1, 64, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                  2,393.13     2.41      2     NCHW64c    cpu0      NCHW16c     OIHW16i64o  850ecaa157c95aac                                                                   float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 56, 56, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                1,519.12     1.53      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  abe40a1f08b34bad                                      float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc                                1,382.10     1.39      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  7661eb48c0b8a7e6                                                                                            float32[1, 4, 56, 56, 16], float32[16, 4, 1, 1, 16, 16], float32[1, 16, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              1,319.25     1.33      1     NCHW64c    cpu0      NCHW32c     OIHW32i64o  88bbb32f8f542f98                                          float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 28, 64]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              1,299.49     1.31      1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  c7b912640028a9e2                                      float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       1,252.95     1.26      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  21cb6d538731ba92           float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
fused_add_nn_relu                                              823.04     0.83      2                cpu0                              e907ce81104cda7a                                                                                               float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                   759.67     0.76      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                                  float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_dense_pack_add                                710.21     0.71      1                cpu0                              7641a0cce9852143                                                                                                    float32[1, 2048], float32[40, 2048, 25], float32[1, 1000], float32[1, 1000]                      NC25n              
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                    693.32     0.70      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  dc31662fedbb8185                                                                  float32[1, 8, 28, 28, 64], float32[16, 8, 1, 1, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                    656.53     0.66      1    NCHW128c    cpu0      NCHW16c    OIHW16i128o  9b01f6479b89fd68                                                                float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 128], float32[1, 1, 1, 1, 128], float32[1, 1, 28, 28, 128]                                         
fused_add_nn_relu_1                                            631.05     0.63      3                cpu0                              0e82013d73aa68c1                                                                                                  float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 28, 64]                                         
fused_add_nn_relu_2                                            542.71     0.55      5                cpu0                              f12067172f61c850                                                                                               float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_max_pool2d_add_nn_relu                                364.59     0.37      1                cpu0                              6f701a4fa071030f  NCHW16c                                                                                       float32[1, 4, 112, 112, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                    330.20     0.33      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  0f7bbb0e363c360c                                                                     float32[1, 4, 56, 56, 16], float32[1, 4, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 56, 56, 64]                                         
fused_layout_transform_1                                       188.19     0.19      3                cpu0                              b8cbb72b4035894d                                                                                                                           float32[1, 4, 28, 28, 32], float32[1, 16, 28, 28, 8]      NCHW8c                    NCHW32c  
fused_layout_transform_2                                       172.13     0.17      6                cpu0                              f5e631fb93d23d4d                                                                                                                          float32[1, 16, 14, 14, 16], float32[1, 64, 14, 14, 4]      NCHW4c                    NCHW16c  
fused_add_nn_relu_3                                            106.41     0.11      2                cpu0                              5d16c15878cc73d4                                                                                                float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
fused_add_layout_transform                                      96.21     0.10      1                cpu0                              69355d3cc810f874                                                                                                          float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]      NCHW3c                       NCHW  
fused_nn_global_avg_pool2d                                      56.33     0.06      1                cpu0                              f18307e2786f4cb3  NCHW16c                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
fused_layout_transform                                          52.16     0.05      1                cpu0                              2c5d64d5f9faa001                                                                                                                          float32[1, 1, 28, 28, 128], float32[1, 16, 28, 28, 8]      NCHW8c                   NCHW128c  
fused_layout_transform_3                                        48.26     0.05      3                cpu0                              add43c0d2d8a8a3c                                                                                                                             float32[1, 32, 7, 7, 16], float32[1, 16, 7, 7, 32]     NCHW32c                    NCHW16c  
fused_nn_softmax                                                 9.76     0.01      1                cpu0                              ca61e79ea24e53f0                                                                                                                                             float32[1, 1000], float32[1, 1000]                                         
fused_layout_transform_nn_batch_flatten                          1.73     0.00      1                cpu0                              2db99463d18696a4                                                                                                                                    float32[1, 128, 1, 1, 16], float32[1, 2048]        NCHW                    NCHW16c  
----------                                                                                                                                                                                                                                                                                                                                                                     
Sum                                                         98,275.48    98.83     84                                                                                                                                                                                                                                                                                          
Total                                                       99,441.43               1                cpu0                                                                                                          

(b) debug_executor

Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                                   Duration (us)  Percent  Count  out_layout  Device  data_layout  kernel_layout              Hash   layout                                                                                                                                                       Argument Shapes  dst_layout  weight_layout  src_layout  
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3                       1,39,559.24    36.43      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6fb734c77ed64bde                                                                float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12              1,18,024.98    30.81      1     NCHW16c    cpu0       NCHW3c      OIHW3i16o  10a40e9231ff15a6                                                                   float32[1, 1, 224, 224, 3], float32[4, 1, 7, 7, 3, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 112, 112, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc                               23,051.66     6.02      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  7661eb48c0b8a7e6                                                                                            float32[1, 4, 56, 56, 16], float32[16, 4, 1, 1, 16, 16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                 15,185.61     3.96      6     NCHW16c    cpu0      NCHW16c     OIHW16i16o  efb9044cdd43e0b8                                                                float32[1, 16, 14, 14, 16], float32[16, 16, 3, 3, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                 13,328.36     3.48      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  83e0f5d1673ff2ae                                                                     float32[1, 1, 56, 56, 64], float32[2, 1, 3, 3, 64, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                13,159.49     3.44      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  0f7bbb0e363c360c                                                                     float32[1, 4, 56, 56, 16], float32[1, 4, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 56, 56, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                 10,205.32     2.66      4     NCHW32c    cpu0       NCHW8c      OIHW8i32o  0d551fd3800939e1                                                                     float32[1, 16, 28, 28, 8], float32[4, 16, 3, 3, 8, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 28, 28, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    7,727.92     2.02      3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  68695c5cd347ce57                                                                    float32[1, 32, 7, 7, 16], float32[32, 32, 3, 3, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1                          5,840.79     1.52      5     NCHW16c    cpu0       NCHW4c      OIHW4i16o  991e77362efe315d                                                                float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                  5,746.35     1.50      5     NCHW16c    cpu0      NCHW16c     OIHW16i16o  c8d2fb74508242fa                                                                float32[1, 64, 14, 14, 16], float32[16, 64, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2                          3,745.35     0.98      3     NCHW64c    cpu0      NCHW32c     OIHW32i64o  b8f45dade76ef8ee                                                                   float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                  3,425.00     0.89      3     NCHW32c    cpu0      NCHW64c     OIHW64i32o  435cfe42fcb8d0b0                                                                     float32[1, 8, 28, 28, 64], float32[4, 8, 1, 1, 64, 32], float32[1, 4, 1, 1, 32], float32[1, 4, 28, 28, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2                              2,508.48     0.65      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  371a9e61ecaeecce                                                                                            float32[1, 8, 28, 28, 64], float32[64, 8, 1, 1, 64, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                 2,400.83     0.63      2     NCHW64c    cpu0      NCHW16c     OIHW16i64o  850ecaa157c95aac                                                                   float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 64], float32[1, 1, 1, 1, 64], float32[1, 1, 56, 56, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1                              2,396.47     0.63      1     NCHW64c    cpu0      NCHW16c     OIHW16i64o  9b9c1d5fc56b0353                                                                                            float32[1, 16, 56, 56, 16], float32[8, 16, 1, 1, 16, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                  2,271.00     0.59      2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  1cc8a4dccc794a64                                                                  float32[1, 128, 7, 7, 16], float32[32, 128, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add                            2,260.06     0.59      2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  528b9cb523882d7e                                                                 float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3                              2,240.88     0.59      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  9c3ea371f8ec4054                                                                                          float32[1, 64, 14, 14, 16], float32[128, 64, 1, 1, 16, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              1,401.25     0.37      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  abe40a1f08b34bad                                      float32[1, 2, 56, 56, 32], float32[16, 2, 1, 1, 32, 16], float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              1,249.88     0.33      1     NCHW64c    cpu0      NCHW32c     OIHW32i64o  88bbb32f8f542f98                                          float32[1, 4, 28, 28, 32], float32[8, 4, 1, 1, 32, 64], float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       1,220.86     0.32      1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  21cb6d538731ba92           float32[1, 16, 7, 7, 32], float32[128, 16, 1, 1, 32, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                1,160.16     0.30      1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  c7b912640028a9e2                                      float32[1, 64, 14, 14, 4], float32[64, 64, 1, 1, 4, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                    599.86     0.16      1    NCHW128c    cpu0      NCHW16c    OIHW16i128o  9b01f6479b89fd68                                                                float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16, 128], float32[1, 1, 1, 1, 128], float32[1, 1, 28, 28, 128]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                    579.10     0.15      1     NCHW16c    cpu0      NCHW64c     OIHW64i16o  dc31662fedbb8185                                                                  float32[1, 8, 28, 28, 64], float32[16, 8, 1, 1, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                    571.03     0.15      1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                                  float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_add_nn_relu_3                                            519.35     0.14      2                cpu0                              e907ce81104cda7a                                                                                               float32[1, 16, 56, 56, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_dense_pack_add                                488.16     0.13      1                cpu0                              7641a0cce9852143                                                                                                    float32[1, 2048], float32[40, 2048, 25], float32[1, 1000], float32[1, 1000]                      NC25n              
tvmgen_default_fused_add_nn_relu_2                                            360.30     0.09      3                cpu0                              0e82013d73aa68c1                                                                                                  float32[1, 8, 28, 28, 64], float32[1, 8, 1, 1, 64], float32[1, 8, 28, 28, 64]                                         
tvmgen_default_fused_nn_max_pool2d_add_nn_relu                                342.65     0.09      1                cpu0                              6f701a4fa071030f  NCHW16c                                                                                       float32[1, 4, 112, 112, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_add_nn_relu_1                                            291.22     0.08      5                cpu0                              f12067172f61c850                                                                                               float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_layout_transform_2                                       106.25     0.03      3                cpu0                              b8cbb72b4035894d                                                                                                                           float32[1, 4, 28, 28, 32], float32[1, 16, 28, 28, 8]      NCHW8c                    NCHW32c  
tvmgen_default_fused_add_layout_transform                                      76.45     0.02      1                cpu0                              69355d3cc810f874                                                                                                          float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]      NCHW3c                       NCHW  
tvmgen_default_fused_layout_transform_1                                        68.45     0.02      6                cpu0                              f5e631fb93d23d4d                                                                                                                          float32[1, 16, 14, 14, 16], float32[1, 64, 14, 14, 4]      NCHW4c                    NCHW16c  
tvmgen_default_fused_add_nn_relu                                               51.61     0.01      2                cpu0                              5d16c15878cc73d4                                                                                                float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_global_avg_pool2d                                      46.06     0.01      1                cpu0                              f18307e2786f4cb3  NCHW16c                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
tvmgen_default_fused_layout_transform_3                                        36.66     0.01      1                cpu0                              2c5d64d5f9faa001                                                                                                                          float32[1, 1, 28, 28, 128], float32[1, 16, 28, 28, 8]      NCHW8c                   NCHW128c  
tvmgen_default_fused_layout_transform                                          11.41     0.00      3                cpu0                              add43c0d2d8a8a3c                                                                                                                             float32[1, 32, 7, 7, 16], float32[1, 16, 7, 7, 32]     NCHW32c                    NCHW16c  
tvmgen_default_fused_nn_softmax                                                 9.50     0.00      1                cpu0                              ca61e79ea24e53f0                                                                                                                                             float32[1, 1000], float32[1, 1000]                                         
tvmgen_default_fused_layout_transform_nn_batch_flatten                          1.05     0.00      1                cpu0                              2db99463d18696a4                                                                                                                                    float32[1, 128, 1, 1, 16], float32[1, 2048]        NCHW                    NCHW16c  
----------                                                                                                                                                                                                                                                                                                                                                                                    
Sum                                                                      3,82,269.07    99.80     84                                                                                                                                                                                                                                                                                          
Total                                                                    3,83,036.30               1                cpu0                                                                                           
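
Since the goal is the time spent on data layout transformations, here is a minimal sketch of how the printed report could be totalled for rows whose name contains `layout_transform`. It works on the report text only and assumes the first column is the operator name and the second is the duration in microseconds, as in the table above. The helper name `layout_transform_time_us` and the shortened sample rows are illustrative, not part of TVM. Stripping the commas also copes with the `1,39,559.24` style of digit grouping seen in this output.

```python
def layout_transform_time_us(report_text):
    """Sum the 'Duration (us)' column over rows whose name mentions layout_transform."""
    total = 0.0
    for line in report_text.splitlines():
        fields = line.split()
        # Skip blank lines, separators, and rows for other operators.
        if len(fields) < 2 or "layout_transform" not in fields[0]:
            continue
        # Durations are printed with grouping commas, e.g. "1,39,559.24".
        total += float(fields[1].replace(",", ""))
    return total


# Shortened sample rows (name, duration, percent, count) copied from the table above.
sample = """\
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3   1,39,559.24  36.43  2
tvmgen_default_fused_layout_transform_2                   106.25   0.03  3
tvmgen_default_fused_add_layout_transform                  76.45   0.02  1
"""

print(f"layout transform time: {layout_transform_time_us(sample):.2f} us")
```

Note that this also counts fused kernels such as `fused_add_layout_transform`, where the transform is only part of the work, so the result is an upper bound for those rows.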

(c) benchmark

Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0 -mcpu=cascadelake, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  95.1157      95.0706      95.2259      95.0505       0.0784   
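
To put the two outputs on the same scale: the profiler tables report microseconds, while the benchmark summary reports milliseconds. A small sketch of the conversion, using the Total row of the profiler table preceding (c) and the benchmark mean above (the variable names are illustrative only):

```python
def us_to_ms(duration_us):
    """Convert microseconds to milliseconds."""
    return duration_us / 1000.0


# "Total 3,83,036.30" from the profiler table preceding (c), in microseconds.
profiler_total_ms = us_to_ms(383036.30)
# "mean (ms)" from the benchmark summary.
benchmark_mean_ms = 95.1157

ratio = profiler_total_ms / benchmark_mean_ms
print(f"profiler total: {profiler_total_ms:.2f} ms")
print(f"benchmark mean: {benchmark_mean_ms:.2f} ms")
print(f"ratio: {ratio:.2f}x")
```

This only lines the numbers up side by side; it does not explain why the per-operator profile and the end-to-end benchmark differ.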

[2] Without graph tuning

(a) profiler_vm

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                        Duration (us)  Percent                                                                                                                                  Argument Shapes  layout              Hash  data_layout  out_layout  Device  kernel_layout  Count  
fused_nn_conv2d_multiply_add_nn_relu_8          18,312.26    13.48                                float32[1, 256, 14, 14], float32[256, 256, 3, 3], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          2cdb64071c823e24         NCHW                cpu0           OIHW      6  
fused_nn_conv2d_multiply_add_nn_relu_11         15,536.25    11.44                                    float32[1, 512, 7, 7], float32[512, 512, 3, 3], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          ac5db8098bc41e35         NCHW                cpu0           OIHW      3  
fused_nn_conv2d_multiply_add_nn_relu_5          11,510.85     8.47                                float32[1, 128, 28, 28], float32[128, 128, 3, 3], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          1e2717b5beb2fa67         NCHW                cpu0           OIHW      4  
fused_nn_conv2d_multiply_add_nn_relu_9           8,864.91     6.53                              float32[1, 1024, 14, 14], float32[256, 1024, 1, 1], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          d13eb3b00a8d5f35         NCHW                cpu0           OIHW      5  
fused_nn_conv2d_multiply_add_nn_relu_2           8,638.60     6.36                                      float32[1, 64, 56, 56], float32[64, 64, 3, 3], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          9e6b01a1c3c8a068         NCHW                cpu0           OIHW      3  
fused_nn_conv2d_add_2                            8,204.04     6.04                                            float32[1, 256, 14, 14], float32[1024, 256, 1, 1], float32[1, 1024, 14, 14], float32[1, 1024, 14, 14]          5220d30314ead0f1         NCHW                cpu0           OIHW      5  
fused_nn_conv2d_add_1                            5,331.53     3.92                                               float32[1, 128, 28, 28], float32[512, 128, 1, 1], float32[1, 512, 28, 28], float32[1, 512, 28, 28]          0bf103915aebe126         NCHW                cpu0           OIHW      3  
fused_nn_conv2d_3                                5,018.85     3.69                                                                      float32[1, 1024, 14, 14], float32[2048, 1024, 1, 1], float32[1, 2048, 7, 7]          6d6eb730bfedd923         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_multiply_add_nn_relu_6           4,782.81     3.52                                float32[1, 512, 28, 28], float32[128, 512, 1, 1], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          4e037f410da9f71b         NCHW                cpu0           OIHW      3  
fused_nn_conv2d_add                              4,655.16     3.43                                                 float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56], float32[1, 256, 56, 56]          0579ca31a5deb349         NCHW                cpu0           OIHW      2  
fused_nn_conv2d_multiply_add_nn_relu_12          4,458.45     3.28                                  float32[1, 2048, 7, 7], float32[512, 2048, 1, 1], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          910c4036cd67e89a         NCHW                cpu0           OIHW      2  
fused_nn_conv2d_add_3                            4,426.89     3.26                                                  float32[1, 512, 7, 7], float32[2048, 512, 1, 1], float32[1, 2048, 7, 7], float32[1, 2048, 7, 7]          49d9928cdfecc5cf         NCHW                cpu0           OIHW      2  
fused_nn_conv2d_multiply_add_nn_relu_3           4,271.83     3.14                                    float32[1, 256, 56, 56], float32[64, 256, 1, 1], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          811cb902928c44b7         NCHW                cpu0           OIHW      2  
fused_nn_conv2d_1                                3,825.73     2.82                                                                        float32[1, 256, 56, 56], float32[512, 256, 1, 1], float32[1, 512, 28, 28]          1a74ea9d21fe242b         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_multiply_add_nn_relu             3,396.77     2.50                                    float32[1, 3, 224, 224], float32[64, 3, 7, 7], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 112, 112]          dbd095d1f70608d4         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_2                                3,289.00     2.42                                                                      float32[1, 512, 28, 28], float32[1024, 512, 1, 1], float32[1, 1024, 14, 14]          d06d46290b62d7fe         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_add_multiply_add_nn_relu         2,393.57     1.76         float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 56, 56]          ac4d220a67fcb1ee         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_add_multiply_add_nn_relu_3       2,082.21     1.53        float32[1, 512, 7, 7], float32[2048, 512, 1, 1], float32[1, 2048, 7, 7], float32[2048, 1, 1], float32[2048, 1, 1], float32[1, 2048, 7, 7]          a44fe9b9e8d2da07         NCHW                cpu0           OIHW      1  
fused_nn_dense_nn_bias_add                       2,008.92     1.48                                                                           float32[1, 2048], float32[1000, 2048], float32[1000], float32[1, 1000]          d7434de44c54529a                             cpu0                     1  
fused_nn_conv2d                                  1,972.36     1.45                                                                          float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56]          336879824a51f323         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_add_multiply_add_nn_relu_1       1,782.06     1.31       float32[1, 128, 28, 28], float32[512, 128, 1, 1], float32[1, 512, 28, 28], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 28, 28]          94d9f51ec760c01b         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_multiply_add_nn_relu_4           1,631.71     1.20                                float32[1, 256, 56, 56], float32[128, 256, 1, 1], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          85514dd90f0099e4         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_add_multiply_add_nn_relu_2       1,623.09     1.19  float32[1, 256, 14, 14], float32[1024, 256, 1, 1], float32[1, 1024, 14, 14], float32[1024, 1, 1], float32[1024, 1, 1], float32[1, 1024, 14, 14]          ff2ba4afc4b00ccf         NCHW                cpu0           OIHW      1  
fused_multiply_add_nn_relu                       1,390.58     1.02                                                         float32[1, 256, 56, 56], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 56, 56]          9a48c23d6d41bd2f                             cpu0                     2  
fused_nn_conv2d_multiply_add_nn_relu_10          1,231.09     0.91                                float32[1, 1024, 14, 14], float32[512, 1024, 1, 1], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          5506cbf207ccd131         NCHW                cpu0           OIHW      1  
fused_nn_conv2d_multiply_add_nn_relu_7             982.67     0.72                                float32[1, 512, 28, 28], float32[256, 512, 1, 1], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          341159a3c2f00cae         NCHW                cpu0           OIHW      1  
fused_multiply_add_nn_relu_1                       873.20     0.64                                                         float32[1, 512, 28, 28], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 28, 28]          004373fd83ff4e02                             cpu0                     3  
fused_nn_max_pool2d_multiply_add_nn_relu           870.70     0.64                                                           float32[1, 64, 112, 112], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]    NCHW  c55e51cdf27573bb                             cpu0                     1  
fused_multiply_add_nn_relu_2                       554.39     0.41                                                     float32[1, 1024, 14, 14], float32[1024, 1, 1], float32[1024, 1, 1], float32[1, 1024, 14, 14]          9eb2396efef0e312                             cpu0                     5  
fused_nn_conv2d_multiply_add_nn_relu_1             530.49     0.39                                      float32[1, 64, 56, 56], float32[64, 64, 1, 1], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          482f6fc9ff278c3f         NCHW                cpu0           OIHW      1  
fused_multiply_add_nn_relu_3                       109.80     0.08                                                         float32[1, 2048, 7, 7], float32[2048, 1, 1], float32[2048, 1, 1], float32[1, 2048, 7, 7]          b74dcecdab14a995                             cpu0                     2  
fused_multiply_add                                  97.47     0.07                                                             float32[1, 3, 224, 224], float32[3, 1, 1], float32[3, 1, 1], float32[1, 3, 224, 224]          65ced11c4ebbde8f                             cpu0                     1  
fused_nn_global_avg_pool2d                          54.84     0.04                                                                                                   float32[1, 2048, 7, 7], float32[1, 2048, 1, 1]    NCHW  9589c5c75edc8cf4                             cpu0                     1  
fused_nn_softmax                                    10.39     0.01                                                                                                               float32[1, 1000], float32[1, 1000]          ca61e79ea24e53f0                             cpu0                     1  
fused_nn_batch_flatten                               1.89     0.00                                                                                                         float32[1, 2048, 1, 1], float32[1, 2048]          8af63b18f42fefd8                             cpu0                     1  
----------                                                                                                                                                                                                                                                                                            
Sum                                           1,34,725.32    99.16                                                                                                                                                                                                                                71  
Total                                         1,35,860.04                                                                                                                                                                                                                 cpu0                     1  

(b) debug_executor

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                       Duration (us)  Percent                                                                                                                                  Argument Shapes  layout              Hash  data_layout  out_layout  Device  kernel_layout  Count  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_3          17,370.19    12.63                                float32[1, 256, 14, 14], float32[256, 256, 3, 3], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          2cdb64071c823e24         NCHW                cpu0           OIHW      6  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu            11,964.35     8.70                                    float32[1, 512, 7, 7], float32[512, 512, 3, 3], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          ac5db8098bc41e35         NCHW                cpu0           OIHW      3  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_11         11,740.92     8.54                                      float32[1, 64, 56, 56], float32[64, 64, 1, 1], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          482f6fc9ff278c3f         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_6          11,105.73     8.08                                float32[1, 128, 28, 28], float32[128, 128, 3, 3], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          1e2717b5beb2fa67         NCHW                cpu0           OIHW      4  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_9           8,865.77     6.45                                      float32[1, 64, 56, 56], float32[64, 64, 3, 3], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          9e6b01a1c3c8a068         NCHW                cpu0           OIHW      3  
tvmgen_default_fused_nn_conv2d_add_multiply_add_nn_relu_3       8,005.04     5.82         float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 56, 56]          ac4d220a67fcb1ee         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_add_1                            7,359.88     5.35                                            float32[1, 256, 14, 14], float32[1024, 256, 1, 1], float32[1, 1024, 14, 14], float32[1, 1024, 14, 14]          5220d30314ead0f1         NCHW                cpu0           OIHW      5  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_4           7,026.80     5.11                              float32[1, 1024, 14, 14], float32[256, 1024, 1, 1], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          d13eb3b00a8d5f35         NCHW                cpu0           OIHW      5  
tvmgen_default_fused_nn_conv2d_add_2                            4,732.52     3.44                                               float32[1, 128, 28, 28], float32[512, 128, 1, 1], float32[1, 512, 28, 28], float32[1, 512, 28, 28]          0bf103915aebe126         NCHW                cpu0           OIHW      3  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_7           4,503.68     3.28                                float32[1, 512, 28, 28], float32[128, 512, 1, 1], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          4e037f410da9f71b         NCHW                cpu0           OIHW      3  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_10          4,475.41     3.25                                    float32[1, 256, 56, 56], float32[64, 256, 1, 1], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]          811cb902928c44b7         NCHW                cpu0           OIHW      2  
tvmgen_default_fused_nn_conv2d_add_3                            4,407.82     3.21                                                 float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56], float32[1, 256, 56, 56]          0579ca31a5deb349         NCHW                cpu0           OIHW      2  
tvmgen_default_fused_nn_conv2d_3                                3,409.06     2.48                                                                      float32[1, 1024, 14, 14], float32[2048, 1024, 1, 1], float32[1, 2048, 7, 7]          6d6eb730bfedd923         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_add                              3,405.21     2.48                                                  float32[1, 512, 7, 7], float32[2048, 512, 1, 1], float32[1, 2048, 7, 7], float32[1, 2048, 7, 7]          49d9928cdfecc5cf         NCHW                cpu0           OIHW      2  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_1           3,395.20     2.47                                  float32[1, 2048, 7, 7], float32[512, 2048, 1, 1], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          910c4036cd67e89a         NCHW                cpu0           OIHW      2  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_12          3,365.71     2.45                                    float32[1, 3, 224, 224], float32[64, 3, 7, 7], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 112, 112]          dbd095d1f70608d4         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_1                                3,357.74     2.44                                                                        float32[1, 256, 56, 56], float32[512, 256, 1, 1], float32[1, 512, 28, 28]          1a74ea9d21fe242b         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_2                                2,957.75     2.15                                                                      float32[1, 512, 28, 28], float32[1024, 512, 1, 1], float32[1, 1024, 14, 14]          d06d46290b62d7fe         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d                                  1,823.12     1.33                                                                          float32[1, 64, 56, 56], float32[256, 64, 1, 1], float32[1, 256, 56, 56]          336879824a51f323         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_add_multiply_add_nn_relu         1,656.22     1.20        float32[1, 512, 7, 7], float32[2048, 512, 1, 1], float32[1, 2048, 7, 7], float32[2048, 1, 1], float32[2048, 1, 1], float32[1, 2048, 7, 7]          a44fe9b9e8d2da07         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_add_multiply_add_nn_relu_2       1,604.78     1.17       float32[1, 128, 28, 28], float32[512, 128, 1, 1], float32[1, 512, 28, 28], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 28, 28]          94d9f51ec760c01b         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_dense_nn_bias_add                       1,567.51     1.14                                                                           float32[1, 2048], float32[1000, 2048], float32[1000], float32[1, 1000]          d7434de44c54529a                             cpu0                     1  
tvmgen_default_fused_nn_conv2d_add_multiply_add_nn_relu_1       1,559.91     1.13  float32[1, 256, 14, 14], float32[1024, 256, 1, 1], float32[1, 1024, 14, 14], float32[1024, 1, 1], float32[1024, 1, 1], float32[1, 1024, 14, 14]          ff2ba4afc4b00ccf         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_8           1,444.25     1.05                                float32[1, 256, 56, 56], float32[128, 256, 1, 1], float32[128, 1, 1], float32[128, 1, 1], float32[1, 128, 28, 28]          85514dd90f0099e4         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_multiply_add_nn_relu_3                     1,433.78     1.04                                                         float32[1, 256, 56, 56], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 56, 56]          9a48c23d6d41bd2f                             cpu0                     2  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_2             869.02     0.63                                float32[1, 1024, 14, 14], float32[512, 1024, 1, 1], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 7, 7]          5506cbf207ccd131         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_nn_max_pool2d_multiply_add_nn_relu           835.63     0.61                                                           float32[1, 64, 112, 112], float32[64, 1, 1], float32[64, 1, 1], float32[1, 64, 56, 56]    NCHW  c55e51cdf27573bb                             cpu0                     1  
tvmgen_default_fused_multiply_add_nn_relu_2                       786.12     0.57                                                         float32[1, 512, 28, 28], float32[512, 1, 1], float32[512, 1, 1], float32[1, 512, 28, 28]          004373fd83ff4e02                             cpu0                     3  
tvmgen_default_fused_nn_conv2d_multiply_add_nn_relu_5             784.15     0.57                                float32[1, 512, 28, 28], float32[256, 512, 1, 1], float32[256, 1, 1], float32[256, 1, 1], float32[1, 256, 14, 14]          341159a3c2f00cae         NCHW                cpu0           OIHW      1  
tvmgen_default_fused_multiply_add_nn_relu_1                       428.09     0.31                                                     float32[1, 1024, 14, 14], float32[1024, 1, 1], float32[1024, 1, 1], float32[1, 1024, 14, 14]          9eb2396efef0e312                             cpu0                     5  
tvmgen_default_fused_multiply_add                                 334.36     0.24                                                             float32[1, 3, 224, 224], float32[3, 1, 1], float32[3, 1, 1], float32[1, 3, 224, 224]          65ced11c4ebbde8f                             cpu0                     1  
tvmgen_default_fused_multiply_add_nn_relu                          91.58     0.07                                                         float32[1, 2048, 7, 7], float32[2048, 1, 1], float32[2048, 1, 1], float32[1, 2048, 7, 7]          b74dcecdab14a995                             cpu0                     2  
tvmgen_default_fused_nn_global_avg_pool2d                          39.10     0.03                                                                                                   float32[1, 2048, 7, 7], float32[1, 2048, 1, 1]    NCHW  9589c5c75edc8cf4                             cpu0                     1  
tvmgen_default_fused_nn_softmax                                     9.62     0.01                                                                                                               float32[1, 1000], float32[1, 1000]          ca61e79ea24e53f0                             cpu0                     1  
tvmgen_default_fused_nn_batch_flatten                               1.53     0.00                                                                                                         float32[1, 2048, 1, 1], float32[1, 2048]          8af63b18f42fefd8                             cpu0                     1  
----------                                                                                                                                                                                                                                                                                                           
Sum                                                          1,36,717.56    99.42                                                                                                                                                                                                                                71  
Total                                                        1,37,508.67                                                                                                                                                                                                                 cpu0                     1  

(c) benchmark

One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  131.7054     131.8304     131.8502     131.4355      0.1910
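
To compare (b) and (c) on the same scale, here is a minimal sketch that takes the "Sum" and "Total" rows from the report above and sets them against the benchmark mean. It assumes the duration column of the debug_executor report is in microseconds (the column header is not repeated in this part of the printout), and the helper name `parse_duration` is made up for illustration. The numbers are copied by hand from the output above, not read from TVM.

```python
def parse_duration(text: str) -> float:
    """Turn a report value such as '1,37,508.67' into a float.

    Commas are dropped whatever the digit grouping is, so the
    locale-specific grouping in the printout does not matter.
    """
    return float(text.replace(",", ""))


# Values copied from the printouts above.
total_us = parse_duration("1,37,508.67")  # "Total" row, assumed microseconds
sum_us = parse_duration("1,36,717.56")    # "Sum" row, assumed microseconds
benchmark_mean_ms = 131.7054              # mean from the (c) benchmark summary

total_ms = total_us / 1000.0
# Time in the profiled run that is not attributed to any listed operator.
unattributed_us = total_us - sum_us
# How much slower the profiled run is than the plain benchmark run.
overhead_ms = total_ms - benchmark_mean_ms
overhead_pct = 100.0 * overhead_ms / benchmark_mean_ms

print(f"profiled total:      {total_ms:.2f} ms")
print(f"benchmark mean:      {benchmark_mean_ms:.2f} ms")
print(f"profiling overhead:  {overhead_ms:.2f} ms ({overhead_pct:.1f}%)")
print(f"outside listed ops:  {unattributed_us:.2f} us")
```

Under that unit assumption, the debug_executor total and the graph_executor benchmark differ by only a few percent, so the large gap is between these two and the profiler_vm report.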