I have read through the Relay VM examples, but they all use static shapes.
I could not find a demo that uses the Relay VM to tune performance for dynamic-shape models, whether via the auto-scheduler, cuDNN, or anything else.
Does the Relay VM support auto-tuning for a dynamic batch axis, e.g. ResNet where only the batch dimension is dynamic?
I tried using the Relay VM to compile ResNet with the batch axis marked as dynamic, but got the following warnings:
```
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 7, 7), 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 512, 7, 7), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('dense_small_batch.gpu', ('TENSOR', ({any_dim|any_dim>=0}, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
```
Any help would be appreciated. Thank you!