What's the current status of Relay VM support for dynamic shapes?

I've read some examples for the Relay VM, but they are all about static shapes.

I didn't find a demo that uses the Relay VM to tune performance for dynamic-shape models, whether with the auto-scheduler, cuDNN, or anything similar.

Does the Relay VM support auto-tuning for a dynamic batch dimension, for example ResNet with only the batch axis dynamic?

I tried using the Relay VM to compile ResNet with the batch axis marked as dynamic, but got the following warnings:

```
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 3, 224, 224), 'float32'), ('TENSOR', (64, 3, 7, 7), 'float32'), (2, 2), (3, 3, 3, 3), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('conv2d_cudnn.cuda', ('TENSOR', ({any_dim|any_dim>=0}, 512, 7, 7), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 1, 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -keys=cuda,gpu -arch=sm_75 -libs=cudnn -max_num_threads=1024 -thread_warp_size=32, workload=('dense_small_batch.gpu', ('TENSOR', ({any_dim|any_dim>=0}, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
```
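
For reference, here is a minimal sketch of the compile path I mean, reduced to a single conv2d rather than my full ResNet script (it assumes a CUDA + cuDNN build of TVM; `relay.Any()` marks the batch axis as dynamic):

```python
import numpy as np
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

# Single conv2d with a dynamic batch axis, standing in for the full ResNet.
data = relay.var("data", shape=(relay.Any(), 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(64, 3, 7, 7), dtype="float32")
out = relay.nn.conv2d(data, weight, strides=(2, 2), padding=(3, 3),
                      channels=64, kernel_size=(7, 7))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))

# relay.build() rejects dynamic shapes, so compile with the Relay VM instead.
with tvm.transform.PassContext(opt_level=3):
    exe = relay.vm.compile(mod, target="cuda -libs=cudnn")

dev = tvm.cuda(0)
vm = VirtualMachine(exe, dev)
x = np.random.uniform(size=(4, 3, 224, 224)).astype("float32")  # batch=4 picked at runtime
w = np.random.uniform(size=(64, 3, 7, 7)).astype("float32")
print(vm.invoke("main", tvm.nd.array(x, dev), tvm.nd.array(w, dev)).shape)
```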

Need help! Thank you!

It can't support dynamic shapes caused by data values, i.e., output shapes that depend on the contents of the input rather than only on the input shape.
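
For illustration, here is a minimal sketch using `relay.argwhere` as an example of such a data-dependent op: even with a fully static input shape, the output shape is only known as `(?, 2)` at compile time because it depends on how many elements are nonzero.

```python
import tvm
from tvm import relay

# Static (8, 8) input, but argwhere emits one row per nonzero element,
# so its output shape is data-dependent: (?, 2) at compile time.
x = relay.var("x", shape=(8, 8), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.argwhere(x)))
print(relay.transform.InferType()(mod))  # output type: Tensor[(?, 2), int32]
```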

According to this post, [Question] How TVM run text generation model like gpt2 - #15 by masahi, the PyTorch frontend doesn't support dynamic input shapes. But it shouldn't be difficult to add such a feature. A sketch of the current import path is shown below.
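
For reference, this is roughly what the static-shape import looks like today (a sketch, assuming torchvision's resnet18 as the model); the missing feature would be accepting `relay.Any()` inside the shape tuple:

```python
import torch
import torchvision
from tvm import relay

model = torchvision.models.resnet18().eval()
scripted = torch.jit.trace(model, torch.randn(1, 3, 224, 224))

# The shape list must currently be fully static; supporting something like
# ("input0", (relay.Any(), 3, 224, 224)) here is the feature in question.
shape_list = [("input0", (1, 3, 224, 224))]
mod, params = relay.frontend.from_pytorch(scripted, shape_list)
```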