Resnet ToMixedPrecision tuning error

I added the ToMixedPrecision pass in tune_network_mali.py to tune an fp32 network in fp16 (a sketch of the conversion is shown after the traceback below), but an error occurred on resnet-50:


```
  File "/home/tvm/tvm/python/tvm/auto_scheduler/measure.py", line 1150, in _rpc_run
    func.entry_func(*loc_args)
  File "/home/tvm/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    rai
An error occurred during the execution of TVM.
For more information, please see: Handle TVM Errors — tvm 0.11.dev0 documentation
Check failed: ret == 0 (-1 vs. 0) : TVMError: Cannot handle float16 as device function argument, all_cost:2.11, Tstamp:1669419374.61)
```
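
Roughly what I do, as a minimal sketch (the relay.testing resnet-50 workload here stands in for my actual model, and the rest of the tune_network_mali.py boilerplate is omitted):

```python
import tvm
from tvm import relay
import tvm.relay.testing
from tvm.relay.transform import InferType, ToMixedPrecision

# Build an fp32 resnet-50 workload (stand-in for the real network).
mod, params = relay.testing.resnet.get_workload(
    num_layers=50, batch_size=1, dtype="float32"
)

# Convert the fp32 module to mixed precision (fp16 compute) before
# extracting auto-scheduler tasks.
mod = InferType()(mod)
mod = ToMixedPrecision("float16")(mod)
```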


Besides that, the task named vm_mod_fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 has no performance statistics. In the schedule table, it looks like this:


| ID | Task name | Latency (ms) | Speed (GFLOPS) | Trials |
|----|-----------|--------------|----------------|--------|
| 5 | vm_mod_fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_1 | - | - | 64 |
| 6 | vm_mod_fused_nn_contrib_conv2d_winograd_without_weight_transform_add_nn_relu_2 | - | - | 64 |
| 7 | vm_mod_fused_nn_conv2d_add_nn_relu_7 | 0.825 | 62.32 | 64 |


As you can see, the conv2d task in row 7 has its latency and speed data, but the tasks in rows 5 and 6 do not.

I believe the error occurs when packing built kernels into PackedFuncs with the CUDA calling convention: we don't support packing FP16 types there. This patch should fix it (please give it a try): Resnet ToMixedPrecision tuning error

For what it's worth, this seems to occur only when using TIR's CSE pass.
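
To illustrate the failure mode, here is a hypothetical, untested repro: if an fp16 scalar ends up as a device kernel parameter (which is what CSE can cause by lifting a common fp16 subexpression into a variable), packing that argument on the host side trips the check. The names and shapes below are made up for illustration:

```python
import tvm
from tvm import te

# Hypothetical repro: a CUDA kernel with an explicit float16 scalar parameter.
alpha = te.var("alpha", dtype="float16")  # fp16 scalar kernel argument
A = te.placeholder((1024,), dtype="float16", name="A")
B = te.compute((1024,), lambda i: A[i] * alpha, name="B")

s = te.create_schedule(B.op)
s[B].bind(B.op.axis[0], te.thread_axis("threadIdx.x"))

# Packing the fp16 scalar into the packed-func calling convention is
# where I would expect "Cannot handle float16 as device function argument".
f = tvm.build(s, [A, alpha, B], target="cuda")
```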

Thanks, but it seems you gave me the wrong link to the patch :grinning:

Oops, try this: https://github.com/apache/tvm/pull/13532

We might need to change the approach based on the review comments, though I won't have time for this until next week.

It works. I will run a complete tuning process with 25,000 trials (sketched below) and reply soon.
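
For reference, the tuning loop will be the stock one from tune_network_mali.py with the trial count raised. A sketch, where the target strings, the "mali" device key, and the tracker address 127.0.0.1:9190 are placeholders for my actual setup:

```python
import tvm
from tvm import auto_scheduler

target = tvm.target.Target(
    "opencl -device=mali", host="llvm -mtriple=aarch64-linux-gnu"
)

# Extract tasks from the mixed-precision module produced earlier.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=25000,  # total trials shared across all tasks
    runner=auto_scheduler.RPCRunner(
        "mali", host="127.0.0.1", port=9190, repeat=3, timeout=50
    ),
    measure_callbacks=[auto_scheduler.RecordToFile("resnet-50-fp16.json")],
)
tuner.tune(tune_option)
```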

@AndrewZhaoLuo

I may have a similar issue ([MetaScheduler] Can't optimize resnet50 with tensorcore - Questions - Apache TVM Discuss) when using the meta scheduler. Do you have any idea?

Hi all, I tried executing @twmht's script for a resnet50 TFLite model and I got the following error:

I traced the error to its source, and it seems to come from the ToMixedPrecision() pass; it has something to do with float16 handling by the meta scheduler. Can you confirm whether you have faced a similar error? I also see this issue here, but it has been stagnant for a while now.

Can you please let me know if you were able to find a workaround for this? TIA and have a nice day! @AndrewZhaoLuo @A_newer

Regards,
Krishna