Hi, I want to use the CUTLASS contrib integration in my model to utilize Tensor Cores in fp16, while also keeping the op fusion and automatic tuning of Ansor, which performs well on my model. But there seems to be no example of combining Ansor with the CUTLASS contrib, so I did some experimenting.
Here is my pipeline:
- Call `relay.op.contrib.cutlass.partition_for_cutlass` to match and replace the functions supported by CUTLASS.
- Call `relay.auto_scheduler.relay_integration.extract_tasks` to extract Ansor tasks.
- Tune the Ansor tasks.
- Call `relay.build` with `auto_scheduler.ApplyHistoryBest` to compile the model using the Ansor tuning logs.
- Call `finalize_modules` to get the updated library.
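The steps above can be sketched end-to-end roughly as follows. This is a sketch under assumptions, not the exact script: the log-file name, the tuning budget, and the `finalize_modules` output paths are illustrative, the multi-target `extract_tasks` change discussed below is elided (a plain cuda target is shown), and depending on the TVM version the CUTLASS flow may need extra steps (e.g. kernel profiling). TVM imports are deferred into the function so the sketch stays importable without TVM installed.

```python
def build_with_cutlass_and_ansor(mod, params, log_file="ansor_tuning.json"):
    """Sketch of the pipeline: partition for CUTLASS, tune the rest with
    Ansor, build with the tuning log, then finalize the CUTLASS kernels.
    File names and tuning options are illustrative, not prescriptive."""
    import tvm
    from tvm import auto_scheduler, relay
    from tvm.contrib.cutlass import finalize_modules
    from tvm.relay.op.contrib.cutlass import partition_for_cutlass

    # 1. Offload the patterns CUTLASS supports to external functions.
    mod = partition_for_cutlass(mod)

    # 2. Extract Ansor tasks for what remains on the cuda target.
    #    (Stock extract_tasks accepts a single target; see the note
    #    below about also passing cutlass.)
    target = tvm.target.Target("cuda")
    tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)

    # 3. Tune the extracted tasks, recording results to the log file.
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
    tuner.tune(auto_scheduler.TuningOptions(
        num_measure_trials=2000,  # illustrative budget
        measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    ))

    # 4. Build with the best schedules found by Ansor.
    with auto_scheduler.ApplyHistoryBest(log_file):
        with tvm.transform.PassContext(
            opt_level=3, config={"relay.backend.use_auto_scheduler": True}
        ):
            lib = relay.build(mod, target=target, params=params)

    # 5. Compile the generated CUTLASS kernels into the final library.
    return finalize_modules(lib, "compile.so", "./tmp")
```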
During this process, I encountered a problem: the `extract_tasks` function only supports a single target, but we need to pass in all targets, including cuda and cutlass, when calling `call_all_topi_funcs`. So I made a small modification to the `extract_tasks` function.
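For illustration only, here is one shape such a change might take as a wrapper. The name `extract_tasks_multi_target`, its `targets` parameter, and the selection logic are all hypothetical, not stock TVM API; the actual modification would thread the full target list down into `call_all_topi_funcs` inside `extract_tasks`.

```python
def extract_tasks_multi_target(mod, params, targets):
    """Hypothetical wrapper, not real TVM API: pick the tunable (non-BYOC)
    target for Ansor task extraction. Functions already partitioned for
    CUTLASS carry a "Compiler" attribute and should not yield Ansor tasks;
    the real fix passes all targets through to call_all_topi_funcs."""
    from tvm import auto_scheduler

    # Illustrative: select the target Ansor should tune (e.g. cuda),
    # skipping the cutlass BYOC target.
    tune_target = next(t for t in targets if "cutlass" not in str(t))
    return auto_scheduler.extract_tasks(mod["main"], params, tune_target)
```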
I have verified the performance benefits of this pipeline on our business model: it not only uses CUTLASS to drive the Tensor Cores, but also retains Ansor's advantages on complex models. I will also try to verify it on some open-source models.
Do you think we could organize this pipeline into a test script and submit it to the community? Looking forward to your reply. Thanks!