How to combine BYOC TensorRT and Ansor tuning to build a model

Hi friends: I have a model that is not entirely suitable for TensorRT, so I am considering using both TensorRT and Ansor tuning to build it. The model contains some deformable conv2d ops, and this kind of op does not see any improvement with BYOC TensorRT, so I used Ansor to tune that part of the model instead. I am wondering whether there is a way to build a Relay model using both the TensorRT and Ansor backends on different subgraphs? Thank you!

Simply follow the TensorRT tutorial to partition the model, and add the tuning step before building it. The flow looks like:

  1. Use `partition_for_tensorrt` to get the partitioned mod (a quick way to check what was offloaded is sketched after this list).
  2. Tune the partitioned mod with Ansor. Ansor should only extract the workloads that won't be offloaded to TensorRT.
  3. Build the mod with both the TensorRT configuration and the tuning logs.
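For step 1, one way to verify which subgraphs were actually handed to TensorRT is to look at the `Compiler` attribute on the partitioned functions. This is only a minimal sketch, assuming `mod` and `params` are already loaded; also note that, depending on the TVM version, `partition_for_tensorrt` may return just the module instead of a `(mod, config)` pair:

        from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

        # Subgraphs supported by TensorRT become separate Relay functions whose
        # "Compiler" attribute is set to "tensorrt"; everything else stays in
        # TVM and is the part that Ansor will extract tasks from.
        mod, config = partition_for_tensorrt(mod, params)

        for gv in mod.get_global_vars():
            func = mod[gv]
            attrs = func.attrs
            compiler = attrs["Compiler"] if attrs and "Compiler" in attrs.keys() else "tvm"
            print(gv.name_hint, "->", compiler)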

Thank you, I will try it!

I plan to do the following; is it correct?

        import tvm
        from tvm import relay, auto_scheduler
        from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt

        # Assumes `mod`, `params`, `target`, and `log_file` are already defined.

        # step 1: partition the graph for TensorRT
        mod, config = partition_for_tensorrt(mod, params, remove_no_mac_subgraphs=True)

        # step 2: Ansor tuning (tasks are only extracted from the non-TensorRT part)
        tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
        for idx, task in enumerate(tasks):
            print("========== Task %d  (workload key: %s) ==========" % (idx, task.workload_key))
            print(task.compute_dag)
        print("Begin tuning...")
        measure_ctx = auto_scheduler.LocalRPCMeasureContext(repeat=1, min_repeat_ms=300, timeout=10)

        tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
        tune_option = auto_scheduler.TuningOptions(
            num_measure_trials=2000,  # change this to 20000 to achieve the best performance
            runner=measure_ctx.runner,
            measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
        )

        tuner.tune(tune_option)

        # step 3: build the model with both TensorRT and the Ansor tuning logs
        print("compiling ... ")
        with auto_scheduler.ApplyHistoryBest(log_file):
            with tvm.transform.PassContext(
                opt_level=3,
                config={
                    "relay.ext.tensorrt.options": config,
                    "relay.backend.use_auto_scheduler": True,
                },
            ):
                lib = relay.build(mod, target=target, params=params)
        lib.export_library("deploy.so")
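For completeness, here is a minimal sketch of how the exported library might be loaded and run afterwards. The input name "data" and the shape are placeholders, and the deploying TVM build needs the TensorRT runtime enabled for the offloaded subgraphs to execute:

        import numpy as np
        import tvm
        from tvm.contrib import graph_executor

        # Load the compiled library and create a graph executor on the GPU.
        lib = tvm.runtime.load_module("deploy.so")
        dev = tvm.cuda(0)
        module = graph_executor.GraphModule(lib["default"](dev))

        # Placeholder input name and shape; replace with the model's real input.
        data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
        module.set_input("data", data)
        module.run()
        out = module.get_output(0).numpy()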

Looks good to me at a glance.


@comaniac

Do you have any idea about integrating cuDNN into the search space of Ansor? Rather than just offloading the operator to cuDNN, can Ansor also search over cuDNN's schedules?

If, for example, the schedule found by TVM's search is not better than cuDNN, you can use the cuDNN implementation instead.
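As an illustration of this fallback (not part of the original reply), the usual way to let TVM use cuDNN kernels is to add cuDNN to the target's external libraries, so that supported ops such as conv2d dispatch to cuDNN instead of TVM-generated schedules. `mod` and `params` are assumed to be an existing Relay model:

        import tvm
        from tvm import relay

        # "-libs=cudnn" asks the CUDA op strategy to use cuDNN kernels for the
        # operators it covers (e.g. conv2d) instead of TVM-generated schedules.
        target = tvm.target.Target("cuda -libs=cudnn")
        with tvm.transform.PassContext(opt_level=3):
            lib = relay.build(mod, target=target, params=params)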