Model file is extremely large after tuning through TVM Auto-Scheduler

zyzhang · June 13, 2022, 12:10pm

I tuned an original PyTorch traced model, which is 120 MB. The TVM model file produced by tuning with auto_scheduler.TaskScheduler is 3.2 GB, and it is only 2 ms faster. Is this common?

My model is fp32 and the device is an Nvidia 2070. Thank you for any ideas.

jwfromm · June 14, 2022, 4:28pm

This is a little confusing. TVM in general shouldn’t be changing the size of models, and the tuning process especially doesn’t have an impact, as it just produces a log file used in compilation. Do you have any Python snippets that show how you’re producing this much larger model?

zyzhang · June 15, 2022, 11:33am

Sure. My model is Swin Transformer (open-source code: Swin Transformer Model). Here is the Auto-Scheduler tuning code.

import torch
from swin_transformer import SwinTransformer
import tvm
from tvm import relay, auto_scheduler

with torch.no_grad():
    model = SwinTransformer(img_size=224, in_chans=3, embed_dim=96, depths=[2, 3, 6, 2],
                            num_heads=[3, 6, 12, 24], window_size=7, drop_path_rate=0.2,
                            num_classes=13).float().cuda().eval()
    shape = [192, 3, 224, 224]
    input0 = torch.ones(shape).float().cuda()
    trace = torch.jit.trace(model, input0)
    torch.jit.save(trace, 'st_v1.trace')
    relay_model, params = relay.frontend.from_pytorch(trace, [('input0', input0.shape)], default_dtype='float32')
    target = tvm.target.cuda()
    tasks, task_weights = auto_scheduler.extract_tasks(relay_model["main"], params, target)
    measure_ctx = auto_scheduler.LocalRPCMeasureContext(repeat=1,
                                                        min_repeat_ms=100,
                                                        timeout=100)

    tuner = auto_scheduler.TaskScheduler(tasks,
                                         task_weights,
                                         load_model_file='st_v1',
                                        )
    tune_option = auto_scheduler.TuningOptions(
        num_measure_trials=36000,
        num_measures_per_round=64,
        early_stopping=500,
        verbose=True,
        runner=measure_ctx.runner,
        measure_callbacks=[auto_scheduler.RecordToFile('st_v1.log')],
    )
    tuner.tune(tune_option)
    with auto_scheduler.ApplyHistoryBest("st_v1.log"):
        with tvm.transform.PassContext(opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
            lib = relay.build(relay_model, target=target, params=params)

    lib.export_library('st_v1.so')

Well, I found that the model file built from Relay is still 3.2 GB even without the tuning process, in contrast with the 110 MB torch trace model. I guess there’s a problem during task extraction.
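One way to narrow this down is to total the bytes held in the parameter dictionary and compare that number with the size of the exported file. What follows is a minimal sketch, not a confirmed diagnosis: the helper names `tensor_nbytes` and `params_nbytes` are hypothetical, and the only assumption about the inputs is that each value exposes `.shape` and `.dtype`, as the `params` returned by `relay.frontend.from_pytorch` appear to.

```python
from math import prod

# Bytes per element for the dtypes most likely to show up in the params dict.
# Extend this table if the model uses other dtypes.
DTYPE_BYTES = {
    "float64": 8, "float32": 4, "float16": 2,
    "int64": 8, "int32": 4, "int16": 2, "int8": 1, "uint8": 1, "bool": 1,
}


def tensor_nbytes(shape, dtype="float32"):
    """Size in bytes of a dense tensor with the given shape and dtype."""
    # prod(()) is 1, so a scalar counts as a single element.
    return prod(int(dim) for dim in shape) * DTYPE_BYTES[str(dtype)]


def params_nbytes(params):
    """Total size in bytes of a name -> tensor mapping.

    Assumes each value has `.shape` and `.dtype` attributes
    (hypothetical helper, not part of TVM).
    """
    return sum(tensor_nbytes(v.shape, v.dtype) for v in params.values())


def largest_params(params, top=5):
    """Names and sizes of the `top` biggest entries, largest first."""
    sizes = [(name, tensor_nbytes(v.shape, v.dtype)) for name, v in params.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

Calling `params_nbytes(params)` on the dictionary from `from_pytorch`, and again on the parameters held by the built module (assuming `lib.get_params()` is available on the object returned by `relay.build`), would show whether the extra gigabytes come from constants created during compilation or from something else in the exported library. `largest_params` then points at which tensors dominate.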