Sure. My model is Swin Transformer, taken from the open-source code:
Swin Transformer Model.
Here is my auto-scheduler tuning code.
import torch
from swin_transformer import SwinTransformer
import tvm
from tvm import relay, auto_scheduler

# Build the model and trace it with a fixed input shape (batch size 192).
with torch.no_grad():
    model = SwinTransformer(img_size=224, in_chans=3, embed_dim=96, depths=[2, 3, 6, 2],
                            num_heads=[3, 6, 12, 24], window_size=7, drop_path_rate=0.2,
                            num_classes=13).float().cuda().eval()
    shape = [192, 3, 224, 224]
    input0 = torch.ones(shape).float().cuda()
    trace = torch.jit.trace(model, input0)
    torch.jit.save(trace, 'st_v1.trace')

# Import the traced model into Relay and extract the tuning tasks.
relay_model, params = relay.frontend.from_pytorch(trace, [('input0', input0.shape)], default_dtype='float32')
target = tvm.target.cuda()
tasks, task_weights = auto_scheduler.extract_tasks(relay_model["main"], params, target)

# Tune all tasks and record the measurements to st_v1.log.
measure_ctx = auto_scheduler.LocalRPCMeasureContext(repeat=1,
                                                    min_repeat_ms=100,
                                                    timeout=100)
tuner = auto_scheduler.TaskScheduler(tasks,
                                     task_weights,
                                     load_model_file='st_v1',
                                     )
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=36000,
    num_measures_per_round=64,
    early_stopping=500,
    verbose=True,
    runner=measure_ctx.runner,
    measure_callbacks=[auto_scheduler.RecordToFile('st_v1.log')],
)
tuner.tune(tune_option)

# Build with the best schedules found and export the library.
with auto_scheduler.ApplyHistoryBest("st_v1.log"):
    with tvm.transform.PassContext(opt_level=3, config={"relay.backend.use_auto_scheduler": True}):
        lib = relay.build(relay_model, target=target, params=params)
lib.export_library('st_v1.so')
Well, I found that the model file built from Relay is still 3.2 GB even without the tuning process, in contrast with the 110 MB torch trace model. I guess there's a problem during task extraction.
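To narrow down where the extra size comes from, one thing I could try is adding up the bytes held by each named tensor and listing the largest ones. Below is a minimal sketch of that accounting. The helper names `tensor_nbytes` and `largest_tensors` and the `DTYPE_BYTES` table are my own, not part of TVM or PyTorch. It only assumes each tensor exposes `.shape` and `.dtype`, and the demo tensors are made-up stand-ins rather than real Swin Transformer weights.

```python
import math
from types import SimpleNamespace

# Bytes per element for common dtype names (hypothetical lookup table, extend as needed).
DTYPE_BYTES = {
    "float16": 2, "float32": 4, "float64": 8,
    "int8": 1, "uint8": 1, "int32": 4, "int64": 8, "bool": 1,
}


def tensor_nbytes(tensor):
    """Size in bytes of anything that exposes .shape and .dtype."""
    return math.prod(tensor.shape) * DTYPE_BYTES[str(tensor.dtype)]


def largest_tensors(named_tensors, top_k=5):
    """Return (total_bytes, [(name, bytes), ...]) with the top_k largest tensors first."""
    sizes = {name: tensor_nbytes(t) for name, t in named_tensors.items()}
    total = sum(sizes.values())
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return total, ranked


# Stand-in tensors for illustration only; names and shapes are invented.
demo = {
    "head.weight": SimpleNamespace(shape=(13, 768), dtype="float32"),
    "head.bias": SimpleNamespace(shape=(13,), dtype="float32"),
}
total, ranked = largest_tensors(demo)
print(f"total: {total / 2**20:.3f} MiB")
for name, nbytes in ranked:
    print(f"{name}: {nbytes} bytes")
```

In my script this would be called as `largest_tensors(params)` on the dictionary returned by `relay.frontend.from_pytorch`, and again on `lib.get_params()` after `relay.build`, assuming the returned arrays expose `.shape` and `.dtype` as expected. If the second total is much larger than the first, the growth would be coming from constants produced during the build rather than from the original weights.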