I am trying to use MetaSchedule to tune an operator with the cuda-tensorcore rules and compare the result with Ansor:
    # Imports assumed by this snippet (it is a method of a larger class).
    import numpy as np
    import tvm
    from tvm import te, topi
    from tvm import meta_schedule as ms
    from tvm.ir import IRModule

    def meta_opt(self):
        # conv2d in NCHW layout: stride 1, padding 3, dilation 1
        conv = topi.nn.conv2d_nchw(self.data, self.kernel, 1, 3, 1)
        print(conv)
        func = te.create_prim_func([self.data, self.kernel, conv])
        ir_module = IRModule({"main": func})
        database = ms.tune_tir(ir_module, "nvidia/geforce-rtx-3090", max_trials_global=1500, work_dir="./tune_tmp", task_name="main",
                               sch_rules="cuda-tensorcore", postprocs="cuda-tensorcore", mutator_probs="cuda-tensorcore")
        sch = ms.tir_integration.compile_tir(database, ir_module, "nvidia/geforce-rtx-3090")
        mod = tvm.build(sch.mod, target="cuda")
        a_np = np.random.randint(0, 255, size=(1, 3, 224, 224)).astype(self.dtype)
        b_np = np.random.uniform(size=(64, 3, 7, 7)).astype(self.dtype)
        a_nd = tvm.nd.array(a_np, self.dev)
        b_nd = tvm.nd.array(b_np, self.dev)
        c_nd = tvm.nd.empty((1, 64, 224, 224), dtype=self.dtype, device=self.dev)
        f_timer_after = mod.time_evaluator("main", self.dev)
        print("Time cost of MyModule after meta tuning: %.3f ms" % (f_timer_after(a_nd, b_nd, c_nd).mean * 1000))
However, the MetaSchedule result is only marginally faster than the Ansor result:
Time cost of MyModule after meta tuning: 0.047 ms
Time cost of MyModule after ansor tuning: 0.051 ms
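To put a number on the gap, here is a small sketch that computes the relative speedup from the two timings above. The function name `relative_speedup` is only illustrative and not part of TVM.

```python
def relative_speedup(baseline_ms: float, tuned_ms: float) -> float:
    """Return how many times faster the tuned kernel is than the baseline."""
    return baseline_ms / tuned_ms


# Timings copied from the two log lines above.
ansor_ms = 0.051
meta_ms = 0.047

speedup = relative_speedup(ansor_ms, meta_ms)
# Fraction of the Ansor runtime that MetaSchedule saves.
time_saved = (ansor_ms - meta_ms) / ansor_ms

print(f"MetaSchedule speedup over Ansor: {speedup:.2f}x")
print(f"Runtime reduction: {time_saved:.1%}")
```

This works out to roughly a 1.09x speedup, or a bit under 8% less runtime.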
How can I check whether the cuda-tensorcore rules were actually applied when tuning my operator, and why is there no significant improvement?
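One possible check is to look for tensor-core intrinsics in the generated code. The sketch below is a plain string scan and rests on assumptions: the helper `find_tensor_core_markers` and the marker list are hypothetical, and I am assuming the CUDA source of the built module can be read with `mod.imported_modules[0].get_source()`.

```python
from typing import List

# Substrings that usually show up when tensor-core (WMMA/MMA) intrinsics are emitted.
# This list is an assumption, not an official TVM or CUDA list.
TENSOR_CORE_MARKERS = ("wmma", "mma.sync", "mma_sync")


def find_tensor_core_markers(source: str) -> List[str]:
    """Return the markers from TENSOR_CORE_MARKERS that occur in the source text."""
    lowered = source.lower()
    return [marker for marker in TENSOR_CORE_MARKERS if marker in lowered]


# Stand-in strings for demonstration; real input would be the generated CUDA source.
with_tc = "nvcuda::wmma::mma_sync(acc, a_frag, b_frag, acc);"
without_tc = "__global__ void main_kernel(float* a, float* b) { b[0] = a[0]; }"

print(find_tensor_core_markers(with_tc))
print(find_tensor_core_markers(without_tc))

# Assumed usage with the module built in the snippet above (not run here):
# cuda_source = mod.imported_modules[0].get_source()
# print(find_tensor_core_markers(cuda_source))
```

An empty list for the real kernel source would suggest that the tuned schedule fell back to a regular CUDA kernel. The same scan could be run over `sch.mod.script()` to inspect the TIR instead of the CUDA source.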