Thanks! Sorry for the late reply. I haven’t found a workload that slows down the current kernel yet :). Also, I noticed that AutoTIR was merged into main recently. Since I haven’t found any docs for it, I’m curious what it does. Specifically, is there any optimization for hand-written kernels?