I think there are four ways to port TVM to a new AI accelerator.
1 : BYOC. BYOC can offload the ops that your new device supports to that device. BYOC is simple and elegant, but we can't use AutoTVM with BYOC, and I think AutoTVM is a very important feature of TVM.
2 : Tensorize. By using TVM's schedule primitive tensorize, we can replace a unit of computation with the corresponding hardware intrinsic, such as a GEMM instruction. We can use AutoTVM this way, but we may need to use tensorize to modify every op's schedule.
3 : Like cuDNN. We can make TVM call the new device's library the same way it calls cuDNN for the GPU. This way is not better than BYOC.
4 : Like GPU/CPU. We can add a new target to TVM, just like the existing GPU/CPU targets. We need to develop a compute and a schedule for every op, and we also need to develop graph optimizations for this new device. We can use AutoTVM this way, but this way is the most time-consuming and the most difficult.
I think that if we only have an op-level API for the new device, BYOC is the best way.
If we have an ISA-level interface for the new device, which way is the best?