It seems to me that if we are talking about just one op (i.e., depthwise conv2d) implemented in C++, then it’s much easier to integrate it directly into TOPI as an extern op, just like the existing cuBLAS kernels.