We want to add a CCE target in TVM. Open-sourcing the whole CCE backend would take time, so can we PR the CCE-target-related changes first? We would need to add a device_type in DLPack, and also add the target name and device_type in c_runtime_api.cc, build_module.cc, and runtime_ctypes.py.
Given the current status of CCE support, maybe it makes sense to bring kDLCCE into the tvm repo first, with some background info; once we have some running examples, we can then upstream the change to DLPack.
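To make the proposal above concrete, here is a minimal sketch of what registering a new 'cce' device in the Python runtime's device tables could look like (mirroring the MASK2STR/STR2MASK style of runtime_ctypes.py). This is illustrative only: the numeric code 16 is a placeholder, not a reserved DLPack value, and the real change would edit the actual tables rather than define new ones.

```python
# Illustrative sketch only -- not the actual TVM/DLPack change.
# The device code 16 for "cce" is a hypothetical placeholder; the real
# kDLCCE value would be reserved in the DLPack DLDeviceType enum.

MASK2STR = {
    1: "cpu",
    2: "gpu",
    4: "opencl",
}
STR2MASK = {v: k for k, v in MASK2STR.items()}

def register_device(name, code):
    """Register a new device type in both lookup tables."""
    assert code not in MASK2STR, "device code already taken"
    assert name not in STR2MASK, "device name already taken"
    MASK2STR[code] = name
    STR2MASK[name] = code

# Hypothetical registration of the CCE device type.
register_device("cce", 16)
```

The C side (c_runtime_api.cc / build_module.cc) would need the matching enum value and target-name string so the two runtimes agree on the code.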
It's great to enable the programming model for the new AI chip. I think the community will take time to get familiar with the new era of ASIC-based accelerators. @xqdan, can you provide more information on the following details regarding CCE C programming and the DaVinci chip?
DaVinci chip (Computational Capacity, Availability of Development Boards)
Without those specs, it wouldn't be easy for the community to adopt the new AI chip. There are already many AI chips that don't have any developer-friendly programming interface.
@liangfu thanks for your attention. Actually, what we've been doing on TVM is trying to reduce developers' burden of learning this detailed low-level information. Imagine that you just write the TVM DSL and don't need to take care of the things you mentioned above.
@xqdan One thing that might be nice is to understand the set of hardware intrinsics that TVM should lower a schedule down to (for instance, are we using tensorization intrinsics, or different types of DMA loads/stores?). In addition, it would be good to understand how a programmer can expose more parallelism for the chip to take advantage of. For instance, with the VTA reference design we used virtual threads that would be lowered to low-level, dataflow-like synchronization operations to uncover task-level parallelism within the chip.
Highlighting these challenges when targeting the DaVinci chip would be nice, and perhaps contrasting them with VTA's, so that programmers can understand how the two relate in terms of challenges.
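To illustrate the virtual-thread idea mentioned above: this is a toy model (not TVM or VTA code) of how per-thread load/compute/store stages can be interleaved, with explicit dependency tokens standing in for the low-level synchronization ops the compiler inserts. The function name and token scheme are invented for illustration; the point is that thread 1's load can be in flight while thread 0 computes, with correctness guarded by push/pop tokens rather than a global barrier.

```python
# Toy sketch of virtual-thread lowering: stages of different "threads"
# are interleaved so they can overlap on different hardware modules,
# with explicit dependency tokens enforcing per-thread ordering.

def lower_virtual_threads(num_vthreads, stages):
    """Emit an interleaved program of (stage, vthread) ops plus
    push_dep/pop_dep token ops between consecutive stages."""
    program = []
    for stage_idx, stage in enumerate(stages):
        for vt in range(num_vthreads):
            if stage_idx > 0:
                # Wait for this thread's previous stage to finish.
                program.append(("pop_dep", vt, stages[stage_idx - 1], stage))
            program.append((stage, vt))
            if stage_idx < len(stages) - 1:
                # Signal the next stage of the same thread.
                program.append(("push_dep", vt, stage, stages[stage_idx + 1]))
    return program

prog = lower_virtual_threads(2, ["load", "compute", "store"])
```

In the emitted program, both threads' loads are issued before either compute, so a load unit and a compute unit can run concurrently, which is the task-level parallelism the virtual threads uncover.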