In the current implementation, CMSIS-NN support relies on the target hook system. This system is supported by the Ahead of Time Executor, but it has not yet been used with the Graph Executor. I don’t see any reason CMSIS-NN wouldn’t work with the Graph Executor; it’s just a case of enablement and adding the plumbing.
Currently we rely on operators being greedily partitioned to CMSIS-NN kernels wherever they are supported. Tuning might be able to make better decisions about which kernel to use, although I’m not aware of a compilation flow that exists today that is capable of doing that.
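To make the difference concrete, here is a rough sketch in plain Python contrasting the greedy approach with a per-operator, cost-based choice. This is not TVM code: `Op`, `CMSIS_NN_SUPPORTED`, `greedy_partition`, `cost_based_partition` and the cost table are all hypothetical names made up for illustration, and the cost numbers would have to come from some tuning or measurement step that, as far as I know, doesn’t exist in this form today.

```python
from dataclasses import dataclass

# Hypothetical set of operator kinds the external library can handle.
CMSIS_NN_SUPPORTED = {"conv2d", "dense", "softmax"}


@dataclass(frozen=True)
class Op:
    name: str  # unique name of the operator instance in the graph
    kind: str  # operator type, e.g. "conv2d"


def greedy_partition(ops, supported=CMSIS_NN_SUPPORTED):
    """Offload every supported op to CMSIS-NN; everything else stays on TVM."""
    return {
        op.name: "cmsis-nn" if op.kind in supported else "tvm"
        for op in ops
    }


def cost_based_partition(ops, cost, supported=CMSIS_NN_SUPPORTED):
    """Pick the cheaper backend per op from a (hypothetical) measured cost table.

    `cost` maps (op name, backend) -> measured cost, e.g. cycles.
    Ties are resolved in favour of "tvm" because it is listed first.
    """
    assignment = {}
    for op in ops:
        candidates = ["tvm"]
        if op.kind in supported:
            candidates.append("cmsis-nn")
        assignment[op.name] = min(candidates, key=lambda b: cost[(op.name, b)])
    return assignment


if __name__ == "__main__":
    ops = [Op("conv_a", "conv2d"), Op("conv_b", "conv2d"), Op("relu_a", "relu")]
    # Made-up costs: for conv_b the TVM-generated kernel happens to be cheaper.
    cost = {
        ("conv_a", "tvm"): 120, ("conv_a", "cmsis-nn"): 80,
        ("conv_b", "tvm"): 50, ("conv_b", "cmsis-nn"): 70,
        ("relu_a", "tvm"): 5,
    }
    print(greedy_partition(ops))
    print(cost_based_partition(ops, cost))
```

With the made-up costs above, the greedy version sends both convolutions to CMSIS-NN, while the cost-based version keeps `conv_b` on the TVM-generated kernel. That is the kind of decision tuning could in principle make.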