Actually, the computation of each cell does not expose enough parallelism. See “Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks” for more details.