Arm Compute Library segfault with inception-v1, squeezenet

Even if the underlying implementation executes the graph op-by-op, it is still beneficial to merge subgraphs, because doing so reduces kernel launch and data transfer overheads. Tensors that live entirely inside a subgraph can also be managed by ACL instead of by the graph runtime.
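To make the overhead argument concrete, here is a minimal toy sketch in plain Python. It is not TVM's or ACL's actual partitioning code: the `SUPPORTED` set, `merge_regions`, and `boundary_crossings` are hypothetical names invented for illustration, and the model is a simple linear chain of ops where each offloaded region costs one transfer in and one transfer out.

```python
from typing import List, Set, Tuple

# Hypothetical set of ops the external library can run (illustrative only,
# not ACL's real operator coverage).
SUPPORTED: Set[str] = {"conv2d", "relu", "max_pool2d"}

Region = Tuple[bool, List[str]]  # (offloaded?, ops in the region)


def split_per_op(ops: List[str], supported: Set[str]) -> List[Region]:
    """Baseline: every op is its own region, so nothing is merged."""
    return [(op in supported, [op]) for op in ops]


def merge_regions(ops: List[str], supported: Set[str]) -> List[Region]:
    """Merge runs of consecutive supported ops into a single offloaded region."""
    regions: List[Region] = []
    for op in ops:
        offloaded = op in supported
        if offloaded and regions and regions[-1][0]:
            # Extend the current offloaded region instead of starting a new one.
            regions[-1][1].append(op)
        else:
            regions.append((offloaded, [op]))
    return regions


def boundary_crossings(regions: List[Region]) -> int:
    """Assume each offloaded region needs one transfer in and one transfer out."""
    return 2 * sum(1 for offloaded, _ in regions if offloaded)


ops = ["conv2d", "relu", "max_pool2d", "softmax"]
per_op = split_per_op(ops, SUPPORTED)
merged = merge_regions(ops, SUPPORTED)
print(boundary_crossings(per_op), boundary_crossings(merged))
```

In this toy model the three supported ops cross the runtime boundary six times when offloaded one by one, but only twice once they are merged, and the two intermediate tensors never have to be visible to the graph runtime at all.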