In this RFC, we would like to propose adding a runtime to load and execute CoreML models from TVM.
Motivation
- Currently, CoreML is the de facto standard approach to running inference on iOS. This runtime is useful for obtaining a baseline benchmark to compare against TVM.
- CoreML is the only way to leverage Apple's Neural Engine on iOS devices. With this runtime, we could extract a subgraph and offload it to CoreML to achieve better performance in the future.
How this runtime works
Here is a proposal for how to run CoreML inference via TVM RPC. In this workflow, we assume that an RPC proxy is running somewhere and that the iOS TVM RPC app has been set up on the iOS device beforehand.
- Compile the CoreML model
  Although there is an API to compile a CoreML model on the device, we do not use it: it works only on-device, which makes development harder because an iOS device (or simulator) would be needed even for testing. Instead, we compile the CoreML model with Xcode's coremlc command. With this approach, the CoreML runtime can be tested entirely on macOS (a compile sketch follows this list).
- Send the compiled model to the device and launch the RPC server
  We pack the compiled CoreML model (modelname.mlmodelc) into the iOS TVM RPC app bundle, then launch the RPC server and connect it to the RPC proxy. These steps are automated with xcodebuild test (see the sketch after this list).
- Create a CoreML module and run inference
  We create a CoreML module with a packed func (tvm.coreml_runtime.create) and run CoreML inference on iOS via the RPC proxy, as in other RPC use cases (see the inference sketch below).
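To make the first step concrete, here is a minimal sketch of compiling a .mlmodel into a .mlmodelc with coremlc, driven from Python. The paths, model name, and the helper function are illustrative only; the actual invocation is the xcrun coremlc compile command available with Xcode on macOS.

```python
# Sketch of the model-compilation step, assuming macOS with Xcode installed.
# File names and the helper are placeholders, not part of the proposal.
import os
import subprocess

def compile_coreml_model(mlmodel_path, out_dir):
    """Compile a .mlmodel into a .mlmodelc directory using Xcode's coremlc."""
    os.makedirs(out_dir, exist_ok=True)
    # `xcrun coremlc compile <input.mlmodel> <output dir>` writes
    # <modelname>.mlmodelc into the output directory.
    subprocess.run(
        ["xcrun", "coremlc", "compile", mlmodel_path, out_dir],
        check=True,
    )
    model_name = os.path.splitext(os.path.basename(mlmodel_path))[0]
    return os.path.join(out_dir, model_name + ".mlmodelc")

compiled_path = compile_coreml_model("mobilenet.mlmodel", "./compiled")
```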
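For the second step, the packaging and RPC-server launch are driven by the app's test target. The sketch below only shows how xcodebuild test might be invoked; the project name, scheme, and destination are assumptions and depend on how the iOS TVM RPC app is checked out and configured.

```python
# Sketch of launching the iOS TVM RPC app via its test target.
# Project name, scheme, and device id are placeholders.
import subprocess

subprocess.run(
    [
        "xcodebuild", "test",
        "-project", "tvmrpc.xcodeproj",   # assumed project name
        "-scheme", "tvmrpc",              # assumed scheme name
        "-destination", "id=<device-udid>",
    ],
    check=True,
)
```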
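Finally, a rough sketch of the inference step over RPC. The proxy host/port, device key, on-device model path, and the exact arguments and method names of the CoreML module (set_input, invoke, get_output) are assumptions for illustration and may differ from the final API.

```python
# Sketch of creating the CoreML module over RPC and running inference.
import numpy as np
import tvm
from tvm import rpc

# Connect to the RPC proxy that the iOS TVM RPC app registered with.
remote = rpc.connect("0.0.0.0", 9090, key="iphone")  # assumed host/port/key
ctx = remote.cpu(0)

# Create the CoreML module from the compiled model bundled in the app.
fcreate = remote.get_function("tvm.coreml_runtime.create")
mod = fcreate("mobilenet.mlmodelc", ctx)  # assumed argument list

# Set the input, run, and fetch the output, mirroring other TVM runtime modules.
set_input = mod["set_input"]
invoke = mod["invoke"]
get_output = mod["get_output"]

data = np.random.uniform(size=(1, 3, 224, 224)).astype("float32")
set_input("image", tvm.nd.array(data, ctx))
invoke()
out = get_output(0).asnumpy()
print(out.shape)
```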
Implementation
The implementation is available at https://github.com/apache/incubator-tvm/pull/5283.