Why is iOS ARM64 CPU opt level 0 faster than opt level 3?

myproject24 · August 12, 2021, 1:17pm

I am developing a TVM iOS application, and I noticed that on the iOS CPU target, optimization level 3 takes more inference time than level 0:

CPU Opt level 0: ~261ms
CPU Opt level 3: ~759ms

For the Metal target, however, the results match my expectations:

Metal Opt level 0: 436ms
Metal Opt level 3: 1.307ms

What could be the issue with the CPU target?

ArtyZe · August 14, 2021, 9:58am

I see the same thing on CPU. Can anyone explain it?

alopez_13 · August 14, 2021, 1:41pm

Since you didn't post the string that describes your target (e.g. llvm -mcpu=xxx plus other flags), I can only offer a wild guess. In some circumstances I have found that forcing a particular layout (e.g. NHWC vs NCHW) or data type (int8 vs int16) produces a worse schedule, which results in longer latencies. So perhaps choosing a particular optimization level forces the compiler to generate a worse schedule?

Choosing Metal is more restrictive and thus guides the scheduling at compile time, which gives better latencies. Again, it's a wild guess and I could be completely wrong.

myproject24 · August 16, 2021, 5:19am

Hi @alopez_13, thank you for your response. Below are the target details.

For iOS CPU:

sdk = "iphoneos"
target = "llvm -mtriple=arm64-apple-darwin"

For iOS Metal:

sdk = "iphoneos"
target = "metal"
target_host = "llvm -mtriple=arm64-apple-darwin"

The model is float32 with NHWC layout.
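
One way to test the layout guess from earlier in the thread is to compile the same model three times and time each build on the device: opt level 0, opt level 3, and opt level 3 with the layout-rewriting pass switched off. The snippet below is only a rough sketch of that experiment, not a confirmed fix. It assumes the model is already loaded as a Relay module `mod` with `params`, and it assumes that `AlterOpLayout` is the pass responsible for changing layouts at level 3. The helper names `pass_context_kwargs` and `build_variants` are made up for this example.

```python
# Target string copied from the CPU configuration posted above.
IOS_CPU_TARGET = "llvm -mtriple=arm64-apple-darwin"


def pass_context_kwargs(opt_level, keep_layout=False):
    """Return keyword arguments for tvm.transform.PassContext."""
    kwargs = {"opt_level": opt_level}
    if keep_layout:
        # Assumption: AlterOpLayout is the pass that rewrites layouts at level 3.
        kwargs["disabled_pass"] = ["AlterOpLayout"]
    return kwargs


def build_variants(mod, params, target=IOS_CPU_TARGET):
    """Compile one Relay module three ways so the latencies can be compared."""
    # TVM is imported here so the helper above can be used without it installed.
    import tvm
    from tvm import relay

    variants = {
        "opt0": pass_context_kwargs(0),
        "opt3": pass_context_kwargs(3),
        "opt3_no_alter_layout": pass_context_kwargs(3, keep_layout=True),
    }
    libs = {}
    for name, kwargs in variants.items():
        with tvm.transform.PassContext(**kwargs):
            libs[name] = relay.build(mod, target=target, params=params)
    return libs
```

If the third build comes out close to the opt level 0 latency, that would point at the layout change rather than at the other level 3 passes. If it stays slow, the layout guess is probably wrong and the cause lies elsewhere. Each library still has to be exported and run on the device in the usual way to get the timings.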