Why is iOS ARM64 CPU opt level 0 faster than opt level 3?

I am developing a TVM iOS application and noticed that on the iOS CPU, optimization level 3 takes more inference time than level 0:

  • CPU Opt level 0: ~261ms
  • CPU Opt level 3: ~759ms

For the Metal target, however, the results match my expectations:

  • Metal Opt level 0: 436ms
  • Metal Opt level 3: 1.307ms

What could be causing this on the CPU?
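Before comparing opt levels, it can help to make sure each number is a stable measurement rather than a single timed call that may include one-time setup costs. TVM has its own timing utilities; the sketch below is a library-agnostic version of the same idea. The name benchmark and the run callable are hypothetical: run is assumed to wrap one full inference of the compiled model (set inputs, execute).

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], None], warmup: int = 3, repeat: int = 20) -> Dict[str, float]:
    """Time `run` after some warm-up calls and summarise latency in milliseconds.

    `run` is a hypothetical placeholder for one full inference of the model.
    """
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    # Warm-up calls absorb one-time costs (lazy initialisation, cold caches).
    for _ in range(warmup):
        run()
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        run()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional outliers than the mean.
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; replace with a call into the compiled model.
    print(benchmark(lambda: sum(i * i for i in range(100_000))))
```

Running the same harness against the opt level 0 and opt level 3 builds on the same device gives a like-for-like comparison.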

I see the same on CPU. Can anyone explain it?

Since you didn’t post the string that describes your target (e.g. llvm -mcpu=xxx plus other flags), I can only offer a wild guess. In some cases I have found that forcing a particular layout (e.g. NHWC vs NCHW) and data type (int8 vs int16) produces a worse schedule, which results in longer latencies. So perhaps choosing a particular optimization level forces the compiler to generate a worse schedule?
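To make the layout point concrete: in NHWC all channels of one pixel sit next to each other in memory, while in NCHW each channel's H×W plane is contiguous. The plain-Python sketch below (all function names are hypothetical, for illustration only) computes the flat offset of an element in each layout and converts an NHWC buffer to NCHW. A schedule written for one ordering walks memory with a different stride pattern when fed the other, which is one way layout choices can affect latency. At the graph level, TVM's relay.transform.ConvertLayout pass performs this kind of rewrite for operators such as nn.conv2d.

```python
from typing import List, Sequence, Tuple

Shape = Tuple[int, int, int, int]


def nhwc_offset(n: int, h: int, w: int, c: int, shape: Shape) -> int:
    """Flat index of element (n, h, w, c) in an NHWC buffer of shape (N, H, W, C)."""
    _, H, W, C = shape
    return ((n * H + h) * W + w) * C + c


def nchw_offset(n: int, c: int, h: int, w: int, shape: Shape) -> int:
    """Flat index of element (n, c, h, w) in an NCHW buffer of shape (N, C, H, W)."""
    _, C, H, W = shape
    return ((n * C + c) * H + h) * W + w


def nhwc_to_nchw(data: Sequence[float], shape_nhwc: Shape) -> List[float]:
    """Copy a flat NHWC buffer into a new flat NCHW buffer."""
    N, H, W, C = shape_nhwc
    if len(data) != N * H * W * C:
        raise ValueError("buffer size does not match shape")
    shape_nchw = (N, C, H, W)
    out: List[float] = [0.0] * len(data)
    for n in range(N):
        for h in range(H):
            for w in range(W):
                for c in range(C):
                    src = nhwc_offset(n, h, w, c, shape_nhwc)
                    dst = nchw_offset(n, c, h, w, shape_nchw)
                    out[dst] = data[src]
    return out


if __name__ == "__main__":
    shape = (1, 2, 2, 3)  # N=1, H=2, W=2, C=3
    # Stepping one channel moves 1 element in NHWC but H*W elements in NCHW.
    print(nhwc_offset(0, 0, 0, 1, shape), nchw_offset(0, 1, 0, 0, (1, 3, 2, 2)))
    print(nhwc_to_nchw(list(range(12)), shape))
```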

Choosing Metal is more restrictive and thus guides scheduling at compile time, which gives better latencies. Again, it’s a wild guess and I could be completely wrong.

Hi @alopez_13, thank you for your response. Below are the target details.

For iOS CPU:

  • sdk = "iphoneos"
  • target = "llvm -mtriple=arm64-apple-darwin"

For iOS Metal:

  • sdk = "iphoneos"
  • target = "metal"
  • target_host = "llvm -mtriple=arm64-apple-darwin"

The model is float32 with NHWC layout.
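The CPU target string above specifies only -mtriple, with no -mcpu or -mattr, which is the detail the earlier reply asked about. A minimal sketch of composing TVM LLVM target strings with and without those flags is below. The helper name llvm_target is hypothetical, and the values apple-a12 and +neon are illustrative assumptions: the CPU name must be one the LLVM bundled with your TVM build recognises, and there is no guarantee that adding these flags changes the opt level 3 result.

```python
from typing import Iterable, Optional


def llvm_target(triple: str, mcpu: Optional[str] = None, mattrs: Iterable[str] = ()) -> str:
    """Compose an LLVM target string such as "llvm -mtriple=arm64-apple-darwin".

    Hypothetical helper for illustration; it only builds the string.
    """
    parts = ["llvm", f"-mtriple={triple}"]
    if mcpu:
        # Assumed CPU name; must be known to the LLVM version in use.
        parts.append(f"-mcpu={mcpu}")
    attrs = list(mattrs)
    if attrs:
        # Feature flags are passed as a single comma-separated -mattr option.
        parts.append("-mattr=" + ",".join(attrs))
    return " ".join(parts)


# The target posted above, unchanged.
baseline = llvm_target("arm64-apple-darwin")
# A variant with explicit CPU/feature flags (illustrative values only).
explicit = llvm_target("arm64-apple-darwin", mcpu="apple-a12", mattrs=["+neon"])

if __name__ == "__main__":
    print(baseline)
    print(explicit)
```

Each string could then be passed as the target to relay.build inside a tvm.transform.PassContext(opt_level=0) versus opt_level=3 block, so that both opt levels are compared under the same, fully specified target.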