Increase in Cache Misses for Tuned Models

I used PAPI to profile my models and compare the tuned versus untuned versions, and found that the tuned models have a much higher number of cache misses (roughly 3x to 5x) than the untuned ones. Could anyone explain the reason for this?
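
For reference, here is a minimal sketch of the kind of measurement I am doing with PAPI's low-level C API; the event names and the `run_model()` stub are placeholders standing in for my actual setup, not the exact code:

```c
#include <stdio.h>
#include <stdlib.h>
#include <papi.h>

/* Placeholder: stands in for a single tuned or untuned model inference. */
static void run_model(void) {
    /* ... invoke the compiled model here ... */
}

int main(void) {
    int event_set = PAPI_NULL;
    long long counters[2];

    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) {
        fprintf(stderr, "PAPI init failed\n");
        return EXIT_FAILURE;
    }

    PAPI_create_eventset(&event_set);
    PAPI_add_event(event_set, PAPI_L1_DCM);   /* L1 data cache misses  */
    PAPI_add_event(event_set, PAPI_L2_TCM);   /* L2 total cache misses */

    PAPI_start(event_set);
    run_model();                               /* region being profiled */
    PAPI_stop(event_set, counters);

    printf("L1 data cache misses : %lld\n", counters[0]);
    printf("L2 total cache misses: %lld\n", counters[1]);

    PAPI_shutdown();
    return EXIT_SUCCESS;
}
```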

PS:

  • The other metrics, such as execution time, stalls, and instruction count, show the expected results.
  • I have tried multiple models, and all of them show higher cache misses when tuned.