Maybe the slowdown is due to an int16 fallback? Or, since you modified the compute, the “right” schedule may no longer be getting called.