Grouped convolution performance penalty

Did you solve this problem? I have the same issue: when I set group = 2 for the grouped convolution, it runs much slower than with group = 1.
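For context on why this is surprising, here is a rough count of the multiply-accumulates involved. It is only a sketch: the layer sizes are made up for illustration, and `conv_macs` is a hypothetical helper, not a function from any framework. It assumes a plain 2D convolution where each output channel reads only `c_in / groups` input channels.

```python
def conv_macs(c_in, c_out, k, h_out, w_out, groups=1):
    """Multiply-accumulates for one 2D convolution over a single image."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # With grouping, each output channel only sees c_in / groups input channels.
    return h_out * w_out * c_out * (c_in // groups) * k * k


# Hypothetical layer: 256 -> 256 channels, 3x3 kernel, 56x56 output map.
dense = conv_macs(256, 256, 3, 56, 56, groups=1)
grouped = conv_macs(256, 256, 3, 56, 56, groups=2)
print(dense / grouped)  # ratio of arithmetic work, dense vs. grouped
```

On paper, group = 2 does half the arithmetic of group = 1, so a slowdown would have to come from how the grouped case is executed rather than from the amount of work.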