caffe 官方文档定义:
group (g) [default 1]: If g > 1, we restrict the connectivity of each filter to a subset of the input. Specifically, the input and output channels are separated into g groups, and the i-th output group channels will be only connected to the i-th input group channels.
Jia Yangqing 论坛回复:
It was there to implement the grouped convolution in Alex Krizhevsky's paper: when group=2, the first half of the filters are only connected to the first half of the input channels, and the second half only connected to the second half.
To group seems like a historical accident from the practical constraints of the time, but I don't know of a side-by-side comparison of group and no-group models. It could be nice to settle the question on the axes of task performance, speed + memory, and optimization time + hassle.
Alex Krizhevsky's paper 中conv的定义:
convolution_param {
num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 1 }
结论:Alex文章中是多个显卡,可能基于并行内存等考虑。所以以后少用。。。
reference: