On page 4 of the Batch Normalization paper:
3 Normalization via Mini-Batch Statistics
but it should be noted that the BN transform does not independently process the activation in each training example. Rather, BN_{γ,β}(x) depends both on the training example and the other examples in the mini-batch.
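How should I read "the training example and the other examples in the mini-batch" here? Doesn't the x in the transformation only cover the samples of a single mini-batch anyway?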
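Got it: the "training example" is just one sample inside the mini-batch. What the authors mean is that the BN output for that sample does not depend on that sample alone. It also depends on the other samples in the same mini-batch, because the mean and variance used for normalization are computed over the whole mini-batch.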
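
To make this concrete, here is a minimal sketch in plain Python (not the paper's code). It assumes scalar activations, γ = 1, β = 0, and a small ε; the function name `batch_norm` and the two example batches are made up for illustration. The same activation 1.0 is placed in two different mini-batches and ends up with different normalized values.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """BN transform over a mini-batch of scalar activations (illustrative helper)."""
    m = len(batch)
    mean = sum(batch) / m                          # mini-batch mean
    var = sum((x - mean) ** 2 for x in batch) / m  # mini-batch (biased) variance
    # Normalize every example with the shared batch statistics, then scale and shift.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# The same training example (activation 1.0) in two different mini-batches.
batch_a = [1.0, 2.0, 3.0]    # here 1.0 lies below the batch mean
batch_b = [1.0, 0.0, -1.0]   # here 1.0 lies above the batch mean

out_a = batch_norm(batch_a)
out_b = batch_norm(batch_b)

print(out_a[0])  # negative: 1.0 is the smallest value in batch_a
print(out_b[0])  # positive: 1.0 is the largest value in batch_b
```

The input 1.0 is identical in both calls, yet its output changes sign because the other examples shift the batch mean. That is the dependence the quoted sentence describes.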