Why do transformers apply a pos_drop right after the embedding and positional encoding?

self.pos_drop = nn.Dropout(p=drop_rate)

x = self.embedding(x)
x = x + self.pos_embed
x = self.pos_drop(x)
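For context, this is the pattern used in ViT-style models: the dropout is applied to the *sum* of the token embedding and the positional embedding, so it regularizes both at once and discourages the model from over-relying on any single embedding dimension. Below is a minimal, self-contained sketch of that pattern; the class name, sizes, and drop rate are illustrative assumptions, not taken from the question.

```python
import torch
import torch.nn as nn

# Minimal sketch (hypothetical names/sizes) of the pattern in the question:
# token embedding -> add positional embedding -> dropout on the sum.
class TokenEmbedWithPos(nn.Module):
    def __init__(self, vocab_size=100, seq_len=16, dim=32, drop_rate=0.1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dim)
        # Learned positional embedding, one vector per position.
        self.pos_embed = nn.Parameter(torch.zeros(1, seq_len, dim))
        # Dropout applied after adding positions, regularizing the
        # combined token + position representation.
        self.pos_drop = nn.Dropout(p=drop_rate)

    def forward(self, x):
        x = self.embedding(x)   # (B, L, D)
        x = x + self.pos_embed  # broadcast add over the batch dimension
        x = self.pos_drop(x)    # randomly zeros elements during training only
        return x

m = TokenEmbedWithPos()
tokens = torch.randint(0, 100, (2, 16))
m.train()
out_train = m(tokens)        # dropout active: elements randomly zeroed/rescaled
m.eval()
out_eval = m(tokens)         # dropout is the identity in eval mode
print(out_train.shape)       # torch.Size([2, 16, 32])
print(torch.equal(out_eval, m(tokens)))  # True: eval forward is deterministic
```

Note that in eval mode `nn.Dropout` is a no-op, so the stochastic regularization only affects training; this is why repeated eval-mode forward passes above are identical.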