During deep-learning training, val_loss suddenly becomes very large and then returns to normal in later epochs — where is the problem? Is it related to my graphics card?

While training a SegNet-style network for semantic segmentation, val_loss occasionally jumps to a very large value, while acc and val_acc stay normal. What is going on? I'm using Keras on an RTX 3060 — could this be related to my graphics card?

Relevant code:

```python
import tensorflow as tf

# Input size, per the shape comment below: (None, 256, 256, 3)
input_height = input_width = 256

# encoder ***************************************************************************************************************
# Block 1
img_input = tf.keras.Input(shape=(input_height, input_width, 3))  # input_1 (InputLayer) [(None, 256, 256, 3)]
x = tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(img_input)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

# Block 2
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

# Block 3
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

# Block 4
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

# Block 5
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)

# decoder ***************************************************************************************************************
# Block 6
x = tf.keras.layers.UpSampling2D(size=(2, 2))(x)
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(1024, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)

# Block 7
x = tf.keras.layers.UpSampling2D(size=(2, 2))(x)
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(512, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)

# Block 8
x = tf.keras.layers.UpSampling2D(size=(2, 2))(x)
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(256, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)

# Block 9
x = tf.keras.layers.UpSampling2D(size=(2, 2))(x)
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(128, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)

# Block 10
x = tf.keras.layers.UpSampling2D(size=(2, 2))(x)
x = tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(64, (3, 3), strides=(1, 1), padding='same', activation='relu')(x)
x = tf.keras.layers.BatchNormalization()(x)
img_output = tf.keras.layers.Conv2D(5, (3, 3), padding='same', activation='softmax')(x)  # 5 output classes
model = tf.keras.models.Model(inputs=img_input, outputs=img_output)
```


This has nothing to do with your hardware. Training runs fine, and an occasional problem on validation is normal: your data distribution is not perfectly balanced, so a few images the model fits poorly can drive val_loss up for an epoch. Since the data as a whole has a consistent tendency, the metric settles back down over time.
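The point above — that a handful of poorly fit validation images can spike the mean loss while accuracy barely moves — is easy to demonstrate numerically. A minimal NumPy sketch with illustrative numbers (not taken from the asker's run): 1000 samples where the model assigns the true class probability 0.9, except in the "spike" case five samples are confidently wrong.

```python
import numpy as np

# A "normal" epoch: the model is fairly confident everywhere.
p_normal = np.full(1000, 0.9)

# A "spike" epoch: identical, except 5 samples where the model is
# confidently wrong (true-class probability near zero).
p_spike = p_normal.copy()
p_spike[:5] = 1e-6

def cross_entropy(p):
    # Mean negative log-likelihood of the true class.
    return -np.log(p).mean()

def accuracy(p):
    # A sample counts as correct when the true class gets majority probability.
    return (p > 0.5).mean()

print(cross_entropy(p_normal))  # ≈ 0.105
print(cross_entropy(p_spike))   # ≈ 0.174
print(accuracy(p_normal))       # 1.0
print(accuracy(p_spike))        # 0.995
```

Just 0.5% of the samples raise the mean loss by roughly 65%, while accuracy drops only from 1.000 to 0.995 — exactly the pattern of a val_loss spike with acc and val_acc looking normal. Because the loss is unbounded for confident mistakes but accuracy is capped per sample, loss is far more sensitive to a few hard validation images.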