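My idea is that once training finishes, I collect the loss of every sample, sort the losses in descending order, and take the largest 10%. I then pick out the images and labels that correspond to those losses and add them to the dataset.
I am using a callback: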
import numpy as np
from tensorflow import keras  # or "import keras" if you use standalone Keras


class CustomCallback(keras.callbacks.Callback):
    def __init__(self, log_dir, train_lines):
        super().__init__()
        self.log_dir = log_dir
        self.batch_losses = []
        self.your_list = []
        self.loss_file = "loss.txt"
        self.top20percent_file = "top20percent.txt"
        self.train_lines = train_lines

    def on_batch_end(self, batch, logs=None):
        if logs is not None:
            loss = logs.get("loss")
            self.batch_losses.append(loss)

    def on_epoch_end(self, epoch, logs=None):
        if epoch == 499:  # check whether this is the last epoch
            all_losses = np.array(self.batch_losses)
            print(all_losses)
            with open(self.loss_file, "w") as f:
                for loss in self.batch_losses:
                    f.write(str(loss) + '\n')
            # Indices sorted by loss, largest first; keep the top 10%
            # (the "top20" names are kept from the original code).
            top20_index = int(len(all_losses) * 0.1)
            top20_indices = np.argsort(all_losses)[::-1][:top20_index]
            # Index i maps to train_lines[i] only when batch_size is 1
            # and the data is not shuffled.
            top20_image_paths = [self.train_lines[i] for i in top20_indices]
            print(top20_image_paths)
            with open(self.top20percent_file, "w") as f:
                for line in top20_image_paths:
                    f.write(line)
        self.batch_losses = []  # clear this epoch's losses
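However, this code only captures the batch-level losses from the last epoch; I can only get the loss of each individual image by setting batch_size to 1. But setting batch_size to 1 makes training far too slow.
So my question is: how can I get the loss for each sample (image and label) when batch_size is not 1? I am using TensorFlow 1.15.
Thanks for your help!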
To get the loss of each individual sample in TensorFlow 1.x, you can use the following approach:
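import numpy as np
import tensorflow as tf


class CustomCallback(tf.keras.callbacks.Callback):
    def __init__(self, log_dir, train_dataset):
        super().__init__()
        self.log_dir = log_dir
        self.loss_file = "loss.txt"
        self.top20percent_file = "top20percent.txt"
        # Assumed to be a list of (x, y) NumPy batches in a fixed, unshuffled order.
        self.train_dataset = train_dataset

    def on_epoch_end(self, epoch, logs=None):
        if epoch != 499:  # only run after the last epoch
            return
        K = tf.keras.backend
        # Assumes a single-output model compiled with a loss name or a plain loss
        # function, which returns unreduced values instead of a batch mean.
        loss_fn = tf.keras.losses.get(self.model.loss)
        all_losses = []
        for x, y in self.train_dataset:
            y_pred = self.model.predict(x)
            # TF 1.15 runs in graph mode by default, so tensors have no .numpy();
            # evaluate with K.eval. Each call adds ops to the graph, which is
            # acceptable for a single pass over the data.
            loss = K.eval(loss_fn(K.constant(y), K.constant(y_pred)))
            # Average any remaining non-batch axes to get one value per sample.
            all_losses.extend(loss.reshape(len(loss), -1).mean(axis=1))
        all_losses = np.array(all_losses)
        with open(self.loss_file, "w") as f:
            for loss in all_losses:
                f.write(str(loss) + '\n')
        # Sample indices sorted by loss, largest first; keep the top 10%.
        top20_index = int(len(all_losses) * 0.1)
        top20_indices = np.argsort(all_losses)[::-1][:top20_index]
        # Placeholder names as in the original; map each index to your own path list.
        top20_image_paths = [str(i) + ".jpg" for i in top20_indices]
        with open(self.top20percent_file, "w") as f:
            for line in top20_image_paths:
                f.write(line + '\n')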
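This callback is called at the end of every epoch, but it only does its work on the final one (epoch index 499): it collects the loss of every sample in the training set and picks the 10% of samples with the largest loss, so that their images and labels can be added to the dataset.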
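The selection step on its own can be sketched in plain Python. This is a minimal illustration, not the answer's exact code: the helper name `select_top_fraction`, the file names, and the loss values are all made up, and it assumes `losses[i]` belongs to `lines[i]`.

```python
def select_top_fraction(losses, lines, fraction=0.1):
    """Return (line, loss) pairs for the highest-loss `fraction` of samples."""
    if len(losses) != len(lines):
        raise ValueError("losses and lines must have the same length")
    k = int(len(losses) * fraction)
    # Sort sample indices by loss, largest first, and keep the first k.
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return [(lines[i], losses[i]) for i in order[:k]]


# Hypothetical data: ten samples with made-up loss values.
lines = ["img_%d.jpg" % i for i in range(10)]
losses = [0.2, 1.5, 0.1, 0.9, 0.3, 2.4, 0.05, 0.7, 0.4, 0.6]
hardest = select_top_fraction(losses, lines, fraction=0.2)
print(hardest)
```

Sorting indices rather than loss values avoids membership tests such as `loss in top_losses`, which can match the wrong sample when two samples share the same loss.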
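You need to pass the training data to this callback as an argument. Inside the callback, we iterate over the training data, get the predictions with self.model.predict(), and then use the loss function to compute the loss of each sample. All loss values are stored in a list and sorted in descending order; the top 10% are then selected and the corresponding samples are written to a file.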
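As an illustration of what "the loss of each sample" means when predictions come back in batches, here is a small NumPy sketch. It assumes a softmax classifier with one-hot labels and categorical cross-entropy, which may not match your model (a detection loss would need its own per-sample formula). The helper name and the arrays are invented; in the callback, `y_pred` would come from `self.model.predict(x)`.

```python
import numpy as np


def per_sample_categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for each row, without averaging over the batch."""
    # Clip to avoid log(0) on saturated predictions.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_pred), axis=-1)


# Hypothetical batch of three samples and three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=np.float32)
y_pred = np.array([[0.9, 0.05, 0.05],
                   [0.2, 0.5, 0.3],
                   [0.6, 0.3, 0.1]], dtype=np.float32)

losses = per_sample_categorical_crossentropy(y_true, y_pred)
print(losses)             # one loss value per sample
print(np.argmax(losses))  # index of the hardest sample in this batch
```

Because the loss is kept unreduced, the batch size used for prediction only affects speed, not the per-sample values.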