With BERT on TensorFlow 1.x, how can a similarity binary-classification model return 0/1 values so the predictions are directly readable? There are so many function calls involved that every place I change throws an error. Or is there simply no way to do this?
In TensorFlow 1.x BERT, the classifier typically outputs class probabilities via tf.nn.softmax(); you then convert those probabilities to 0 or 1 with a threshold.
Assuming you have already fine-tuned the BERT model and run prediction on the test set to obtain similarity probabilities, the following code converts them to 0/1:
import numpy as np

# Convert probabilities to 0 or 1 according to a threshold
def convert_to_binary(probability, threshold):
    binary = np.zeros_like(probability)
    binary[probability >= threshold] = 1
    return binary

# Set the threshold
threshold = 0.5

# Get the similarity probabilities (model.predict here stands in for however
# your pipeline produces them; the official BERT scripts use estimator.predict)
probabilities = model.predict(x_test)

# Convert the probabilities to 0 or 1
predictions = convert_to_binary(probabilities, threshold)
Here probabilities holds the similarity probabilities predicted by the BERT model, and threshold is the cutoff, usually set to 0.5. convert_to_binary() maps probabilities greater than or equal to the threshold to 1 and everything below it to 0.
predictions then contains the final 0/1 values, so the model's predictions on the test set can be read off directly.
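One caveat worth adding: for a two-class softmax head, probabilities has shape [batch_size, 2], so thresholding the whole array binarizes both columns at once. It is usually cleaner to take only the positive-class column, or an argmax over the class axis. A minimal sketch with made-up numbers:

import numpy as np

# toy probabilities for 3 sentence pairs: columns are [not similar, similar]
probabilities = np.array([[0.90, 0.10],
                          [0.30, 0.70],
                          [0.45, 0.55]])

# Option 1: argmax over the class axis
predictions = np.argmax(probabilities, axis=-1)          # -> [0, 1, 1]

# Option 2: threshold the positive-class probability
predictions = (probabilities[:, 1] >= 0.5).astype(int)   # -> [0, 1, 1]

For two classes the argmax is equivalent to thresholding column 1 at 0.5; keep the threshold form if you want a tunable cutoff.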
In case the problem is still not solved: the BERT source code ships fine-tuning examples for two tasks, sentence-pair classification on MRPC (run_classifier.py) and question answering on SQuAD (run_squad.py). Let's look at each in turn. The key part is how the model and the loss are built:
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 labels, num_labels, use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # Pooled output: the final hidden state of the first token ([CLS])
  output_layer = model.get_pooled_output()

  hidden_size = output_layer.shape[-1].value

  output_weights = tf.get_variable(
      "output_weights", [num_labels, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "output_bias", [num_labels], initializer=tf.zeros_initializer())

  with tf.variable_scope("loss"):
    if is_training:
      # I.e., 0.1 dropout
      output_layer = tf.nn.dropout(output_layer, keep_prob=0.9)

    # Linear classifier on top of the pooled output
    logits = tf.matmul(output_layer, output_weights, transpose_b=True)
    logits = tf.nn.bias_add(logits, output_bias)
    probabilities = tf.nn.softmax(logits, axis=-1)
    log_probs = tf.nn.log_softmax(logits, axis=-1)

    one_hot_labels = tf.one_hot(labels, depth=num_labels, dtype=tf.float32)

    # Cross-entropy loss, averaged over the batch
    per_example_loss = -tf.reduce_sum(one_hot_labels * log_probs, axis=-1)
    loss = tf.reduce_mean(per_example_loss)

  return (loss, per_example_loss, logits, probabilities)
This simply runs the BERT model, takes the pooled representation of the first token ([CLS]), and puts a linear classifier with num_labels outputs on top.
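To get 0/1 labels directly out of this graph (which is what the question asks for), the least invasive change is usually to add an argmax next to probabilities and return it from the predict path. A hedged sketch (exact placement depends on your copy of run_classifier.py; the official script wraps predictions in tf.contrib.tpu.TPUEstimatorSpec, where the predictions dict works the same way):

# Inside create_model, next to probabilities (names match the code above):
predicted_labels = tf.argmax(logits, axis=-1, output_type=tf.int32)

# In the PREDICT branch of the model_fn, return both tensors, e.g.:
output_spec = tf.estimator.EstimatorSpec(
    mode=mode,
    predictions={
        "probabilities": probabilities,
        "labels": predicted_labels,  # 0/1, directly readable in the output
    })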
For SQuAD, likewise look at how the model and loss are built:
(start_logits, end_logits) = create_model(
    bert_config=bert_config,
    is_training=is_training,
    input_ids=input_ids,
    input_mask=input_mask,
    segment_ids=segment_ids,
    use_one_hot_embeddings=use_one_hot_embeddings)

def compute_loss(logits, positions):
  one_hot_positions = tf.one_hot(
      positions, depth=seq_length, dtype=tf.float32)
  log_probs = tf.nn.log_softmax(logits, axis=-1)
  loss = -tf.reduce_mean(
      tf.reduce_sum(one_hot_positions * log_probs, axis=-1))
  return loss

start_positions = features["start_positions"]
end_positions = features["end_positions"]

start_loss = compute_loss(start_logits, start_positions)
end_loss = compute_loss(end_logits, end_positions)

total_loss = (start_loss + end_loss) / 2.0
Here create_model first produces, for every token position, the logits for being the start and the end of the answer span. compute_loss then takes a softmax over all positions and computes the cross-entropy against the target position; total_loss is the average of the start and end losses. The short numeric sketch below illustrates exactly what compute_loss computes.
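A minimal NumPy re-implementation with toy values (the logits and positions here are made up purely for illustration):

import numpy as np

def compute_loss_np(logits, positions):
    # log-softmax over the sequence dimension
    log_probs = logits - np.log(np.sum(np.exp(logits), axis=-1, keepdims=True))
    # pick out the log-probability at the target position of each example
    picked = log_probs[np.arange(len(positions)), positions]
    # negative mean log-likelihood, as in compute_loss above
    return -picked.mean()

# toy batch: 2 examples, seq_length = 4
start_logits = np.array([[2.0, 0.1, 0.1, 0.1],
                         [0.1, 0.1, 3.0, 0.1]])
start_positions = np.array([0, 2])
print(compute_loss_np(start_logits, start_positions))  # ~0.26: confident, correct predictions

The create_model used above is defined as follows: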
def create_model(bert_config, is_training, input_ids, input_mask, segment_ids,
                 use_one_hot_embeddings):
  """Creates a classification model."""
  model = modeling.BertModel(
      config=bert_config,
      is_training=is_training,
      input_ids=input_ids,
      input_mask=input_mask,
      token_type_ids=segment_ids,
      use_one_hot_embeddings=use_one_hot_embeddings)

  # Per-token output: [batch_size, seq_length, hidden_size]
  final_hidden = model.get_sequence_output()

  final_hidden_shape = modeling.get_shape_list(final_hidden, expected_rank=3)
  batch_size = final_hidden_shape[0]
  seq_length = final_hidden_shape[1]
  hidden_size = final_hidden_shape[2]

  output_weights = tf.get_variable(
      "cls/squad/output_weights", [2, hidden_size],
      initializer=tf.truncated_normal_initializer(stddev=0.02))

  output_bias = tf.get_variable(
      "cls/squad/output_bias", [2], initializer=tf.zeros_initializer())

  # Apply the same linear layer to every token position
  final_hidden_matrix = tf.reshape(final_hidden,
                                   [batch_size * seq_length, hidden_size])
  logits = tf.matmul(final_hidden_matrix, output_weights, transpose_b=True)
  logits = tf.nn.bias_add(logits, output_bias)

  logits = tf.reshape(logits, [batch_size, seq_length, 2])
  logits = tf.transpose(logits, [2, 0, 1])

  # Split the two scores into start and end logits, each [batch_size, seq_length]
  unstacked_logits = tf.unstack(logits, axis=0)

  (start_logits, end_logits) = (unstacked_logits[0], unstacked_logits[1])

  return (start_logits, end_logits)
As you can see, a single linear layer, shared across positions, computes for each token one score for being the start and one for being the end of the answer.
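At inference time the simplest way to read off a span from these logits is an argmax per example (the official run_squad.py is more careful and searches over valid n-best start/end combinations, so treat this as an illustrative sketch only):

import tensorflow as tf

# start_logits, end_logits: [batch_size, seq_length], as returned by create_model
pred_start = tf.argmax(start_logits, axis=-1)  # most likely start index per example
pred_end = tf.argmax(end_logits, axis=-1)      # most likely end index per example
# tokens[pred_start : pred_end + 1] would then be the predicted answer span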