After fitting the data with a convolutional neural network, I plot the predicted curve against the actual curve. I want to statistically analyze the deviation between the two curves, expressed as the variance (with mean and standard deviation), the overall mean, and confidence intervals, and then use these figures as a basis for adjusting the program so that the predicted curve fits the actual curve more closely.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense
# 1. Load the data
data = pd.read_csv('zhendaorushuju.csv', encoding='gb18030')
# 2. Feature selection
features = ['Nb%', 'Ti%', 'AL%', 'V%', 'Cr%', 'Mo%', 'C%', 'Mn%', 'P%', 'Ni%', 'Cu%', 'Si%', 'S%', '奥氏体化温度0℃',
'油冷时间0min', '保温时间0min', '上下窜动时间0min', '回火0退火温度0℃', '保温时间0min']
X = data[features]
y = data['硬度']
# 3. Feature scaling
scaler = MinMaxScaler()
X = scaler.fit_transform(X)
# 4. Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=400)
# 5. Reshape the data into a 3-D array
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))
# 6. Build the convolutional neural network model
model = Sequential()
model.add(Conv1D(filters=40, kernel_size=4, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(MaxPooling1D(pool_size=5))
model.add(Flatten())
model.add(Dense(2400, activation='relu'))
model.add(Dense(1))
# 7. Compile the model
model.compile(optimizer='adam', loss='mse')
# 8. Train the model
history = model.fit(X_train, y_train, epochs=51, batch_size=30, validation_data=(X_test, y_test), verbose=0)
# 9. Evaluate the model
score = model.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score)
# 10. Make predictions with the model
y_pred = model.predict(X_test)
# 11. Plot predicted vs. actual results
plt.plot(y_test.values, label='true')
plt.plot(y_pred, label='pred')
plt.legend()
plt.show()
Answer (quoting ChatGPT): here are the steps for statistically analyzing the gap between the predicted and actual curves:
1. Compute the differences between predicted and actual values; numpy's mean() gives their average.
2. Compute the spread of the differences: numpy's std() gives the standard deviation, and squaring it yields the variance.
3. Compute a confidence interval for the differences using scipy's t distribution, a probability distribution suited to interval estimates of a mean. First compute the degrees of freedom (sample size minus one), then call t.interval() to obtain the interval.
Here is the modified code, with the statistical analysis added:
# Differences between predicted and actual values
diff = y_test.values - y_pred.flatten()
mean_diff = np.mean(diff)
# Standard deviation of the differences
std_diff = np.std(diff)
# Confidence interval for the differences
import scipy.stats as stats
df = len(diff) - 1 # 自由度
conf_interval = stats.t.interval(0.95, df, loc=mean_diff, scale=stats.sem(diff))
# Print the statistics
print('Mean difference:', mean_diff)
print('Standard deviation of difference:', std_diff)
print('Confidence interval of difference:', conf_interval)
# Plot predicted vs. actual results
plt.plot(y_test.values, label='true')
plt.plot(y_pred, label='pred')
plt.legend()
plt.show()
These statistics tell you how large the deviation between predictions and actual results is, how it is distributed, and within what range the difference is likely to fall at a given confidence level. You can use this information to tune your model so that it predicts more accurately.
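As a further sketch (not part of the original answer; ttest_1samp is a standard scipy.stats function), a one-sample t-test on the differences checks whether the mean deviation differs significantly from zero, i.e. whether the model has a systematic bias worth correcting:
# Hedged sketch: test H0 "mean deviation == 0" using `diff` from the snippet above.
from scipy import stats
t_stat, p_value = stats.ttest_1samp(diff, 0.0)
print('t statistic:', t_stat)
print('p value:', p_value)
if p_value < 0.05:
    print('Significant systematic bias: consider recalibrating the model.')
else:
    print('No significant systematic bias at the 5% level.')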
Straight to the code:
"""
Note: The code is used to show the change trende via the whole training procession.
First: You need to mark all the loss of every iteration
Second: You need to write these data into a txt file with the format like:
......
iter loss
iter loss
......
Third: the path is the txt file path of your loss
"""
import matplotlib.pyplot as plt

def read_txt(path):
    with open(path, 'r') as f:
        lines = f.readlines()
    splitlines = [x.strip().split(' ') for x in lines]
    return splitlines

# Referenced from Tensorboard (a smooth_loss function: https://blog.csdn.net/charel_chen/article/details/80364841)
def smooth_loss(path, weight=0.85):
    iters = []
    loss = []
    data = read_txt(path)
    for value in data:
        iters.append(int(value[0]))
        # A str like '3.552' cannot be converted to int directly; parse it as float
        # (keeping the float also preserves precision for plotting).
        loss.append(float(value[1]))
    last = loss[0]
    smoothed = []
    for point in loss:
        # Exponential moving average: the previous smoothed value dominates.
        smoothed_val = last * weight + (1 - weight) * point
        smoothed.append(smoothed_val)
        last = smoothed_val
    return iters, smoothed

if __name__ == "__main__":
    path = './loss.txt'
    iters, loss = smooth_loss(path)
    plt.plot(iters, loss, linewidth=2)
    plt.title("Loss-iters", fontsize=24)
    plt.xlabel("iters", fontsize=14)
    plt.ylabel("loss", fontsize=14)
    plt.tick_params(axis='both', labelsize=14)
    plt.savefig('./loss_func.png')
    plt.show()
This mainly borrows TensorBoard's computation (see the linked smooth-loss-curve code above). Some background: when batch_size is small, the loss curve fluctuates heavily, so a way of damping the fluctuations is needed to show the overall trend. TensorBoard's method is the exponential moving average implemented in smooth_loss.
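The script above assumes a loss.txt file already exists. As a minimal sketch of how to produce one from the Keras model in the question (BatchLossLogger is a hypothetical helper name; keras.callbacks.Callback and the on_train_batch_end hook are standard Keras APIs):
# Hedged sketch: log per-batch training loss to loss.txt in "iter loss" format.
from keras.callbacks import Callback

class BatchLossLogger(Callback):
    def __init__(self, path='./loss.txt'):
        super().__init__()
        self.path = path
        self.iteration = 0

    def on_train_begin(self, logs=None):
        open(self.path, 'w').close()  # truncate the file at the start of training

    def on_train_batch_end(self, batch, logs=None):
        self.iteration += 1
        with open(self.path, 'a') as f:
            f.write('%d %f\n' % (self.iteration, logs['loss']))

# Usage: model.fit(..., callbacks=[BatchLossLogger()])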
# Deviation between predicted and actual results
deviation = y_pred.flatten() - y_test.values
# Mean, standard deviation, and variance of the deviation
mean_deviation = np.mean(deviation)
std_deviation = np.std(deviation)
variance = np.var(deviation)
# Print the mean, standard deviation, and variance
print("Mean deviation:", mean_deviation)
print("Standard deviation:", std_deviation)
print("Variance:", variance)
from scipy import stats
# Compute the confidence interval
confidence_interval = stats.t.interval(0.95, len(deviation)-1, loc=np.mean(deviation), scale=stats.sem(deviation))
# Print the confidence interval
print("Confidence interval:", confidence_interval)
The following answer was co-written by the GPT-3.5 model and blogger 波罗歌:
You want to statistically analyze the deviation between the CNN's predictions and the actual results, including the variance (with mean and standard deviation), the overall mean, and confidence intervals. Here is example code for these statistics:
# Mean, standard deviation, and variance
print('Mean of actual values:', np.mean(y_test))
print('Mean of predictions:', np.mean(y_pred))
print('Std of actual values:', np.std(y_test))
print('Std of predictions:', np.std(y_pred))
print('Variance of actual values:', np.var(y_test))
print('Variance of predictions:', np.var(y_pred))
# Overall mean of the deviation
diff = y_test - y_pred.flatten()
all_mean = diff.mean()
print('Overall mean of deviation:', all_mean)
# 95% confidence interval for the mean deviation
t_value = 1.96  # normal-approximation critical value; see the exact t-based sketch below
n = len(diff)
std_error = diff.std() / np.sqrt(n)
margin_error = t_value * std_error
upper_bound = all_mean + margin_error
lower_bound = all_mean - margin_error
print('Confidence interval:', lower_bound, upper_bound)
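Note that 1.96 is only the large-sample normal approximation to the t critical value. As a minimal sketch (not part of the original answer), scipy can compute the exact critical value for the actual sample size; stats.t.ppf and stats.sem are standard scipy.stats functions:
# Hedged sketch: exact 95% interval via the t distribution instead of z = 1.96.
from scipy import stats
t_crit = stats.t.ppf(0.975, df=len(diff) - 1)  # two-sided 95% critical value
margin = t_crit * stats.sem(diff)              # standard error of the mean
print('Exact 95% CI:', (all_mean - margin, all_mean + margin))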
Add the code above to your implementation to compute these statistics and quantify the deviation between predictions and actual results.
If my answer solved your problem, please accept it!
Answer quoting ChatGPT: to statistically analyze the predicted versus actual results, you can compute their means, standard deviations, and variances, as well as confidence intervals. Below is example code, which you can modify and adjust as needed:
# Mean, standard deviation, and variance
mean_true = np.mean(y_test)
mean_pred = np.mean(y_pred)
std_true = np.std(y_test)
std_pred = np.std(y_pred)
var_true = np.var(y_test)
var_pred = np.var(y_pred)
print('True mean:', mean_true)
print('Pred mean:', mean_pred)
print('True std:', std_true)
print('Pred std:', std_pred)
print('True var:', var_true)
print('Pred var:', var_pred)
# 95% confidence interval for the mean prediction (1.96 is the normal-approximation critical value)
conf_interval = 1.96 * std_pred / np.sqrt(len(y_test))
lower_bound = mean_pred - conf_interval
upper_bound = mean_pred + conf_interval
print('Confidence interval:', (lower_bound, upper_bound))
You can also visualize these statistics to compare predictions with actual results more intuitively, for example with a scatter plot of predicted versus actual values, a residual plot, or an error distribution histogram. Example code:
# Scatter plot of predicted vs. actual values
plt.scatter(y_test, y_pred)
plt.xlabel('True values')
plt.ylabel('Predicted values')
plt.show()
# Residual plot
residuals = y_test - y_pred.flatten()
plt.scatter(y_test, residuals)
plt.xlabel('True values')
plt.ylabel('Residuals')
plt.axhline(y=0, color='r', linestyle='--')
plt.show()
# Error distribution histogram
error = y_test - y_pred.flatten()
plt.hist(error, bins=50)
plt.xlabel('Prediction Error')
plt.ylabel('Count')
plt.show()
In the code above, flatten() converts the two-dimensional prediction array into a one-dimensional array. Adjust the code to fit your own data and needs.
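Finally, when tuning the network it can help to track a few standard regression metrics in one place. As a minimal sketch (not from any answer above; mean_squared_error, mean_absolute_error, and r2_score are standard sklearn.metrics functions):
# Hedged sketch: summary regression metrics for judging overall fit while tuning.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_pred_flat = y_pred.flatten()
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred_flat)))  # penalizes large errors
print('MAE :', mean_absolute_error(y_test, y_pred_flat))          # average absolute error
print('R^2 :', r2_score(y_test, y_pred_flat))                     # 1.0 means a perfect fit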