I'm a beginner at this. When I ran my code, it printed the following:
Traceback (most recent call last):
ValueError: Found input variables with inconsistent numbers of samples: [500, 100]
The data has 501 rows and 9 columns in total.
My code is as follows:
import matplotlib as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error,mean_squared_error,median_absolute_error,r2_score,log_loss
# Data preparation
data=np.loadtxt("unit7-exe.csv",delimiter=",",skiprows=1,dtype=float)
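X=data[:,1:8] # feature set, a 2-D matrix
y=data[:,8] # target (label) set
y=y.reshape(-1,1) # turn y into a one-column 2-D matrix; the number of rows is inferred automatically
# Split into training and test sets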
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Compute the mean and variance of each column
standscaler=StandardScaler()
standscaler.fit(X_train)
# Standardize the matrices using the fitted mean and variance
X_train_standard=standscaler.transform(X_train)
X_test_standard=standscaler.transform(X_test)
print(X_train_standard)
print(X_test_standard)
# Update the parameters with the SGD algorithm
sgd=SGDRegressor() # create an SGD regressor object
sgd.fit(X_train_standard,y_train)
print(sgd.coef_)
print(sgd.intercept_)
# Use the fitted model to make predictions
y_predict=sgd.predict(X_test_standard)
# Evaluation
print("mae",mean_absolute_error(y,y_predict))
print("mse", mean_squared_error(y, y_predict))
print("median-ae", median_absolute_error(y, y_predict))
print("r2", r2_score(y, y_predict))
loss = log_loss(X_test, y_test)
plt.plot(np.arange(1000) + 1, loss)
plt.xlabel('Iterations')
plt.ylabel('Loss')
plt.title('SGD gradient descent model')
plt.show()
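The two arrays passed to the metric functions do not have the same number of samples. The [500, 100] in the message shows the mismatch: the evaluation calls pass y, which holds all 500 samples (501 rows minus the skipped header), while y_predict only covers the 100 samples of the 20% test split.
Check the sizes of both arrays and make sure they contain the same number of samples, which here means scoring against y_test instead of y. You can print the lengths first, for example len(y), len(y_test) and len(y_predict), to see the mismatch.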
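
To make the check above concrete, here is a minimal sketch of printing the lengths and then scoring against the test labels. Since unit7-exe.csv is not available here, the data is a synthetic stand-in (an assumption) with the same shape as the question: 500 samples and 7 features. The target is also kept 1-D rather than reshaped to a column, because SGDRegressor expects a 1-D y and warns otherwise.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the CSV (assumption): 500 samples, 7 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 7))
y = X @ np.arange(1.0, 8.0) + rng.normal(scale=0.1, size=500)  # 1-D target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it for the test split.
scaler = StandardScaler().fit(X_train)
sgd = SGDRegressor(random_state=42)
sgd.fit(scaler.transform(X_train), y_train)
y_predict = sgd.predict(scaler.transform(X_test))

# Print the lengths first: y has 500 samples, y_test and y_predict have 100.
print(len(y), len(y_test), len(y_predict))

# Score the predictions against y_test, which matches y_predict in length.
mae = mean_absolute_error(y_test, y_predict)
r2 = r2_score(y_test, y_predict)
print("mae", mae)
print("r2", r2)
```

Once this error is gone, two later lines in the posted code would likely fail next. plt.plot needs import matplotlib.pyplot as plt rather than import matplotlib as plt, and log_loss is a classification metric that takes true labels and predicted probabilities, so log_loss(X_test, y_test) will not give a per-iteration loss curve for a regressor.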