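I've recently been working through Wu Xizhi's 《统计学:从数据到结论》 (Statistics: From Data to Conclusions), and I can't get the Python code in the cross-validation section to run. Any help would be appreciated.
The code is as follows: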
w=pd.read_csv('bidding.csv').iloc[:,3:]
X=w.iloc[:,:-1]
y=w.iloc[:,-1].astype('category')
names=['Bagging','Random Forest','AdaBoost','Logit']
classifiers=[BaggingClassifier(n_estimators=100,random_state=1010),RandomForestClassifier(n_estimators=500,random_state=0),AdaBoostClassifier(n_estimators=100,random_state=0),LogisticRegression(solver='liblinear')]
CLS=dict(zip(names,classifiers))
R,A=ClaCV(X,y,CLS)
print(A)
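Output and error message
NameError: name 'ClaCV' is not defined
My reasoning and what I've tried
Clearly the function ClaCV has not been defined, yet every library and module that should be imported appears to have been imported. The two earlier examples, which use the functions SRCV and SCCV, fail in the same way, so I can't produce any of the cross-validation results.
The libraries already imported are: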
import math,random
import graphviz
import pandas as pd,numpy as np
from sklearn.feature_selection import RFECV
import seaborn as sns
import statsmodels.api as sm
import matplotlib.pyplot as plt
from collections import defaultdict
from IPython.display import SVG
from sklearn import tree
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import (DecisionTreeClassifier,
    DecisionTreeRegressor, export_graphviz)
from sklearn.ensemble import (AdaBoostClassifier,
    AdaBoostRegressor, RandomForestClassifier,
    RandomForestRegressor, BaggingClassifier, BaggingRegressor)
from sklearn.svm import SVC,SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score
import statsmodels as tsm
import matplotlib as tplt
import sklearn as tsk
What I want to achieve
Please tell me why this happens. Is it because some library has not been imported?
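The code in the OP's book looks incomplete as quoted: sklearn has no function called ClaCV, so no import will make that name appear. Could what you want be cross_val_score?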
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
iris = load_iris()
logreg = LogisticRegression()
scores = cross_val_score(logreg, iris.data, iris.target)
print("Cross-validation scores: {}".format(scores))
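Also, I have never heard of SRCV or SCCV. Most likely they are functions the author wrote himself. My rough guess at the logic: the previous step defines the classifiers and uses zip and dict to build a mapping from each name to its classifier. The function then has to unpack that mapping, train each classifier on the data, and return two values. I don't know what R is. If A is an error rate, I don't know which loss it uses. Without the sklearn API you could write such a function in a dozen or so lines.
RFECV, on the other hand, is correct; sklearn does have it: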
>>> from sklearn.datasets import make_friedman1
>>> from sklearn.feature_selection import RFECV
>>> from sklearn.svm import SVR
>>> X, y = make_friedman1(n_samples=50, n_features=10, random_state=0)
>>> estimator = SVR(kernel="linear")
>>> selector = RFECV(estimator, step=1, cv=5)
>>> selector = selector.fit(X, y)
>>> selector.support_
array([ True, True, True, True, True, False, False, False, False,
False])
>>> selector.ranking_
array([1, 1, 1, 1, 1, 6, 4, 3, 2, 5])
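Returning to the guess about ClaCV above, here is a minimal sketch of what such a helper might look like. It is not the book's code. The name cla_cv_sketch, the stratified folds, the seed, and the reading of R as per-fold misclassification rates and A as their means are all assumptions made for illustration. Synthetic data stands in for bidding.csv, which is not available here.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold


def cla_cv_sketch(X, y, classifiers, n_splits=10, seed=1010):
    """Hypothetical stand-in for the book's ClaCV (assumed behaviour).

    Returns (R, A): R holds each classifier's misclassification rate
    on every fold, A holds each classifier's mean rate over the folds.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    R = pd.DataFrame(index=range(n_splits), columns=list(classifiers), dtype=float)
    for fold, (train_idx, test_idx) in enumerate(folds.split(X, y)):
        for name, clf in classifiers.items():
            model = clone(clf)  # fresh, unfitted copy for every fold
            model.fit(X[train_idx], y[train_idx])
            pred = model.predict(X[test_idx])
            R.loc[fold, name] = np.mean(pred != y[test_idx])
    A = R.mean()  # average error rate per classifier
    return R, A


# Synthetic demo data (an assumption; replace with the real X and y).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)
CLS_demo = {
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "Logit": LogisticRegression(solver="liblinear"),
}
R, A = cla_cv_sketch(X_demo, y_demo, CLS_demo, n_splits=5)
print(A)
```

With the OP's data, the call would be R, A = cla_cv_sketch(X, y, CLS) using the X, y and CLS defined in the question. If the book defines ClaCV in an earlier chapter or in its companion code, that definition should be used instead of this sketch.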
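Parameters of sklearn's cross_val_score() function (as listed for an older release; newer releases default n_jobs to None and replace fit_params with params):
sklearn.model_selection.cross_val_score(estimator, X, y=None, groups=None, scoring=None,
cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')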
Reference: https://blog.csdn.net/qq_41937076/article/details/101313985
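To make the cv and scoring arguments of cross_val_score concrete, here is a small sketch on the iris data. The choice of 10 folds, the accuracy metric, and max_iter=1000 are illustrative assumptions, not settings taken from the book.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X_iris, y_iris = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000)  # higher max_iter to help convergence

# cv=10 asks for 10 folds (stratified by default for classifiers);
# scoring="accuracy" selects the metric computed on each held-out fold.
acc = cross_val_score(clf, X_iris, y_iris, cv=10, scoring="accuracy")

# An error rate, if that is what the book reports, is one minus the mean accuracy.
error_rate = 1 - acc.mean()
print(len(acc), error_rate)
```

Looping this call over the values of the CLS dictionary from the question would give one error rate per classifier without writing a custom function.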
Good question.