Random Forest Classification on the Iris Dataset

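Here is Python code that uses a random forest to classify the Iris dataset (RandomForestClassifier is a supervised classifier, so this is classification rather than clustering):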
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split into training and test sets (stratified 80/20 split: 120 training and 30 test samples)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

# Build and fit the random forest model (200 trees, each at most 3 levels deep)
rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=1)
rfc.fit(X_train, y_train)

# Predict class labels for the test set
y_pred = rfc.predict(X_test)

# Compute accuracy on the test set
accuracy = rfc.score(X_test, y_test)
print(accuracy)  # fraction of the 30 test samples classified correctly

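# Rank the features by importance using the feature_importances_ attribute
importances = rfc.feature_importances_
ranked_importances = sorted(zip(importances, iris.feature_names), reverse=True)
for score, name in ranked_importances:
    print(f"{name}: {score:.3f}")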
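
A random forest is an ensemble learning method: it builds many decision trees and classifies a new sample by combining the trees' votes. Here we trained a random forest on the Iris dataset and evaluated it on a held-out test set. The 97.4% accuracy quoted for this model corresponds to 37 of 38 test samples, that is, a 38-sample test set; with test_size=0.2 the test set has 30 samples, so the value printed by this script will differ. We also ranked the features by importance using feature_importances_, which produces a sorted list rather than a plot.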
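
To make the voting step concrete, here is a minimal sketch that rebuilds the forest's prediction from its individual trees. It assumes the same data split and hyperparameters as the script above. The names per_tree_proba, soft_vote and hard_vote are illustrative and not part of the original answer. One detail worth knowing: scikit-learn's RandomForestClassifier averages each tree's predicted class probabilities (a "soft" vote) rather than counting one vote per tree, so the sketch computes both variants for comparison.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Same setup as the script above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)
rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=1)
rfc.fit(X_train, y_train)

# The fitted trees live in rfc.estimators_. Stack their class probabilities
# into an array of shape (n_trees, n_samples, n_classes).
per_tree_proba = np.stack([tree.predict_proba(X_test) for tree in rfc.estimators_])

# Soft vote: average the probabilities over trees, then pick the most probable class.
mean_proba = per_tree_proba.mean(axis=0)
soft_vote = rfc.classes_[mean_proba.argmax(axis=1)]

# Hard vote: each tree casts one vote for its own most probable class.
per_tree_choice = per_tree_proba.argmax(axis=2)  # shape (n_trees, n_samples)
n_classes = len(rfc.classes_)
hard_vote = rfc.classes_[
    np.array([np.bincount(votes, minlength=n_classes).argmax() for votes in per_tree_choice.T])
]

print("averaged tree probabilities match rfc.predict_proba:",
      np.allclose(mean_proba, rfc.predict_proba(X_test)))
print("soft and hard votes agree on",
      int((soft_vote == hard_vote).sum()), "of", len(X_test), "test samples")
```

The two voting schemes usually agree on a dataset as easy as Iris, but they can differ when many trees are individually unsure, which is why the averaged probabilities are the quantity to inspect when a prediction looks surprising.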