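Here is code that uses the random forest algorithm to classify the Iris dataset (this is supervised classification, not clustering):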
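```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the 150-sample Iris dataset (4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Hold out 20% (30 samples) for testing; stratify keeps the class proportions equal
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=1)

# 200 shallow trees; random_state fixes the bootstrap and feature sampling
rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=1)
rfc.fit(X_train, y_train)

y_pred = rfc.predict(X_test)  # predicted labels for the held-out samples
accuracy = rfc.score(X_test, y_test)  # mean accuracy on the test set
# Originally quoted as 0.9736842105263158 (= 37/38), which a 30-sample test set cannot produce
print(accuracy)

# Rank the features from most to least important and print them
importances = sorted(zip(rfc.feature_importances_, iris.feature_names), reverse=True)
for score, name in importances:
    print(f"{name}: {score:.3f}")
```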
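The code above only sorts the feature importances. If a plot is wanted, a minimal sketch with matplotlib could look like the following. The non-interactive `Agg` backend and the output filename `iris_feature_importances.png` are assumptions made for illustration, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file, no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, stratify=iris.target, random_state=1
)
rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=1)
rfc.fit(X_train, y_train)

# Sort ascending so the most important feature ends up as the top bar
ranked = sorted(zip(rfc.feature_importances_, iris.feature_names))
scores = [score for score, _ in ranked]
names = [name for _, name in ranked]

fig, ax = plt.subplots(figsize=(6, 3))
ax.barh(names, scores)
ax.set_xlabel("Feature importance (mean decrease in impurity)")
ax.set_title("Random forest feature importances on Iris")
fig.tight_layout()
fig.savefig("iris_feature_importances.png")  # hypothetical output path
```

In an interactive session, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` instead of `fig.savefig(...)` would display the chart directly.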
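A random forest is an ensemble learning method: it builds many decision trees and combines their votes to classify a new sample. Here we trained a random forest on the Iris dataset and evaluated it on a held-out test set. The quoted accuracy of 97.4% (0.9737 = 37/38) corresponds to a 38-sample test set, whereas `test_size=0.2` holds out only 30 samples, so the exact number printed by this code will differ from that figure. We also ranked the features by importance; note that the code sorts the importances but does not plot them.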
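To make the "voting" step concrete, here is a small sketch that rebuilds the forest's prediction from its individual trees. One detail worth knowing: scikit-learn's `RandomForestClassifier` averages the per-tree class probabilities (a soft vote) rather than counting one hard vote per tree, so the sketch computes both for comparison. Variable names such as `tree_proba` and `hard_vote` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1
)
rfc = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=1)
rfc.fit(X_train, y_train)

# Per-tree class probabilities, shape (n_trees, n_test_samples, n_classes)
tree_proba = np.stack([tree.predict_proba(X_test) for tree in rfc.estimators_])

# Soft vote: average the probabilities over trees, then take the most likely class
mean_proba = tree_proba.mean(axis=0)
soft_vote = rfc.classes_[mean_proba.argmax(axis=1)]

# Hard vote: each tree casts one vote for its most likely class
tree_votes = tree_proba.argmax(axis=2)  # (n_trees, n_test_samples) class indices
n_classes = len(rfc.classes_)
hard_idx = np.array(
    [np.bincount(votes, minlength=n_classes).argmax() for votes in tree_votes.T]
)
hard_vote = rfc.classes_[hard_idx]

forest_pred = rfc.predict(X_test)
agreement = float(np.mean(hard_vote == forest_pred))
print("averaged tree probabilities match predict_proba:",
      np.allclose(mean_proba, rfc.predict_proba(X_test)))
print("fraction of samples where the hard vote agrees with the forest:", agreement)
```

On an easy dataset like Iris the two voting schemes usually pick the same class, but they can disagree when many trees are individually uncertain.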