How to run a random forest analysis in R
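I have 84 samples: 64 in group A and 14 in group B (these two counts add up to 78, which is the total shown in the confusion matrix below).
Expression data are available for 14 genes.
Goal 1: evaluate how accurately these 14 genes classify the samples.
Goal 2: find the combination of these 14 genes that gives the highest classification accuracy.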
rf_model <- randomForest(labels ~ ., data = exp.anno.17, importance = TRUE, proximity = TRUE)
print(rf_model)
Call:
randomForest(formula = labels ~ ., data = exp.anno.17, importance = TRUE, proximity = TRUE)
Type of random forest: classification
Number of trees: 500
No. of variables tried at each split: 3
OOB estimate of error rate: 6.41%
Confusion matrix:
      NonBA BA class.error
NonBA     9  5   0.3571429
BA        0 64   0.0000000
Result 1: the OOB estimate of error rate is 6.41%. Does this mean that these 14 genes classify the samples with 93.59% accuracy?
How can I accomplish goal 2 in R?
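On result 1: 1 - 6.41% = 93.59% is indeed the out-of-bag (OOB) accuracy, but because the two classes are imbalanced it helps to read the per-class errors as well. The sketch below only redoes the arithmetic from the confusion matrix printed above; the variable names are chosen for illustration and are not part of randomForest.

```r
# Confusion matrix copied from print(rf_model):
# rows are the true classes, columns are the OOB predictions.
cm <- matrix(c(9,  5,
               0, 64),
             nrow = 2, byrow = TRUE,
             dimnames = list(truth = c("NonBA", "BA"),
                             predicted = c("NonBA", "BA")))

oob_error         <- 1 - sum(diag(cm)) / sum(cm)   # 5 / 78
oob_accuracy      <- 1 - oob_error
per_class_recall  <- diag(cm) / rowSums(cm)        # 1 - class.error
balanced_accuracy <- mean(per_class_recall)
majority_baseline <- max(rowSums(cm)) / sum(cm)    # always predicting "BA"

round(c(oob_accuracy = oob_accuracy,
        balanced_accuracy = balanced_accuracy,
        majority_baseline = majority_baseline), 4)
# oob_accuracy 0.9359, balanced_accuracy 0.8214, majority_baseline 0.8205
```

So the overall figure is 93.59%, but 5 of the 14 NonBA samples are misclassified, and a rule that always predicts BA would already reach about 82%. Balanced accuracy or the per-class error is probably the more informative number here.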
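# Load the package and import the data
library(randomForest)
data <- read.csv("data.csv")
# randomForest treats the response as a class label only if it is a factor
data$Group <- factor(data$Group)
# Split the data into a training set and a test set
set.seed(123)
train <- sample(seq_len(nrow(data)), 60)
train_data <- data[train, ]
test_data <- data[-train, ]
# Train the random forest model
rf_model <- randomForest(Group ~ ., data = train_data, ntree = 500, importance = TRUE)
# Check accuracy on the held-out test set
rf_pred <- predict(rf_model, test_data)
table(predicted = rf_pred, actual = test_data$Group)
mean(rf_pred == test_data$Group)
# Plot variable importance
varImpPlot(rf_model)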
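For goal 2, the importance plot ranks the genes but does not say how many to keep. One possible approach, sketched below, is to rank the genes once by mean decrease in accuracy and then refit the forest on the top 1, top 2, ... genes, comparing the OOB error of each fit. The helper `top_k_oob` and the simulated data frame `sim` are made up for this illustration; replace `sim` with your own data and `"Group"` with your label column. This searches only nested subsets, not every possible combination, and because the ranking uses the same samples the resulting OOB errors are likely to be somewhat optimistic.

```r
library(randomForest)

# Hypothetical helper: rank predictors once on the full model, then refit on
# the k top-ranked genes for each k and record the final OOB error.
top_k_oob <- function(data, label = "Group", ntree = 500, seed = 123) {
  set.seed(seed)
  genes <- setdiff(names(data), label)
  x <- data[, genes, drop = FALSE]
  y <- factor(data[[label]])

  full <- randomForest(x = x, y = y, ntree = ntree, importance = TRUE)
  imp <- importance(full, type = 1)[, 1]        # mean decrease in accuracy
  ranked <- names(sort(imp, decreasing = TRUE))

  oob <- sapply(seq_along(ranked), function(k) {
    fit <- randomForest(x = x[, ranked[seq_len(k)], drop = FALSE],
                        y = y, ntree = ntree)
    fit$err.rate[ntree, "OOB"]                  # OOB error after the last tree
  })
  data.frame(n_genes = seq_along(ranked), added_gene = ranked, oob_error = oob)
}

# Simulated stand-in data (NOT the data from the question): 78 samples,
# 14 noise "genes", with gene1 shifted so that it separates the groups.
set.seed(1)
sim <- data.frame(matrix(rnorm(78 * 14), nrow = 78))
names(sim) <- paste0("gene", 1:14)
sim$Group <- factor(rep(c("BA", "NonBA"), times = c(64, 14)))
sim$gene1 <- sim$gene1 + ifelse(sim$Group == "BA", 2, 0)

res <- top_k_oob(sim)
res                               # OOB error for each nested gene set
res[which.min(res$oob_error), ]   # smallest OOB error among the nested sets
```

The randomForest package also provides `rfcv()`, which cross-validates the error as the number of predictors is reduced; it can serve as a check on the number of genes suggested by the table above.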