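In R, I need to compute the mean of a set of numbers that includes NA. Why do I get 2? Also, I'm looking for someone who can offer paid tutoring; answering my questions in your spare time would be enough!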
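The data behind the question isn't shown, so the following is only a guess at what is going on, using a made-up vector `x`. By default `mean()` returns `NA` as soon as the input contains an `NA`; with `na.rm = TRUE` the missing values are dropped first and the mean is taken over the remaining numbers, which may be where the 2 comes from.

```r
# Hypothetical data: the actual vector from the question is unknown
x <- c(1, 2, 3, NA)

# Default behaviour: any NA in the input makes the result NA
mean(x)

# Drop the NA values first, then average the remaining numbers (1, 2, 3)
mean(x, na.rm = TRUE)

# Number of values that actually enter the mean once NA is removed
sum(!is.na(x))
```

If the result should instead treat `NA` as 0 or as some other value, that has to be done explicitly before calling `mean()`.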
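If feasible, repeated cross-validation is recommended in order to reduce uncertainty in decisions. The process is the same as above. Instead of getting five performance values (one per fold), we get five times the number of repetitions (here three), that is, 15 values.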
# We start by making repeated, stratified cross-validation folds
folds <- create_folds(train$Sepal.Length, k = 5, m_rep = 3)
length(folds)
#> [1] 15
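# Tune mtry: valid_mtry (one slot per candidate mtry) and train come from the earlier steps
for (i in seq_along(valid_mtry)) {
  cv_mtry <- numeric()
  for (fold in folds) {
    # Fit on the in-sample rows of this fold, evaluate on the held-out rows
    fit <- ranger(Sepal.Length ~ ., data = train[fold, ], mtry = i)
    cv_mtry <- c(
      cv_mtry,
      rmse(train[-fold, "Sepal.Length"], predict(fit, train[-fold, ])$predictions)
    )
  }
  # Average the 15 fold-level RMSE values for this mtry
  valid_mtry[i] <- mean(cv_mtry)
}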
# Result of cross-validation
valid_mtry
#> [1] 0.3934294 0.3544207 0.3422013 0.3393454
(best_mtry <- which.min(valid_mtry))
#> [1] 4
# Use optimal mtry to make model
final_fit <- ranger(Sepal.Length ~ ., data = train, mtry = best_mtry)
rmse(test$Sepal.Length, predict(final_fit, test)$predictions)
#> [1] 0.2937055
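To make the "five folds times three repetitions" bookkeeping concrete, here is a minimal base R sketch of how repeated k-fold indices could be built by hand. `make_repeated_folds` is a hypothetical helper written for illustration; it is not how `create_folds` is implemented and, unlike the code above, it does not stratify on the response.

```r
# Hypothetical helper, not part of splitTools: plain (unstratified) repeated k-fold
make_repeated_folds <- function(n, k = 5, m_rep = 3) {
  folds <- list()
  for (r in seq_len(m_rep)) {
    # Shuffle the fold labels 1..k so each row lands in one of k near-equal groups
    group <- sample(rep(seq_len(k), length.out = n))
    for (j in seq_len(k)) {
      # Store the in-sample (training) rows, i.e. every row outside group j
      folds[[sprintf("Fold%d.Rep%d", j, r)]] <- which(group != j)
    }
  }
  folds
}

set.seed(1)
demo_folds <- make_repeated_folds(nrow(iris), k = 5, m_rep = 3)

# One list element per fold and repetition: k * m_rep in total
length(demo_folds)

# Training-set sizes; the held-out part of each fold is the complement
range(lengths(demo_folds))
```

Each list element holds training-row indices, so it can be used like `fold` in the loop above: `train[fold, ]` for fitting and `train[-fold, ]` for evaluation.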