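KNN prediction accuracy is always 1, why? (tag: R)

When I run KNN on the Mushroom data in R, the final prediction accuracy is always 1. Could you help me figure out why? Thanks a lot. My code is below.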


library(cba)
data(Mushroom)
Mushroom
str(Mushroom) # inspect the basic structure
dim(Mushroom)
Mushroom$`veil-type`<- NULL
Mushroom<- na.omit(Mushroom)
sum(is.na(Mushroom))
dim(Mushroom)
# convert the class label to a factor
Mushroom$class <- as.factor(Mushroom$class)
for (i in 2:ncol(Mushroom)) {
  Mushroom[, i] <- as.numeric(factor(Mushroom[, i]))
} # convert the predictors to numeric codes
Mushroom
str(Mushroom)
### standardize
Mushroom[,-1] <- scale(Mushroom[,-1])
Mushroom
# split into training and test sets
set.seed(123)
train_index <- sample(1:nrow(Mushroom),size=nrow(Mushroom)*0.8,replace=F)
train<- Mushroom[train_index, ]
test<- Mushroom[-train_index, ]
dim(train)
dim(test)
train
# run KNN classification
library(class)
knn_pred <- knn(train = train[, -1], test = test[, -1], cl = train$class, k = 5)
# compute prediction accuracy
sum(knn_pred == test[,1]) /dim(test)[1]
# show the cross table
library(gmodels)
CrossTable(x=test[,1],y=knn_pred,prop.chisq = F)
# results
> sum(knn_pred == test[,1]) /dim(test)[1]
[1] 1
## cross table
             | knn_pred 
   test[, 1] |    edible | poisonous | Row Total | 
-------------|-----------|-----------|-----------|
      edible |       669 |         0 |       669 | 
             |     1.000 |     0.000 |     0.593 | 
             |     1.000 |     0.000 |           | 
             |     0.593 |     0.000 |           | 
-------------|-----------|-----------|-----------|
   poisonous |         0 |       460 |       460 | 
             |     0.000 |     1.000 |     0.407 | 
             |     0.000 |     1.000 |           | 
             |     0.000 |     0.407 |           | 
-------------|-----------|-----------|-----------|
Column Total |       669 |       460 |      1129 | 
             |     0.593 |     0.407 |           | 
-------------|-----------|-----------|-----------|

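From the code you posted, an accuracy of exactly 1 would normally make me suspect duplicated data shared between the training and test sets, so that is worth ruling out first. You split the data with:

train_index <- sample(1:nrow(Mushroom),size=nrow(Mushroom)*0.8,replace=F)
train<- Mushroom[train_index, ]
test<- Mushroom[-train_index, ]

Here sample() randomly draws 80% of the row indices for the training set, and the remaining 20% become the test set. Your call already passes replace=F, which means the same as replace=FALSE (and sampling without replacement is the default for sample() anyway), so no index is drawn twice. Because the test set is built with -train_index, it cannot contain any row that is in the training set. The split itself is therefore not the reason the accuracy is 1.

If you want to be explicit, you can write FALSE out in full. F is an ordinary variable that can be reassigned, while FALSE is a reserved word, but the resulting split is the same:

train_index <- sample(1:nrow(Mushroom),size=nrow(Mushroom)*0.8,replace=FALSE)
train<- Mushroom[train_index, ]
test<- Mushroom[-train_index, ]

Since the two sets share no rows, the accuracy you see is a genuine held-out result. The more likely explanation is that the two classes in this dataset are very cleanly separated by the categorical features, so a nearest-neighbour classifier makes no mistakes on it. Your cross table is consistent with that: all 669 edible and all 460 poisonous test rows are classified correctly.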
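
Whether a split leaks anything can also be checked directly. The sketch below is only an illustration under assumptions: df, x1 and x2 are a small made-up data frame standing in for Mushroom and its columns, and key() is a hypothetical helper. It checks two things: that no row index appears in both sets, and what share of test rows has a predictor vector identical to some training row (which can happen with categorical data even when the indices are disjoint).

```r
# Toy stand-in for the real data (hypothetical; swap in Mushroom to check the real split)
set.seed(123)
df <- data.frame(
  class = factor(sample(c("edible", "poisonous"), 200, replace = TRUE)),
  x1 = sample(1:3, 200, replace = TRUE),
  x2 = sample(1:3, 200, replace = TRUE)
)

train_index <- sample(seq_len(nrow(df)), size = floor(nrow(df) * 0.8), replace = FALSE)
train <- df[train_index, ]
test  <- df[-train_index, ]

# 1) Row indices: negative indexing guarantees that no row is in both sets
shared_rows <- intersect(rownames(train), rownames(test))
length(shared_rows)  # 0

# 2) Predictor vectors: identical feature rows can still occur in both sets
key <- function(d) do.call(paste, c(d[, -1, drop = FALSE], sep = "\r"))
dup_share <- mean(key(test) %in% key(train))
dup_share  # share of test rows whose predictors also appear in the training set
```

A high dup_share does not make the evaluation invalid, but it does mean that KNN is often just looking up an identical neighbour.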

In addition, you can try different values of k (for example 1, 5 and 15) and see whether the accuracy changes.
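
Comparing several values of k might look like the sketch below. It is a minimal example under assumptions: it uses the built-in iris data as a stand-in for Mushroom, and it scales the test set with the training set's means and standard deviations, which is a choice made here and not what the original code does.

```r
library(class)  # recommended package that ships with standard R installations

set.seed(123)
idx   <- sample(seq_len(nrow(iris)), size = floor(nrow(iris) * 0.8))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Scale with training-set statistics only, then apply the same statistics to the test set
train_x <- scale(train[, 1:4])
test_x  <- scale(test[, 1:4],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Held-out accuracy for each candidate k
ks  <- c(1, 5, 15)
acc <- sapply(ks, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train$Species, k = k)
  mean(pred == test$Species)
})
names(acc) <- paste0("k=", ks)
acc
```

If the accuracy stays at 1 for every k, that points to an easy dataset more than to a lucky choice of k.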

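For an answer to this question, you can refer to: https://ask.csdn.net/questions/7523087
I also found a blog post that may help: "How to compute group means by a column in R" (在R语言中怎样按照某一列分组求均值)
Beyond that, the "Stepwise regression analysis" section of the blog post "Multivariate correlation and regression analysis with R" may also be relevant. You can read the excerpt below or go to the source blog: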

① Forward selection

> fm=lm(y~x1+x2+x3+x4,data=yx) # multiple linear regression model
> fm.step=step(fm,direction = "forward") # variable selection by forward selection
Start:  AIC=68.15
y ~ x1 + x2 + x3 + x4
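
Note that the run above calls step() with direction = "forward" on a model that already contains every predictor, which is why the log stops right after the Start line: there is nothing left to add. A minimal sketch of forward selection that starts from an intercept-only model follows. It uses the built-in mtcars data as a stand-in, since the original yx data is not available here, and the choice of mpg as the response with wt, hp, disp and qsec as candidates is purely illustrative.

```r
# mtcars is a stand-in for the original yx data (assumption for illustration)
null_fit <- lm(mpg ~ 1, data = mtcars)   # intercept-only starting model
full_rhs <- ~ wt + hp + disp + qsec      # candidate predictors to consider

fwd <- step(null_fit,
            scope = list(lower = ~ 1, upper = full_rhs),
            direction = "forward",
            trace = 0)                   # suppress the step-by-step log

formula(fwd)  # the model chosen by forward selection
```

With a scope supplied, step() adds one term at a time for as long as the AIC decreases.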

② Backward elimination

> fm.step=step(fm,direction = "backward") # variable selection by backward elimination
Start:  AIC=68.15
y ~ x1 + x2 + x3 + x4

       Df Sum of Sq  RSS   AIC
- x3    1         0  202  66.2
- x1    1         1  204  66.4
<none>               202  68.2
- x4    1       174  376  85.4
- x2    1      6433 6635 174.4

Step:  AIC=66.16
y ~ x1 + x2 + x4

       Df Sum of Sq  RSS   AIC
- x1    1         2  204  64.4
<none>               202  66.2
- x4    1       197  400  85.3
- x2    1      7382 7585 176.5

Step:  AIC=64.39
y ~ x2 + x4

       Df Sum of Sq    RSS   AIC
<none>                 204  64.4
- x4    1       549    753 102.9
- x2    1    367655 367859 294.8

③ Stepwise selection (both directions)

> fm.step=step(fm,direction = "both") # variable selection by stepwise selection
Start:  AIC=68.15
y ~ x1 + x2 + x3 + x4

       Df Sum of Sq  RSS   AIC
- x3    1         0  202  66.2
- x1    1         1  204  66.4
<none>               202  68.2
- x4    1       174  376  85.4
- x2    1      6433 6635 174.4

Step:  AIC=66.16
y ~ x1 + x2 + x4

       Df Sum of Sq  RSS   AIC
- x1    1         2  204  64.4
<none>               202  66.2
+ x3    1         0  202  68.2
- x4    1       197  400  85.3
- x2    1      7382 7585 176.5

Step:  AIC=64.39
y ~ x2 + x4

       Df Sum of Sq    RSS   AIC
<none>                 204  64.4
+ x1    1         2    202  66.2
+ x3    1         0    204  66.4
- x4    1       549    753 102.9
- x2    1    367655 367859 294.8

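Reference and acknowledgment:
Wang Binhui. Multivariate Statistical Analysis and R Language Modeling (多元统计分析及R语言建模), 4th edition.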


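You may also want to look at the "Introduction to data wrangling" section of Chen Yanping's introductory course on data analysis in R to reinforce the relevant concepts.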