Linear discriminant analysis
library(rrcov)
library(readxl)
LM <- read_excel("D:/LM.xlsx")    # LM: 210 observations, class label ID plus 7 numeric features
LM
LM[c(2:12, 72:82, 142:152), ]     # preview a few rows from each of the three class blocks
attach(LM)
library(MASS)
set.seed(210)
train0 = sample(1:210, 160)       # 160 training rows out of 210
train <- LM[train0, ]
table(LM$ID[train0])              # class counts in the training sample
ld.LM = lda(ID ~ ., data = train)
ld.LM
plot(ld.LM)
• The output of lda() includes the prior probability of each class, the group means of each class, the coefficients of the first and second linear discriminant functions, and the proportion of between-group variance explained by each discriminant function.
• The results show that the two linear discriminant functions explain 65.45% and 34.55% of this variance, respectively; the sketch after the formulas below shows how these values can be read directly from the fitted object.
• The two linear discriminant functions are:
LD1 = 0.216211941×area - 4.176715154×perimeter - 12.835380235×compactness + 6.304811697×(length of kernel) + 2.033838305×(width of kernel) + 0.004471994×(asymmetry coefficient) - 2.514167008×(length of kernel groove)
LD2 = -4.6259825×area + 8.9237180×perimeter + 108.1138181×compactness + 9.7039310×(length of kernel) - 1.4599728×(width of kernel) - 0.2646445×(asymmetry coefficient) - 7.3096302×(length of kernel groove)
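These coefficients and proportions do not need to be copied by hand; a minimal sketch, using only the documented components of the lda object fitted above, reads them directly:
ld.LM$scaling                     # coefficients of LD1 and LD2, i.e. the formulas above
ld.LM$svd^2 / sum(ld.LM$svd^2)    # proportion of trace: 0.6545 and 0.3455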
Prediction and accuracy
piris.lda = predict(ld.LM, LM[-train0, ])$class   # predicted classes for the 50 held-out rows
cl.test = LM$ID[-train0]                          # true classes of the test rows
tab = table(cl.test, piris.lda)                   # confusion matrix
tab
library(mclust)
classError(piris.lda, cl.test)                    # misclassified cases and error rate
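The same error rate can be checked directly from the confusion matrix; a small verification using the tab object created above:
acc = sum(diag(tab)) / sum(tab)   # overall accuracy on the held-out rows
acc
1 - acc                           # test error rate, normally equal to classError()$errorRate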
Comparison: fit the model on all the data and classify
ld.cv = lda(ID ~ ., data = LM, prior = c(54,56,50)/160, CV = TRUE)   # prior from the training-sample class counts; CV = TRUE gives leave-one-out predictions
table(LM$ID, ld.cv$class)
mean(LM$ID != ld.cv$class)        # leave-one-out error rate of the LDA model
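With CV = TRUE the fitted object also carries the cross-validated posterior probabilities; a minimal sketch inspecting them and the per-class accuracy:
head(ld.cv$posterior)                             # leave-one-out posterior probabilities
diag(prop.table(table(LM$ID, ld.cv$class), 1))    # per-class correct-classification rates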
Quadratic discriminant analysis
Next, the data are analysed with the qda() function.
head(LM)
ld.cv2 <- qda(ID ~ ., data = LM, prior = c(54,56,50)/160, CV = TRUE)
table(LM$ID, ld.cv2$class)
mean(LM$ID != ld.cv2$class)       # leave-one-out error rate of the QDA model
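For a side-by-side view, a small sketch collecting the leave-one-out error rates of the two models fitted above (ld.cv and ld.cv2):
loo.err = c(LDA = mean(LM$ID != ld.cv$class),
            QDA = mean(LM$ID != ld.cv2$class))
round(loo.err, 4)                 # cross-validated error rates of LDA and QDA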
head(LM)
set.seed(210)
# Stratified split: the data form three class blocks of 70 rows each
# (rows 1-70, 71-140, 141-210, as in the preview above); take 35 rows per class.
train.ID = sample(1:70, 35)
test.ID  = setdiff(1:70, train.ID)
keep = names(LM) != "ID"          # predictor columns only
train = rbind(LM[train.ID, keep], LM[train.ID + 70, keep], LM[train.ID + 140, keep])
test  = rbind(LM[test.ID, keep],  LM[test.ID + 70, keep],  LM[test.ID + 140, keep])
c1 = factor(c(rep("1", 35), rep("2", 35), rep("3", 35)))   # class labels; test rows follow the same class order
qda.LM = qda(train, c1, method = "mle")
piris.qda = predict(qda.LM, test)$class
classError(piris.qda, c1)
From the QDA results, the samples listed above are misclassified; the reported classification error rate is 0.6793651.
Visualization of the discriminant analysis
head(LM)
library(MASS)
library(klaR)
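No plotting call is shown here; a minimal sketch (an assumption, since the original figure is not reproduced) of how klaR's partimat() can draw the LDA classification regions for pairs of predictors, with column names taken from the discriminant functions above:
# Partition plots of the LDA decision regions for three of the predictors;
# the column names are assumed to match those quoted in LD1/LD2 above.
partimat(as.data.frame(LM[, c("area", "perimeter", "compactness")]),
         grouping = factor(LM$ID), method = "lda")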
[Figure: visualization of the discriminant analysis]
