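拓端tecdat | R: k-means clustering, hierarchical clustering, principal component analysis (PCA) dimensionality reduction, and visualization of the iris dataset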

Recently a client asked us to write a research report on the iris dataset, including some graphical and statistical output.

[Video] K-means clustering and hierarchical clustering: a visualization example analyzing quality-of-life happiness indices in R

Problem: use the iris dataset in R

Part (a): k-means clustering
Use k-means clustering to partition the dataset into 2 groups.
Use k-means clustering to partition the dataset into 3 groups.
Part (b): hierarchical clustering
[Video] Principal component analysis (PCA) dimensionality reduction and a wine data visualization example in R

Problem 01: use the iris dataset built into R.

(a): k-means clustering

Discuss and/or consider standardizing the data.


# Mean and standard deviation of each numeric column
data.frame(
  "Mean" = apply(iris[, 1:4], 2, mean),
  "SD" = apply(iris[, 1:4], 2, sd)
)

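Because k-means relies on Euclidean distance, columns with a larger spread carry more weight in the clustering. The following is a minimal sketch of how standardization could be checked, assuming the four numeric columns are the clustering input; the names x, x_scaled, sd_raw, and sd_scaled are illustrative and not from the original analysis.

```r
# Assumption: cluster on the four numeric measurement columns only
x <- iris[, 1:4]

# scale() centers each column to mean 0 and rescales it to standard deviation 1
x_scaled <- scale(x)

# Per-column standard deviations before and after standardizing
sd_raw <- apply(x, 2, sd)
sd_scaled <- apply(x_scaled, 2, sd)

print(round(sd_raw, 3))
print(round(sd_scaled, 3))
```

If the raw standard deviations differ a lot, clustering on x_scaled instead of x is worth comparing; whether it helps depends on whether the original units are meaningful for the question being asked.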

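Cluster the dataset into 2 groups using k-means

With a sufficiently large nstart, the algorithm is more likely to find the model with the smallest RSS.

# Fit on the four numeric columns; keep the cluster label of each observation
pred <- kmeans(iris[, 1:4], centers = 2, nstart = 100)$cluster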
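The effect of nstart can be inspected through the total within-cluster sum of squares (tot.withinss), which is what the text calls RSS. This is a small sketch, assuming the four numeric columns as input; km_single and km_multi are illustrative names.

```r
set.seed(1)  # make the random starts reproducible
x <- iris[, 1:4]

# One random start versus the best of 100 random starts
km_single <- kmeans(x, centers = 2, nstart = 1)
km_multi <- kmeans(x, centers = 2, nstart = 100)

# A lower total within-cluster sum of squares means tighter clusters
cat("nstart = 1:  ", km_single$tot.withinss, "\n")
cat("nstart = 100:", km_multi$tot.withinss, "\n")
```

On a small, well-separated dataset the two values may coincide; the benefit of many starts shows up when the algorithm can get stuck in local minima.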

Draw a plot to show the clustering


# Plot the data, colored by cluster label (pred)
plot(Sepal.Length ~ Sepal.Width, data = iris, col = pred)

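To also take petal length and width into account, it is more appropriate to reduce the dimensionality with PCA first.

# Build the model (prcomp needs numeric input, so the Species column is left out)
PCA.mod <- prcomp(x = iris[, 1:4])
# Append the predicted groups as the last column
pc <- as.data.frame(PCA.mod$x)
pc$pred <- pred
# Draw the plot
plot(PC1 ~ PC2, data = pc, col = pc$pred)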
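As a self-contained illustration of the step above, the sketch below fits k-means and PCA on the four numeric columns and plots the first two component scores colored by cluster. It uses base graphics and illustrative names (pca_fit, scores); the original figure may have been produced differently.

```r
set.seed(1)
x <- iris[, 1:4]

# Cluster labels from a 2-group k-means fit
pred <- kmeans(x, centers = 2, nstart = 100)$cluster

# prcomp() centers the columns by default; scores are stored in $x
pca_fit <- prcomp(x)
scores <- as.data.frame(pca_fit$x)  # columns PC1 to PC4
scores$pred <- factor(pred)

# PC2 on the x axis and PC1 on the y axis, as in the text
plot(scores$PC2, scores$PC1,
     col = as.integer(scores$pred), pch = 19,
     xlab = "PC2", ylab = "PC1")
legend("topright", legend = levels(scores$pred),
       col = seq_along(levels(scores$pred)), pch = 19, title = "Cluster")
```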

To interpret the PCA plot properly, consider how much variance each principal component explains.

## Look at the variance explained by the principal components
pca <- data.frame(Proportion = PCA.mod$sdev^2 / sum(PCA.mod$sdev^2))  # assumed layout: one row per component
for (i in 1:nrow(pca)) {
  pca[["PC"]][i] <- paste("PC", i)
}

plot(Proportion ~ seq_along(PC), data = pca, type = "b", xlab = "Principal component", ylab = "Proportion of variance")

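The first two principal components explain 80% of the variance in the data, so plotting them gives a fairly good visualization of the data.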
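The share of variance per component can be computed directly from the standard deviations stored in a prcomp fit. This sketch assumes unstandardized PCA on the four numeric columns; the exact percentages depend on which columns are included and on whether they are standardized, so the printed values need not match the figure quoted above.

```r
pca_fit <- prcomp(iris[, 1:4])

# Variance of each component is the square of its standard deviation
var_explained <- pca_fit$sdev^2 / sum(pca_fit$sdev^2)
cum_explained <- cumsum(var_explained)

print(round(var_explained, 3))  # proportion per component
print(round(cum_explained, 3))  # cumulative proportion
```

The same numbers appear in the "Proportion of Variance" and "Cumulative Proportion" rows of summary(pca_fit).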

Cluster the dataset into 3 groups using k-means

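library(magrittr)  # provides %>%
input <- iris[, 1:4]  # assumption: the four numeric columns are the clustering input
groupPred <- kmeans(input, centers = 3, nstart = 100)
# Build the data
pred <- groupPred$cluster  # cluster label of each observation
groupPred %>% print()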
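Since iris also records the true species, one way to read a 3-group fit is to cross-tabulate the cluster labels against Species. This is a sketch under the assumption that the four numeric columns are the input; km3 and tab are illustrative names, and the cluster numbers themselves are arbitrary.

```r
set.seed(1)
km3 <- kmeans(iris[, 1:4], centers = 3, nstart = 100)

# Rows are k-means clusters, columns are the known species
tab <- table(cluster = km3$cluster, species = iris$Species)
print(tab)
```

A row concentrated in a single column indicates a cluster that lines up with one species; rows spread over two columns show where the clusters and species overlap.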

Draw a plot to show the clustering


# Plot the data, colored by cluster label (pred)
plot(iris$Sepal.Length, iris$Sepal.Width, col = pred)

PCA plot

To also take petal length and width into account, it is more appropriate to reduce the dimensionality with PCA first.

# Build the model (prcomp needs numeric input, so the Species column is left out)
PCA <- prcomp(x = iris[, 1:4])

# Append the predicted groups as the last column
PCADF <- as.data.frame(PCA$x)
PCADF$KMeansPred <- factor(pred)
# Draw the plot (ggplot2 is an assumption, inferred from the trailing + in the original call)
library(ggplot2)
ggplot(PCADF, aes(y = PC1, x = PC2, col = KMeansPred)) +
  geom_point() + stat_ellipse(type = "norm", level = 0.9) +
  labs(col = "Predicted\ncluster", caption = "First two principal components of the iris data; ellipses show 90% normal confidence regions; clusters predicted with k-means")

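PCA biplot

The sepal length ~ sepal width plot separates the clusters reasonably well. To choose which variables to put on the X and Y axes, we can use a biplot.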

biplot(PCA)
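The arrows in a biplot are derived from the component loadings, so printing the loadings gives a numeric view of the same information. A minimal sketch, assuming PCA on the four numeric columns (pca_fit and loadings are illustrative names):

```r
pca_fit <- prcomp(iris[, 1:4])

# Each column of $rotation holds the loadings of one principal component
loadings <- pca_fit$rotation[, 1:2]
print(round(loadings, 3))
```

Variables whose loadings point in clearly different directions on PC1 and PC2 are good candidates for the two plot axes, since they carry less redundant information.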

# Scatterplot matrix of the numeric columns; KMPred is the vector of k-means cluster labels
plot(iris[, 1:4], col = KMPred)

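Evaluate all possible combinations.

library(tidyr); library(ggplot2)  # assumed packages; the original call was truncated
iris$KMPred <- factor(pred)       # k-means cluster labels
iris %>%
  pivot_longer(cols = c(Sepal.Width, Petal.Length, Petal.Width)) %>%
  ggplot(aes(x = Sepal.Length, y = value, col = KMPred)) + geom_point() +
  facet_grid(name ~ ., scales = 'free_y', space = 'free_y')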

Hierarchical clustering

Cluster the observations using complete linkage.

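dst <- dist(iris[, 1:4])  # assumption: Euclidean distances on the four numeric columns
hclust(dst, method = 'complete')

Cluster the observations using average and single linkage.

hclust(dst, method = 'average')
hclust(dst, method = 'single')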
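To turn the three trees into group labels, each one can be cut into a fixed number of clusters with cutree(). The sketch below assumes Euclidean distances on the four numeric columns and a cut at k = 3; methods, models, and clusters are illustrative names.

```r
dst <- dist(iris[, 1:4])  # Euclidean distance matrix

# Fit one tree per linkage method
methods <- c("complete", "average", "single")
models <- lapply(methods, function(m) hclust(dst, method = m))
names(models) <- methods

# Cut every tree into 3 groups
clusters <- lapply(models, cutree, k = 3)

# Compare each labeling with the known species
for (m in methods) {
  cat("Linkage:", m, "\n")
  print(table(cluster = clusters[[m]], species = iris$Species))
}
```

Comparing the three tables shows how strongly the choice of linkage changes the group sizes for the same distance matrix.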

Plot the predictions

# Data (grouppred is the vector of predicted cluster labels)
iris$KMeansPred <- grouppred
# Plot the data
plot(iris[, 1:4], col = iris$KMeansPred)

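Plot dendrograms for the clustering methods above

Color the dendrograms.

type <- c("average", "complete", "single")
# models is the list of hclust fits from the previous step
for (hc in models) plot(hc, cex = 0.3)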
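Base R has no one-line option for coloring dendrogram branches, so this sketch swaps in a simpler technique: it marks the clusters with colored boxes using rect.hclust(). Packages such as dendextend offer true branch coloring; the choice of complete linkage, k = 3, and the colors here are assumptions for illustration.

```r
dst <- dist(iris[, 1:4])
hc <- hclust(dst, method = "complete")

# Draw the tree without the 150 leaf labels to keep it readable
plot(hc, labels = FALSE, main = "Complete linkage", xlab = "", sub = "")

# Outline the 3 clusters; the return value lists the members of each box
boxes <- rect.hclust(hc, k = 3, border = c("red", "darkgreen", "blue"))
```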

Original: https://blog.csdn.net/qq_19600291/article/details/118111149
Author: 拓端研究室
Title: 拓端tecdat|R语言k-means聚类、层次聚类、主成分(PCA)降维及可视化分析鸢尾花iris数据集

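Original articles are protected by copyright. When reposting, please cite the source: https://www.johngo689.com/560736/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author and source.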
