R Language and Data Analysis Exercise: K-Means Clustering

Implementing k-means

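k-means is one of the most widely used clustering algorithms. It takes k as a parameter and partitions the data into k groups. Through an iterative procedure, it uses the mean of all samples in each group as that group's center, so that samples within a group are highly similar while similarity between groups is as low as possible.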

How k-means works

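  1. Initialize: choose k objects as the initial centers.
  2. Go through the whole data set, compute the distance from each point to every center, and assign the point to the group whose center is nearest.
  3. Recompute the mean of each group and use it as the new cluster center.
  4. Repeat steps 2-3 until the algorithm converges, that is, until the group assignments no longer change.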
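
The four steps above can be sketched directly in R. This is only an illustrative sketch, not the implementation behind R's built-in function: `simple_kmeans` and the `toy` data are hypothetical names introduced here, and the sketch assumes Euclidean distance and a numeric matrix with more than one row.

```r
# Minimal k-means sketch; simple_kmeans is a hypothetical helper, not a library function.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Step 1: pick k distinct rows as the initial centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 2: squared Euclidean distance from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")  # index of the nearest center
    # Step 4: stop once the assignments no longer change
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Step 3: recompute each center as the mean of its members
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centers[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centers)
}

# Toy data (made up): two well-separated blobs of 10 points each
set.seed(1)
toy <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
             matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

Because the starting centers are drawn at random, the numeric labels (1 or 2) can swap between runs even when the grouping itself is the same.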

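k-means clustering is suited to continuous data. When computing the distance between samples, Euclidean distance is usually used as the similarity measure. Some k-means implementations (for example, the one in the amap package) also support other distance measures, including maximum, manhattan, pearson, correlation, spearman, and kendall. For an introduction to the various distance measures, see the article "R语言实现46种距离算法" (46 distance algorithms implemented in R).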
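
To make the difference between distance measures concrete, here is a small sketch using base R's `dist()` function. The two players and their values are made up for illustration.

```r
# Two hypothetical players: (assists per minute, points per minute)
players <- rbind(a = c(0.10, 0.40), b = c(0.25, 0.60))

d_euclidean <- dist(players, method = "euclidean")  # sqrt(0.15^2 + 0.20^2) = 0.25
d_manhattan <- dist(players, method = "manhattan")  # 0.15 + 0.20 = 0.35
d_maximum   <- dist(players, method = "maximum")    # max(0.15, 0.20) = 0.20

d_euclidean
d_manhattan
d_maximum
```

The same pair of points is 0.25, 0.35, or 0.20 apart depending on the measure, which is why the choice of distance can change which center a point is assigned to.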

Using the kmeans() function

In R, k-means clustering can be done directly with the built-in kmeans() function. Many third-party packages, such as flexclust and amap, also provide their own k-means functions and can be used instead when needed.
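
As a quick illustration of the built-in function, the sketch below runs `kmeans()` on R's bundled `iris` data set. This is not the basketball data from the exercise, and the choices of 3 clusters and `nstart = 25` are arbitrary assumptions made only for the example.

```r
# Example on the built-in iris data (not the article's data set)
x <- scale(iris[, 1:4])          # keep the four numeric columns and standardize them

set.seed(123)                    # arbitrary seed so the random starts are repeatable
fit <- kmeans(x, centers = 3, nstart = 25)

head(fit$cluster)                # cluster label of each row
fit$centers                      # one row per cluster center
fit$size                         # number of rows in each cluster
fit$tot.withinss                 # total within-cluster sum of squares
```

The `nstart` argument makes `kmeans()` try several random sets of starting centers and keep the best result, which reduces the chance of settling on a poor local solution.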

Problem:

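In basketball, point guards and shooting guards generally record more assists, small forwards score more points, and power forwards and centers record fewer assists and fewer points. The table below is a data set of assists per minute and points per minute for 21 basketball players. Use the K-Means clustering algorithm to divide these 21 players into 5 groups, and use a plot to judge which position each group plays.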

The data are as follows:

assists_per_minute is the number of assists per minute
points_per_minute is the number of points scored per minute

Implementation code:


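# Set the working directory and load the raw data
setwd('D:/bigdata/R语言与数据分析/data03')
basketballdata <- read.csv("data.csv", stringsAsFactors = FALSE)

# Copy columns 2 and 3 (the two per-minute statistics) into a named matrix
outfile <- matrix(data = NA,
                  nrow = nrow(basketballdata),
                  ncol = 2,
                  byrow = TRUE,
                  dimnames = list(c(1:nrow(basketballdata)),
                                  c("assists_per_minute", "points_per_minute")))

outfile[, 1] <- basketballdata[, 2]
outfile[, 2] <- basketballdata[, 3]

summary(outfile)
write.csv(outfile, 'datachange.csv', row.names = FALSE)

# Standardize both columns to z-scores
basketballdata <- read.csv('datachange.csv', header = TRUE)
zscoredfile <- scale(basketballdata)

write.csv(zscoredfile, 'standardizeddata.csv', row.names = FALSE)
inputfile <- read.csv('standardizeddata.csv', header = TRUE)

# Cluster into 5 groups; the starting centers are random, so labels can vary between runs
result <- kmeans(inputfile, 5)

# Cluster label of each player and the size of each cluster
type <- result$cluster
type
table(type)

# Cluster centers (in standardized units)
centervec <- result$centers
centervec

# Column-wise maximum and minimum of the centers
center_max <- apply(centervec, 2, max)
center_max

center_min <- apply(centervec, 2, min)
center_min

df <- data.frame(rbind(max = center_max, min = center_min, centervec))
df

# Plot the clusters
library(cluster)
library(factoextra)
fviz_cluster(result, basketballdata)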
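
Since the cluster centers above are in standardized units, it can help to look at each cluster's mean in the original units when matching clusters to positions. The following is a sketch only: the `players` data frame is a synthetic stand-in with made-up values (the article's data.csv is not reproduced here), and the seed and `nstart = 25` are arbitrary assumptions.

```r
set.seed(42)
# Synthetic stand-in for the article's data: 21 players, values are made up
players <- data.frame(
  assists_per_minute = runif(21, min = 0.05, max = 0.35),
  points_per_minute  = runif(21, min = 0.20, max = 0.70)
)

scaled <- scale(players)                       # z-score each column
fit <- kmeans(scaled, centers = 5, nstart = 25)

# Mean of each variable per cluster, expressed in the original units
cluster_means <- aggregate(players, by = list(cluster = fit$cluster), FUN = mean)
cluster_means
```

A cluster with a high assists_per_minute mean would then be a candidate for the guard positions, and one with a high points_per_minute mean a candidate for small forward, following the reasoning in the problem statement.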

Results:

Analysis of the player data:

Inferred position of each player:

(Group 1: 9, 11, 15, 21)
(Group 2: 6, 7, 18)
(Group 3: 2)
(Group 4: 10, 12, 14, 17, 20)
(Group 5: 1, 3, 4, 5, 8, 13, 16, 19)

Original: https://blog.csdn.net/weixin_47580081/article/details/115018232
Author: John Zhuang
Title: R语言与数据分析练习:K-Means聚类 (R Language and Data Analysis Exercise: K-Means Clustering)
