[Beginner] The k-means Clustering Function Explained (Using the Iris Dataset) [MATLAB]

The k-means clustering function

The example first

The official example: Train a k-Means Clustering Algorithm.


load fisheriris
X = meas(:,3:4);

figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

rng(1); % For reproducibility
[idx,C] = kmeans(X,3);

% Build a fine grid over the range of the data
x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)]; % Defines a fine grid on the plot

idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);
    % Assigns each node in the grid to the closest centroid

figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;

Step-by-step walkthrough

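  • Note: anything after a "%" is a comment. MATLAB code does not seem to get syntax highlighting in the markdown editor (quietly: it may also be that I just haven't figured it out QuQ).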

PART 1: Load the dataset

load fisheriris
X = meas(:,3:4);

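Load the sample data and store columns 3 and 4 of the data array meas (petal length and petal width) in the variable X.
The point of this step is simply to obtain a sample dataset. Many other datasets would work just as well; see the blog post "Some datasets for clustering and classification problems" (《一些用于聚类和分类问题的数据集》).
fisheriris is Fisher's iris dataset, named after the statistician Ronald Fisher. It is a multivariate dataset with 150 samples, 50 per class. Four attributes (sepal length, sepal width, petal length, petal width) are used to predict which of three species an iris belongs to: Setosa, Versicolor, or Virginica.
The full iris dataset is given at the end of this post.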
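As a quick check of the description above, the following sketch inspects what load fisheriris puts in the workspace. It assumes the Statistics and Machine Learning Toolbox (which ships the dataset) is installed; the names nSamples, nFeatures and classNames are illustrative and do not appear in the original example.

```matlab
load fisheriris                      % creates meas (numeric) and species (cell array of names)

[nSamples,nFeatures] = size(meas);   % number of samples and measurements per sample
classNames = unique(species);        % distinct species names
X = meas(:,3:4);                     % keep petal length and petal width only

fprintf('%d samples, %d features, %d classes\n', ...
    nSamples,nFeatures,numel(classNames));
disp(size(X))                        % one row per sample, two columns
```

Keeping only two columns is what makes the later 2-D plots possible; all four columns could be clustered as well, but the result could not be drawn on a plane.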

PART 2: Plot the dataset

figure;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';

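figure: creates a figure window.
X(:,1) is the first column of the array X; X(:,2) is the second.
Running plot produces the figure below, with the first column of X on the horizontal axis and the second column on the vertical axis.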

[Figure: scatter plot of the iris data, petal length (x-axis) vs. petal width (y-axis)]

PART 3: Cluster the dataset with kmeans

rng(1); % For reproducibility
[idx,C] = kmeans(X,3);

rng: controls the random number generator.
Common syntaxes:

rng(seed)
rng(seed,generator)
s = rng

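rng(seed) sets the seed of the MATLAB® random number generator. For example, rng(1) initializes the Mersenne Twister generator with seed 1. For more examples, see "rng: Control random number generator" (《rng控制随机数生成器》).
kmeans: k-means clustering.
Common syntaxes: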
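To see why the example calls rng(1) before kmeans, here is a minimal sketch of what seeding does. It uses the rng(seed,generator) form with 'twister' so the result does not depend on the session's current generator; the variable names are illustrative.

```matlab
rng(1,'twister');           % seed the Mersenne Twister generator with 1
a = rand(1,3);
rng(1,'twister');           % reseed with the same value
b = rand(1,3);
sameDraws = isequal(a,b);   % the sequence restarts, so both draws are identical

rng(1,'twister');
s = rng;                    % struct describing the current generator settings
disp(s.Type)                % generator type
disp(s.Seed)                % seed value
```

kmeans picks its initial centroids at random, so fixing the seed is what makes the cluster numbering and centroids repeat from run to run.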

idx = kmeans(X,k)
idx = kmeans(X,k,Name,Value)
[idx,C] = kmeans(___)
[idx,C,sumd] = kmeans(___)
[idx,C,sumd,D] = kmeans(___)

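Here idx is an N×1 array of cluster labels (one per sample, so N = 150), and C is the array of cluster centroid coordinates. Since the number of clusters is k = 3 and the data lie in a 2-D plane, C is 3×2.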
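The output sizes described above can be checked directly. This sketch uses the four-output syntax from the list; counts is an extra illustrative variable that is not part of the original example.

```matlab
load fisheriris
X = meas(:,3:4);

rng(1);                          % for reproducibility
[idx,C,sumd,D] = kmeans(X,3);

disp(size(idx))    % N-by-1 cluster labels
disp(size(C))      % k-by-p centroid coordinates
disp(size(sumd))   % k-by-1 within-cluster sums of point-to-centroid distances
disp(size(D))      % N-by-k distances from every point to every centroid

counts = accumarray(idx,1);      % number of points assigned to each cluster
disp(counts')
```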

PART 4: Build the coordinate grid

% Build a fine grid over the range of the data
x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));
[x1G,x2G] = meshgrid(x1,x2);
XGrid = [x1G(:),x2G(:)];  % Defines a fine grid on the plot

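x1 and x2 set the coordinate range of the grid, running from the minimum to the maximum of each column of X with a step of 0.01.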

For this dataset the two coordinates range over:
$$x \in [1, 6.9], \quad y \in [0.1, 2.5]$$
With a step of 0.01, the lengths of x1 and x2 work out to:
$$\mathrm{length}(x1) = (6.9 - 1)/0.01 + 1 = 591$$
$$\mathrm{length}(x2) = (2.5 - 0.1)/0.01 + 1 = 241$$
The variable sizes shown in the workspace match the calculation:
[Figure: MATLAB workspace showing the sizes of x1 and x2]
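The same comparison can be made in code rather than by reading the workspace panel. A small sketch, where rangeX1, rangeX2, n1 and n2 are illustrative names:

```matlab
load fisheriris
X = meas(:,3:4);

x1 = min(X(:,1)):0.01:max(X(:,1));
x2 = min(X(:,2)):0.01:max(X(:,2));

rangeX1 = [min(X(:,1)) max(X(:,1))];   % petal length range
rangeX2 = [min(X(:,2)) max(X(:,2))];   % petal width range
n1 = numel(x1);                        % compare with the hand calculation
n2 = numel(x2);
fprintf('x1: %d points, x2: %d points\n',n1,n2);
```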
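meshgrid builds the 2-D grid: x1G and x2G hold the x and y coordinates of every grid node, and XGrid = [x1G(:),x2G(:)] stacks them into one row per node, 591 × 241 = 142431 rows in total.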
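What meshgrid and the (:) reshaping do is easier to see on a tiny grid. The numbers below are made up for illustration; only the pattern matches the example.

```matlab
x1 = 1:3;                      % three x positions
x2 = [10 20];                  % two y positions
[x1G,x2G] = meshgrid(x1,x2);
% x1G = [1 2 3; 1 2 3]         each row repeats x1
% x2G = [10 10 10; 20 20 20]   each column repeats x2

XGrid = [x1G(:),x2G(:)];       % one (x,y) pair per row, read column by column
% XGrid = [1 10; 1 20; 2 10; 2 20; 3 10; 3 20]
disp(XGrid)
```

The grid matrices have numel(x2) rows and numel(x1) columns, so XGrid always ends up with numel(x1)*numel(x2) rows.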

PART 5: Cluster the grid points with kmeans

idx2Region = kmeans(XGrid,3,'MaxIter',1,'Start',C);
    % Assigns each node in the grid to the closest centroid

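MaxIter is the maximum number of iterations of the kmeans algorithm; here it is 1. 'Start',C makes kmeans start from the centroids C found in PART 3, so a single iteration effectively assigns each grid point to its closest centroid, as the code comment says. Expect a warning that the algorithm did not converge, since only one iteration is run.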
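The idea behind this step is nearest-centroid labeling. As a sketch of that idea on made-up numbers, pdist2 (also part of the Statistics and Machine Learning Toolbox) can return the index of the closest centroid directly. The centroids and points below are invented for illustration and are not the ones from the iris example, and this is an alternative way to label points rather than what the original code does.

```matlab
centroids = [0 0; 10 10];          % two hypothetical cluster centers
points    = [1 1; 9 8; 4 4];       % three hypothetical points to label

% For each row of points, find the closest row of centroids.
% With 'Smallest',1 the second output holds the index of that centroid.
[dist,nearest] = pdist2(centroids,points,'euclidean','Smallest',1);

disp(nearest)    % one label per point, as a row vector
disp(dist)       % distance from each point to its nearest centroid
```

In the iris example the same call pattern would take C in place of centroids and XGrid in place of points, which should give a nearest-centroid labeling of the grid without the convergence warning.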

PART 6: Plot

figure;
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');
hold on;
plot(X(:,1),X(:,2),'k*','MarkerSize',5);
title 'Fisher''s Iris Data';
xlabel 'Petal Lengths (cm)';
ylabel 'Petal Widths (cm)';
legend('Region 1','Region 2','Region 3','Data','Location','SouthEast');
hold off;

gscatter: scatter plot by group.
Common syntaxes:

gscatter(x,y,g)
gscatter(x,y,g,clr,sym,siz)
gscatter(x,y,g,clr,sym,siz,doleg)
gscatter(x,y,g,clr,sym,siz,doleg,xnam,ynam)
gscatter(ax,___)
h = gscatter(___)

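Worth noting: here gscatter is not really being used to draw a "scatter plot". Instead, densely packed colored points are used to form colored regions. Because the grid is so fine (0.01 spacing, over 140,000 points), the points are dense enough to merge into solid patches (0m0).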

[0,0.75,0.75;0.75,0,0.75;0.75,0.75,0] holds three RGB colors, one per row: a muted cyan (0,0.75,0.75), a muted magenta (0.75,0,0.75), and a muted yellow (0.75,0.75,0).
gscatter(XGrid(:,1),XGrid(:,2),idx2Region,...
    [0,0.75,0.75;0.75,0,0.75;0.75,0.75,0],'..');

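The '..' here is (presumably) the marker symbol argument; replacing it with '.' produces the same picture. That is consistent with gscatter reusing marker symbols across groups when fewer symbols than groups are given. (I suspect whoever wrote the example slipped and typed one dot too many [doge].)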
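A tiny gscatter call makes the roles of the clr and sym arguments easier to see. The data below are made up for illustration; the color matrix is the one from the example, and a single '.' is passed as the marker symbol.

```matlab
x = [1 2 3 4 5 6];
y = [1 2 1 2 1 2];
g = [1 1 2 2 3 3];                               % group label for each point
clr = [0,0.75,0.75; 0.75,0,0.75; 0.75,0.75,0];   % one RGB row per group

figure;
h = gscatter(x,y,g,clr,'.',15);                  % sym = '.', siz = 15
nGroups = numel(h);                              % one Line object per group
```

Each group gets its own row of clr, which is why the three regions in the example come out in three different colors.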

legend: adds a legend to the axes.
Common syntaxes:

legend
legend(label1,...,labelN)
legend(labels)
legend(subset,___)
legend(target,___)

The iris dataset

sepal length  sepal width  petal length  petal width
5.10  3.50  1.40  0.20
4.90  3.00  1.40  0.20
4.70  3.20  1.30  0.20
4.60  3.10  1.50  0.20
5.00  3.60  1.40  0.20
5.40  3.90  1.70  0.40
4.60  3.40  1.40  0.30
5.00  3.40  1.50  0.20
4.40  2.90  1.40  0.20
4.90  3.10  1.50  0.10
5.40  3.70  1.50  0.20
4.80  3.40  1.60  0.20
4.80  3.00  1.40  0.10
4.30  3.00  1.10  0.10
5.80  4.00  1.20  0.20
5.70  4.40  1.50  0.40
5.40  3.90  1.30  0.40
5.10  3.50  1.40  0.30
5.70  3.80  1.70  0.30
5.10  3.80  1.50  0.30
5.40  3.40  1.70  0.20
5.10  3.70  1.50  0.40
4.60  3.60  1.00  0.20
5.10  3.30  1.70  0.50
4.80  3.40  1.90  0.20
5.00  3.00  1.60  0.20
5.00  3.40  1.60  0.40
5.20  3.50  1.50  0.20
5.20  3.40  1.40  0.20
4.70  3.20  1.60  0.20
4.80  3.10  1.60  0.20
5.40  3.40  1.50  0.40
5.20  4.10  1.50  0.10
5.50  4.20  1.40  0.20
4.90  3.10  1.50  0.20
5.00  3.20  1.20  0.20
5.50  3.50  1.30  0.20
4.90  3.60  1.40  0.10
4.40  3.00  1.30  0.20
5.10  3.40  1.50  0.20
5.00  3.50  1.30  0.30
4.50  2.30  1.30  0.30
4.40  3.20  1.30  0.20
5.00  3.50  1.60  0.60
5.10  3.80  1.90  0.40
4.80  3.00  1.40  0.30
5.10  3.80  1.60  0.20
4.60  3.20  1.40  0.20
5.30  3.70  1.50  0.20
5.00  3.30  1.40  0.20
7.00  3.20  4.70  1.40
6.40  3.20  4.50  1.50
6.90  3.10  4.90  1.50
5.50  2.30  4.00  1.30
6.50  2.80  4.60  1.50
5.70  2.80  4.50  1.30
6.30  3.30  4.70  1.60
4.90  2.40  3.30  1.00
6.60  2.90  4.60  1.30
5.20  2.70  3.90  1.40
5.00  2.00  3.50  1.00
5.90  3.00  4.20  1.50
6.00  2.20  4.00  1.00
6.10  2.90  4.70  1.40
5.60  2.90  3.60  1.30
6.70  3.10  4.40  1.40
5.60  3.00  4.50  1.50
5.80  2.70  4.10  1.00
6.20  2.20  4.50  1.50
5.60  2.50  3.90  1.10
5.90  3.20  4.80  1.80
6.10  2.80  4.00  1.30
6.30  2.50  4.90  1.50
6.10  2.80  4.70  1.20
6.40  2.90  4.30  1.30
6.60  3.00  4.40  1.40
6.80  2.80  4.80  1.40
6.70  3.00  5.00  1.70
6.00  2.90  4.50  1.50
5.70  2.60  3.50  1.00
5.50  2.40  3.80  1.10
5.50  2.40  3.70  1.00
5.80  2.70  3.90  1.20
6.00  2.70  5.10  1.60
5.40  3.00  4.50  1.50
6.00  3.40  4.50  1.60
6.70  3.10  4.70  1.50
6.30  2.30  4.40  1.30
5.60  3.00  4.10  1.30
5.50  2.50  4.00  1.30
5.50  2.60  4.40  1.20
6.10  3.00  4.60  1.40
5.80  2.60  4.00  1.20
5.00  2.30  3.30  1.00
5.60  2.70  4.20  1.30
5.70  3.00  4.20  1.20
5.70  2.90  4.20  1.30
6.20  2.90  4.30  1.30
5.10  2.50  3.00  1.10
5.70  2.80  4.10  1.30
6.30  3.30  6.00  2.50
5.80  2.70  5.10  1.90
7.10  3.00  5.90  2.10
6.30  2.90  5.60  1.80
6.50  3.00  5.80  2.20
7.60  3.00  6.60  2.10
4.90  2.50  4.50  1.70
7.30  2.90  6.30  1.80
6.70  2.50  5.80  1.80
7.20  3.60  6.10  2.50
6.50  3.20  5.10  2.00
6.40  2.70  5.30  1.90
6.80  3.00  5.50  2.10
5.70  2.50  5.00  2.00
5.80  2.80  5.10  2.40
6.40  3.20  5.30  2.30
6.50  3.00  5.50  1.80
7.70  3.80  6.70  2.20
7.70  2.60  6.90  2.30
6.00  2.20  5.00  1.50
6.90  3.20  5.70  2.30
5.60  2.80  4.90  2.00
7.70  2.80  6.70  2.00
6.30  2.70  4.90  1.80
6.70  3.30  5.70  2.10
7.20  3.20  6.00  1.80
6.20  2.80  4.80  1.80
6.10  3.00  4.90  1.80
6.40  2.80  5.60  2.10
7.20  3.00  5.80  1.60
7.40  2.80  6.10  1.90
7.90  3.80  6.40  2.00
6.40  2.80  5.60  2.20
6.30  2.80  5.10  1.50
6.10  2.60  5.60  1.40
7.70  3.00  6.10  2.30
6.30  3.40  5.60  2.40
6.40  3.10  5.50  1.80
6.00  3.00  4.80  1.80
6.90  3.10  5.40  2.10
6.70  3.10  5.60  2.40
6.90  3.10  5.10  2.30
5.80  2.70  5.10  1.90
6.80  3.20  5.90  2.30
6.70  3.30  5.70  2.50
6.70  3.00  5.20  2.30
6.30  2.50  5.00  1.90
6.50  3.00  5.20  2.00
6.20  3.40  5.40  2.30
5.90  3.00  5.10  1.80

Original: https://blog.csdn.net/weixin_45074807/article/details/123399672
Author: 一只菟葵
Title: [Beginner] The k-means Clustering Function Explained (Using the Iris Dataset) [MATLAB]
