K-means: Principles and Application in Object Detection

[Machine Learning] K-means (very detailed) – Zhihu

[Object Detection] Computing anchors with K-means_机器不学习我学习's blog-CSDN blog_object detection k-means

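1. The K-means algorithm

(1) Definition:

K-means is the most commonly used clustering algorithm based on Euclidean distance. It assumes that the closer two objects are, the more similar they are.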

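(2) Steps:

  1. Randomly select k samples as the initial cluster centers;
  2. For each sample in the dataset, compute its distance to each of the k cluster centers and assign it to the class of the nearest center;
  3. For each class, recompute its cluster center (the centroid of all samples that belong to the class);
  4. Repeat steps 2 and 3 until a stopping condition is met (number of iterations, minimum change in error, and so on).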
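The four steps above can be sketched in NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `kmeans`, its parameters (`max_iter`, `tol`, `seed`), and the toy data are all assumptions made for the example, and the stopping rule used here is "the centers barely move".

```python
import numpy as np

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Plain K-means on an (n, d) float array; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct samples as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each sample to its nearest center (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 4: stop once the centers have essentially stopped moving.
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return centers, labels

# Hypothetical toy data: two well-separated groups of three points.
data = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                 [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centers, labels = kmeans(data, k=2)
```

On data this clearly separated, the first three points should end up in one cluster and the last three in the other, whichever two samples are drawn as initial centers.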

(3) Advantages:

* Low algorithmic complexity.

(4) Disadvantages:

  • The value of K must be set manually, and different values of K give different results;
  • Sensitive to outliers.

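(5) Optimization:

Several tuning techniques address the shortcomings of K-means: data preprocessing, choosing K sensibly, kernel functions, K-means++, and ISODATA.

Data preprocessing

Common preprocessing steps are data normalization and data standardization. K-means is at heart a data-partitioning algorithm based on Euclidean distance, so dimensions with a large mean and variance have a decisive influence on the clustering.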
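As a small illustration of why scaling matters, the sketch below standardizes each feature to zero mean and unit variance before clustering. The helper name `standardize` and the sample values are assumptions for the example, not part of the original article.

```python
import numpy as np

def standardize(features):
    """Scale each column to zero mean and unit variance (z-score)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # avoid dividing by zero for constant columns
    return (features - mean) / std

# Hypothetical data: column 0 is in metres, column 1 in grams.
# Without scaling, column 1 would dominate any Euclidean distance.
raw = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
scaled = standardize(raw)
```

After this step both columns contribute on a comparable scale to the distances that K-means computes.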

Choosing K sensibly

The choice of K strongly affects K-means, and this is its biggest weakness. Common ways to choose K include the elbow method and the Gap statistic.
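A rough sketch of the elbow method: run K-means for several values of K, record the sum of squared errors (SSE) for each, and look for the K beyond which the SSE stops dropping sharply. The helper `kmeans_sse`, the synthetic blobs, and the range of K are assumptions for illustration. A single random initialization is used here, so in practice one would likely repeat each run several times.

```python
import numpy as np

def kmeans_sse(points, k, n_iter=50, seed=0):
    """Run a basic K-means and return the sum of squared errors (SSE)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())

# Hypothetical toy data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(42)
blobs = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2))
                   for c in ([0, 0], [5, 5], [0, 5])])

# SSE for K = 1..6; the "elbow" is where the curve flattens out.
sse_by_k = {k: kmeans_sse(blobs, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

Plotting `sse_by_k` against K and picking the bend by eye is the usual way to read the result.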

Kernel functions: the kernel K-means algorithm

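K-means++

The choice of initial values strongly affects the result, so improving the initialization is an important part of tuning. K-means++ differs from K-means only in how the initial centroids are chosen, and in practice it often outperforms plain K-means.

How the K initial cluster centers are chosen: the first center is picked uniformly at random from the samples, and each subsequent center is sampled with probability proportional to the squared distance between a sample and its nearest already-chosen center.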
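The standard K-means++ seeding rule (first center uniformly at random, later centers sampled in proportion to squared distance from the nearest chosen center) might be sketched like this. The function name `kmeans_pp_init` and the sample points are illustrative assumptions, and the sketch assumes the points are not all identical.

```python
import numpy as np

def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers from `points` using K-means++ seeding."""
    rng = np.random.default_rng(seed)
    n = len(points)
    # First center: chosen uniformly at random.
    centers = [points[rng.integers(n)]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        diffs = points[:, None, :] - np.array(centers)[None, :, :]
        d2 = (diffs ** 2).sum(axis=2).min(axis=1)
        # Sample the next center with probability proportional to d2,
        # so far-away points are more likely to be picked.
        probs = d2 / d2.sum()
        centers.append(points[rng.choice(n, p=probs)])
    return np.array(centers)

# Hypothetical sample points.
samples = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
init_centers = kmeans_pp_init(samples, k=3)
```

A point that is already a center has distance zero and therefore zero probability, so the same sample is not selected twice. The returned centers would then replace the purely random choice in step 1 of K-means.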

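2. Clustering YOLO anchors with K-means

In detection tasks, anchor clustering operates on the widths and heights of all bboxes in the training set. The distance is not the Euclidean distance but 1 - IoU. When the IoU between a bbox and the bbox represented by a cluster center is computed, the two boxes are placed so that their centers coincide.

Preparation before anchor clustering
Extract the bounding box coordinates of every class to be detected from the training set. Training data usually stores the 4 corner coordinates of each bbox, whereas clustering uses the bbox width and height, so the coordinates must be converted to box sizes. The calculation is simple: width = bottom-right x - top-left x, height = bottom-right y - top-left y.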
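The conversion can be written in a few lines. The sketch assumes each annotation is stored as `(x_min, y_min, x_max, y_max)` in image coordinates with the origin at the top-left. The function name `corners_to_wh` and the sample boxes are made up for illustration.

```python
import numpy as np

def corners_to_wh(boxes):
    """Convert (x_min, y_min, x_max, y_max) rows to (width, height) rows."""
    boxes = np.asarray(boxes, dtype=float)
    widths = boxes[:, 2] - boxes[:, 0]   # bottom-right x minus top-left x
    heights = boxes[:, 3] - boxes[:, 1]  # bottom-right y minus top-left y
    return np.stack([widths, heights], axis=1)

# Hypothetical annotations in pixel coordinates.
boxes_xyxy = [[10, 20, 110, 70], [0, 0, 32, 64]]
wh = corners_to_wh(boxes_xyxy)
```

If the dataset stores normalized center/width/height labels instead, this step is unnecessary and the width and height columns can be used directly.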

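Clustering procedure:

(1) Initialize K anchor boxes: randomly select K bboxes from all the bboxes (already converted to width and height) as the initial anchor boxes;

(2) Compute the IoU and the distance between every bounding box not selected as an anchor and every anchor box. The two boxes are placed so that their centers coincide, so the intersection is the smaller width times the smaller height.

[Figure: a bounding box and an anchor box with coinciding centers]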

Code:

import numpy as np

min_w_matrix = np.minimum(cluster_w_matrix, box_w_matrix)  # cluster_w_matrix, box_w_matrix: widths of the anchor boxes and bounding boxes
min_h_matrix = np.minimum(cluster_h_matrix, box_h_matrix)  # cluster_h_matrix, box_h_matrix: heights of the anchor boxes and bounding boxes
inter_area = np.multiply(min_w_matrix, min_h_matrix)  # inter_area: overlap area when the centers coincide
IOU = inter_area / (box_area + cluster_area - inter_area)  # box_area: bounding box area; cluster_area: anchor box area

This gives the IoU of the two bboxes. The distance between them is then 1 - IoU.
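The snippet above leaves its input matrices undefined, so here is a self-contained sketch of the same calculation that uses NumPy broadcasting to build the full distance matrix in one pass. The function name `iou_wh` and the sample sizes are assumptions for the example.

```python
import numpy as np

def iou_wh(boxes_wh, anchors_wh):
    """IoU between (n, 2) boxes and (k, 2) anchors with centers aligned.

    Returns an (n, k) matrix.
    """
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    anchors_wh = np.asarray(anchors_wh, dtype=float)
    # With coinciding centers, the overlap is min(width) * min(height).
    inter_w = np.minimum(boxes_wh[:, None, 0], anchors_wh[None, :, 0])
    inter_h = np.minimum(boxes_wh[:, None, 1], anchors_wh[None, :, 1])
    inter_area = inter_w * inter_h
    box_area = (boxes_wh[:, 0] * boxes_wh[:, 1])[:, None]
    anchor_area = (anchors_wh[:, 0] * anchors_wh[:, 1])[None, :]
    return inter_area / (box_area + anchor_area - inter_area)

# Hypothetical (width, height) pairs.
boxes = [[100, 50], [20, 40]]
anchors = [[100, 50], [50, 50]]
distance = 1.0 - iou_wh(boxes, anchors)
# distance is [[0.0, 0.5], [0.84, 0.68]] up to floating-point rounding
```

A box identical to an anchor has distance 0, and the distance approaches 1 as the two shapes diverge, which makes it independent of the absolute box scale in a way Euclidean distance on width and height is not.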

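(3) Assign each bbox to a cluster-center anchor according to distance: a bbox belongs to the anchor from which its distance is smallest;

(4) Update the anchor boxes: once every bbox has been assigned, the bboxes form K groups. Update each cluster-center anchor to the median or the mean of the widths and heights of the bboxes in its group;

(5) Repeat steps (2), (3), and (4) until the anchor sizes no longer change;

(6) Compute the accuracy of the anchor boxes.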
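Putting steps (1) to (6) together, one possible sketch is shown below. It makes several assumptions not stated in the text: the median is used for the update, every bbox (including those drawn as initial anchors) takes part in the assignment, and "accuracy" is taken to mean the average, over all bboxes, of the IoU with the best-matching anchor. The names `kmeans_anchors` and `avg_iou` and the sample sizes are illustrative.

```python
import numpy as np

def iou_wh(boxes_wh, anchors_wh):
    """(n, k) IoU matrix for width/height pairs with centers aligned."""
    inter = (np.minimum(boxes_wh[:, None, 0], anchors_wh[None, :, 0])
             * np.minimum(boxes_wh[:, None, 1], anchors_wh[None, :, 1]))
    area_b = boxes_wh.prod(axis=1)[:, None]
    area_a = anchors_wh.prod(axis=1)[None, :]
    return inter / (area_b + area_a - inter)

def kmeans_anchors(boxes_wh, k, max_iter=300, seed=0):
    """Cluster (n, 2) width/height pairs into k anchors using 1 - IoU."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) Draw k distinct bboxes as the initial anchors.
    anchors = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(max_iter):
        # (2)-(3) Distance is 1 - IoU; assign each bbox to its nearest anchor.
        assign = (1.0 - iou_wh(boxes_wh, anchors)).argmin(axis=1)
        # (4) Move each anchor to the median width/height of its group;
        # an anchor with no members is left where it is.
        new_anchors = anchors.copy()
        for j in range(k):
            members = boxes_wh[assign == j]
            if len(members):
                new_anchors[j] = np.median(members, axis=0)
        # (5) Stop when the anchors no longer change.
        if np.allclose(new_anchors, anchors):
            break
        anchors = new_anchors
    return anchors

def avg_iou(boxes_wh, anchors):
    """(6) Mean IoU between each bbox and its best-matching anchor."""
    boxes_wh = np.asarray(boxes_wh, dtype=float)
    return float(iou_wh(boxes_wh, anchors).max(axis=1).mean())

# Hypothetical bbox sizes: three small boxes and three large ones.
sizes = [[10, 10], [12, 11], [11, 12], [100, 50], [110, 55], [105, 52]]
anchors = kmeans_anchors(sizes, k=2)
anchors = anchors[np.argsort(anchors.prod(axis=1))]  # sort by area
score = avg_iou(sizes, anchors)
```

Sorting the anchors by area at the end is a convenience for this example; detectors that assign anchors to feature-map scales typically expect them ordered from small to large.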

Original: https://blog.csdn.net/baidu_38262850/article/details/125705122
Author: 沙小菜
Title: K-means: Principles and Application in Object Detection (K-means原理及在检测中的应用)