[Notes] Machine Learning: Theory and Case Studies (Part 2): Clustering

#21-Day Learning Challenge: Machine Learning#

Event page: CSDN 21-Day Learning Challenge

Contents

Preface

Clustering

Clustering is mainly used to:

  • discover latent structure in data
  • group data naturally
  • compress data

Definition of clustering

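Clustering is the process of partitioning a set of data objects into clusters (subsets) so that objects within a cluster are similar to one another and objects in different clusters are dissimilar.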

What is a cluster

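Given a data set X and a distance function d, consider a clustering function F. Kleinberg described the following three properties:

  • Scale invariance: for any distance function d and any constant α > 0, F(d) = F(αd).
  • Richness: the set of partitions that F can output contains every possible partition of the data into clusters (worth flagging).
  • Consistency: let d and d' be two distance functions. If d' is obtained from d by shrinking the distances between points in the same cluster and enlarging the distances between points in different clusters, then F(d) = F(d').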
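
As a small illustration of scale invariance, the sketch below defines a toy clustering function that takes a distance function d as input and assigns each point to its nearest representative. The function `assign_to_medoids`, the fixed representatives, and the data are all hypothetical and chosen only for this example; they are not taken from Kleinberg's work. Multiplying d by α = 2 should leave the resulting partition unchanged.

```python
import math

def assign_to_medoids(points, medoids, d):
    """Toy clustering function F(d): label each point with the index
    of its nearest representative (medoid) under the distance d."""
    return [min(range(len(medoids)), key=lambda j: d(p, medoids[j]))
            for p in points]

def euclidean(p, q):
    """Ordinary Euclidean distance between two points."""
    return math.dist(p, q)

def scaled(d, alpha):
    """Return the distance function alpha * d."""
    return lambda p, q: alpha * d(p, q)

# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
medoids = [points[0], points[3]]  # assumed fixed representatives

labels = assign_to_medoids(points, medoids, euclidean)
labels_scaled = assign_to_medoids(points, medoids, scaled(euclidean, 2.0))

print(labels)                   # [0, 0, 0, 1, 1, 1]
print(labels == labels_scaled)  # True
```

Scaling every distance by the same positive constant does not change which representative is nearest, so the partition is the same in both runs.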

Put simply:

Categories of clustering methods

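Clustering methods can be roughly divided into three stages:

  • Classical algorithms: developed before 2000 for early databases and related applications. Examples include model-based, partition-based, density-based, grid-based, and hierarchical clustering algorithms.
  • Advanced algorithms: developed since 2000 on top of the classical algorithms to handle more complex data and tasks. Examples include spectral clustering, high-dimensional data clustering, clustering based on non-negative matrix factorization, and uncertain data clustering.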

Outliers

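Roughly speaking, an outlier is a noise point, or a point that lies far from every cluster, like the points circled in green in the figure below.

[Figure: outliers circled in green]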
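
One simple way to make "far from every cluster" concrete is to flag any point whose distance to its nearest cluster centre exceeds a threshold. The sketch below assumes the centres are already known from an earlier clustering step; the function name `flag_outliers`, the threshold value, and the data are hypothetical and only meant to illustrate the idea.

```python
import math

def flag_outliers(points, centroids, threshold):
    """Return the points whose distance to the nearest centroid
    is greater than threshold."""
    outliers = []
    for p in points:
        nearest = min(math.dist(p, c) for c in centroids)
        if nearest > threshold:
            outliers.append(p)
    return outliers

# Hypothetical data: two tight groups plus one stray point.
points = [(0.0, 0.0), (0.5, 0.2), (8.0, 8.0), (8.3, 7.9), (4.0, 20.0)]
centroids = [(0.25, 0.1), (8.15, 7.95)]  # assumed output of a clustering step

outliers = flag_outliers(points, centroids, threshold=3.0)
print(outliers)  # [(4.0, 20.0)]
```

The threshold is a judgment call that depends on the scale of the data; it is not something the definition above fixes.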

Clustering algorithm example

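Here we take k-means, a partition-based algorithm from the classical family, as an example for a first exploration of clustering.

Note: partition-based methods are a widely studied and widely applied approach to clustering. Most algorithms of this kind are simple to understand and easy to implement, and they play an important role in many fields. A partition-based method uncovers the category structure in the data by optimizing an objective function, usually improving the clustering iteratively.

This requires the concept of a prototype point, that is, a representative point that captures the characteristics of a class. Partition-based algorithms therefore usually need some fixed parameter to select the prototype point that represents each cluster.

Prototype point: a representative point that captures the characteristics of a class.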
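
In k-means the prototype of a cluster is its centroid, the coordinate-wise mean of its members. A minimal sketch, with a hypothetical helper `centroid` and made-up data:

```python
def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    dims = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / n for i in range(dims))

# Hypothetical cluster of three 2-D points.
cluster = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
prototype = centroid(cluster)
print(prototype)  # (2.0, 4.0)
```

Other partition-based methods may pick a different kind of prototype, for example an actual data point (a medoid) instead of the mean.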

The K-Means algorithm

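The k-means algorithm has been studied and applied extensively in many disciplines, for example:

  • data compression
  • data classification
  • density estimation

Because the idea behind it is simple and it yields good clustering results at a low computational cost, k-means has become one of the most commonly used clustering algorithms.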

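Algorithm implementation: k-means is the simplest realization of maximum separation (between clusters) and maximum cohesion (within clusters).

The basic steps are:

  • Find the best positions for the centroids

Finding the best centroid positions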
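
The centroid search is commonly carried out with the standard alternating iteration: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, and repeat until the assignments stop changing. The sketch below is a minimal pure-Python version under those assumptions; the function names, the random initialisation from data points, the seed, and the toy data are all choices made for this example rather than details from the notes above.

```python
import math
import random

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal k-means sketch: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old centroid
                centroids[j] = mean(members)
    return centroids, labels

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group receives label 0 depends on the random initialisation, so only the grouping itself is meaningful, not the label numbers.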

If SSE(t+1)
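
SSE is taken here to mean the sum of squared errors, that is, the sum over all points of the squared distance to the assigned centroid. A common convention, assumed in this sketch rather than taken from the notes, is to stop iterating once SSE no longer decreases by more than a small tolerance between iteration t and iteration t+1. The helper names, the tolerance, the starting centroids, and the data are hypothetical.

```python
import math

def sse(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))

def lloyd_step(points, centroids):
    """One assignment step followed by one centroid update."""
    k = len(centroids)
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
              for p in points]
    new_centroids = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if members:
            dims = len(members[0])
            new_centroids.append(tuple(
                sum(m[i] for m in members) / len(members)
                for i in range(dims)))
        else:
            new_centroids.append(centroids[j])  # keep an empty cluster's centroid
    return new_centroids, labels

# Hypothetical toy data with a deliberately poor start:
# both initial centroids lie in the same group.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = [points[0], points[1]]
tol = 1e-9      # assumed stopping tolerance
history = []    # SSE recorded after each iteration

for t in range(50):
    centroids, labels = lloyd_step(points, centroids)
    history.append(sse(points, centroids, labels))
    # Stop when SSE(t+1) is no longer meaningfully smaller than SSE(t).
    if len(history) > 1 and history[-2] - history[-1] <= tol:
        break

print(history)
```

With the standard assignment and update steps, the recorded SSE values should never increase from one iteration to the next, which is what makes this kind of stopping rule workable.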

Original: https://blog.csdn.net/qq1113673178/article/details/126168724
Author: 二次元怪兽
Title: [Notes] Machine Learning: Theory and Case Studies (Part 2): Clustering
