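K-Means Clustering Algorithm:
K-Means is an iterative, dynamic clustering algorithm, where K is the number of clusters and Means refers to the mean.
As the name suggests, K-Means clusters data points by their means. Given a preset K and an initial centroid for each cluster, it assigns every data point to its most similar (nearest) centroid, then recomputes each centroid as the mean of the points assigned to it, and repeats until the assignment stabilizes. The result is a local optimum that depends on the initial centroids.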
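For reference, "optimizing through means" is usually formalized as minimizing the within-cluster sum of squared distances. The notation below ($C_k$ for the set of points in cluster $k$, $\mu_k$ for its centroid, $J$ for the objective) is introduced here for illustration and does not appear in the original notes:

```latex
J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x \in C_k} x
```

With squared Euclidean distance, neither the reassignment step nor the mean update can increase $J$, which is why the iteration settles down, although only at a local optimum.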
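Key Issues in the K-Means Algorithm
Choosing K
K is the number of clusters in the result, that is, the number of groups we want to divide the data into. K also determines the number of initial centroids: one centroid per cluster.
There is no fixed formula for the optimal K; it has to be specified manually. Choose it based on actual business needs, or use the number of groups produced by hierarchical clustering (Hierarchical Clustering) as a reference. Note that a larger K lowers the error on the data but increases the risk of overfitting.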
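The trade-off in the last sentence can be seen on a tiny example. The sketch below measures "error" as the sum of squared distances from each point to its cluster center; the class name SseSketch, the method sse, and the four sample points are assumptions made for illustration, not part of the original code.

```java
class SseSketch {
    /** Sum of squared Euclidean distances from each point to its assigned center. */
    static double sse(double[][] points, int[] labels, double[][] centers) {
        double total = 0;
        for (int i = 0; i < points.length; i++) {
            double[] center = centers[labels[i]];
            for (int j = 0; j < points[i].length; j++) {
                double diff = points[i][j] - center[j];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = {{1}, {2}, {9}, {10}};
        // K = 1: a single center at the overall mean.
        System.out.println(sse(points, new int[]{0, 0, 0, 0}, new double[][]{{5.5}})); // 65.0
        // K = 2: one center per visible group.
        System.out.println(sse(points, new int[]{0, 0, 1, 1}, new double[][]{{1.5}, {9.5}})); // 1.0
        // K = 4: every point is its own center.
        System.out.println(sse(points, new int[]{0, 1, 2, 3}, points)); // 0.0
    }
}
```

The error reaches zero when K equals the number of points, yet that "clustering" says nothing about the data. This is why the error alone cannot be used to pick K.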
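In summary, the algorithm is:
(1) Step one: choose initial cluster centers for the points to be clustered;
(2) Step two: compute the distance from each point to every cluster center, and assign each point to the cluster whose center is nearest;
(3) Step three: compute the coordinate mean of all points in each cluster, and use this mean as the new cluster center.
Repeat (2) and (3) until the cluster centers no longer move significantly or the number of iterations reaches the limit.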
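The three steps above can be sketched as a small standalone program. This is a minimal illustration under stated assumptions, not the author's implementation: it uses plain double arrays instead of a dataset object, squared Euclidean distance, initial centers passed in by the caller (so the run is deterministic), and it stops when no label changes or maxIterations is reached. The names KMeansSketch and cluster are hypothetical.

```java
import java.util.Arrays;

class KMeansSketch {
    static int[] cluster(double[][] points, double[][] initialCenters, int maxIterations) {
        int k = initialCenters.length;
        int dims = points[0].length;
        // Step (1): start from the given centers (copied so the input is not modified).
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1); // No point is assigned yet.

        for (int iter = 0; iter < maxIterations; iter++) {
            // Step (2): assign each point to its nearest center.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                double bestDistance = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < dims; j++) {
                        double diff = points[i][j] - centers[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDistance) {
                        bestDistance = d;
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // Assignments are stable, so the centers would not move either.
            }

            // Step (3): recompute each center as the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[labels[i]]++;
                for (int j = 0; j < dims; j++) {
                    sums[labels[i]][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // Empty cluster: keep its previous center.
                }
                for (int j = 0; j < dims; j++) {
                    centers[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1, 2}, {2, 1}, {8, 8}, {9, 8}, {8, 9}};
        // Deliberately poor start: both centers lie in the lower-left group.
        double[][] start = {{1, 1}, {2, 1}};
        System.out.println(Arrays.toString(cluster(points, start, 100))); // [0, 0, 0, 1, 1, 1]
    }
}
```

Even with both initial centers inside the same group, the second center is pulled toward the far points after one update, and the labels settle after two reassignments.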
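In our code, numClusters = 2. The member variables, the file reading, and the previously written methods getRandomIndices(int paraLength) and distance(int paraI, double[] paraArray), which computes the distance, are not repeated here.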
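For readers who have not seen those two helpers, here is a rough standalone sketch of what such methods typically do. It is an assumption, not the author's code: the signatures differ (the original distance takes an instance index, and the original getRandomIndices takes only a length), the shuffle shown is Fisher-Yates, and the distance shown is Euclidean. The class name HelperSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class HelperSketch {
    /** Returns the indices 0..length-1 in random order (Fisher-Yates shuffle). */
    static int[] getRandomIndices(int length, Random random) {
        int[] indices = new int[length];
        for (int i = 0; i < length; i++) {
            indices[i] = i;
        }
        for (int i = length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1); // 0 <= j <= i
            int temp = indices[i];
            indices[i] = indices[j];
            indices[j] = temp;
        }
        return indices;
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // The order varies from run to run because the generator is unseeded.
        System.out.println(Arrays.toString(getRandomIndices(5, new Random())));
        System.out.println(distance(new double[]{0, 0}, new double[]{3, 4})); // 5.0
    }
}
```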
The most important part of the code is in clustering():
    /**
     *******************************
     * Clustering.
     *******************************
     */
    public void clustering() {
        int[] tempOldClusterArray = new int[dataset.numInstances()];
        // Make the old assignment differ from the new one so the loop runs at least once.
        tempOldClusterArray[0] = -1;
        int[] tempClusterArray = new int[dataset.numInstances()];
        Arrays.fill(tempClusterArray, 0);
        // numAttributes() - 1: the last attribute is assumed to be the class label and is skipped.
        double[][] tempCenters = new double[numClusters][dataset.numAttributes() - 1];

        // Step 1. Initialize centers.
        int[] tempRandomOrders = getRandomIndices(dataset.numInstances());
        for (int i = 0; i < numClusters; i++) {
            for (int j = 0; j < tempCenters[0].length; j++) {
                tempCenters[i][j] = dataset.instance(tempRandomOrders[i]).value(j);
            } // Of for j
        } // Of for i

        int[] tempClusterLengths = null;
        // Iterate until the assignment of instances to clusters no longer changes.
        while (!Arrays.equals(tempOldClusterArray, tempClusterArray)) {
            System.out.println("New loop ...");
            tempOldClusterArray = tempClusterArray;
            tempClusterArray = new int[dataset.numInstances()];

            // Step 2.1 Minimization. Assign cluster to each instance.
            int tempNearestCenter;
            double tempNearestDistance;
            double tempDistance;

            for (int i = 0; i < dataset.numInstances(); i++) {
                tempNearestCenter = -1;
                tempNearestDistance = Double.MAX_VALUE;

                for (int j = 0; j < numClusters; j++) {
                    tempDistance = distance(i, tempCenters[j]);
                    if (tempNearestDistance > tempDistance) {
                        tempNearestDistance = tempDistance;
                        tempNearestCenter = j;
                    } // Of if
                } // Of for j
                tempClusterArray[i] = tempNearestCenter;
            } // Of for i

            // Step 2.2 Mean. Find new centers.
            tempClusterLengths = new int[numClusters];
            Arrays.fill(tempClusterLengths, 0);
            double[][] tempNewCenters = new double[numClusters][dataset.numAttributes() - 1];
            for (int i = 0; i < dataset.numInstances(); i++) {
                for (int j = 0; j < tempNewCenters[0].length; j++) {
                    tempNewCenters[tempClusterArray[i]][j] += dataset.instance(i).value(j);
                } // Of for j
                tempClusterLengths[tempClusterArray[i]]++;
            } // Of for i

            // Step 2.3 Now average
            for (int i = 0; i < tempNewCenters.length; i++) {
                if (tempClusterLengths[i] == 0) {
                    // Empty cluster: keep the old center instead of dividing by zero (NaN).
                    tempNewCenters[i] = tempCenters[i];
                    continue;
                } // Of if
                for (int j = 0; j < tempNewCenters[0].length; j++) {
                    tempNewCenters[i][j] /= tempClusterLengths[i];
                } // Of for j
            } // Of for i

            System.out.println("Now the new centers are: " + Arrays.deepToString(tempNewCenters));
            tempCenters = tempNewCenters;
        } // Of while

        // Step 3. Form clusters.
        clusters = new int[numClusters][];
        int[] tempCounters = new int[numClusters];
        for (int i = 0; i < numClusters; i++) {
            clusters[i] = new int[tempClusterLengths[i]];
        } // Of for i

        for (int i = 0; i < tempClusterArray.length; i++) {
            clusters[tempClusterArray[i]][tempCounters[tempClusterArray[i]]] = i;
            tempCounters[tempClusterArray[i]]++;
        } // Of for i

        System.out.println("The clusters are: " + Arrays.deepToString(clusters));
    }// Of clustering
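Step-by-step walkthrough: Step 1. Initialize centers.
getRandomIndices() returns the instance indices in random order, and the first numClusters instances in that order are copied into the two-dimensional array tempCenters[][] as the initial centers. What follows is a while loop.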
Step 2.1 Minimization. Assign cluster to each instance.
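In this step each instance goes looking for its "boss", that is, the cluster it belongs to. The index of the nearest cluster is stored in tempNearestCenter, and tempClusterArray[i] = tempNearestCenter then records it, so that tempClusterArray[] holds the cluster of every instance in instance order.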
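One detail of this step is worth isolating: because the comparison is a strict "greater than", an instance that is equally far from two centers stays with the lower-numbered one. The sketch below reproduces only that comparison on plain arrays; the class name NearestCenterSketch, the method nearestCenter, and the use of squared Euclidean distance are assumptions for illustration.

```java
class NearestCenterSketch {
    /** Returns the index of the center closest to the point; ties go to the lowest index. */
    static int nearestCenter(double[] point, double[][] centers) {
        int nearest = -1;
        double nearestDistance = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double d = 0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - centers[c][j];
                d += diff * diff;
            }
            // Strict comparison, as in clustering(): an equal distance does not replace the current best.
            if (nearestDistance > d) {
                nearestDistance = d;
                nearest = c;
            }
        }
        return nearest;
    }

    public static void main(String[] args) {
        double[][] centers = {{1}, {3}};
        System.out.println(nearestCenter(new double[]{2}, centers));   // 0 (tie, lower index wins)
        System.out.println(nearestCenter(new double[]{2.5}, centers)); // 1
    }
}
```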
Step 2.2 Mean. Find new centers
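The core is really a single statement: tempNewCenters[tempClusterArray[i]][j] += dataset.instance(i).value(j). tempNewCenters is a two-dimensional array in which each instance is accumulated into the row of its own cluster, and each attribute value into its own column. The tempClusterLengths[tempClusterArray[i]]++ that follows means that whenever the numerator grows, the denominator must grow with it. Dividing the sums by the counts at the end gives the new centers.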
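The numerator and denominator bookkeeping can be tried on its own with a few hand-checkable points. This is a sketch under assumptions: plain arrays replace the dataset, the names MeanUpdateSketch and updateCenters are hypothetical, and a cluster that receives no points keeps its previous center rather than becoming NaN from a division by zero.

```java
import java.util.Arrays;

class MeanUpdateSketch {
    static double[][] updateCenters(double[][] points, int[] labels, double[][] oldCenters) {
        int k = oldCenters.length;
        int dims = points[0].length;
        double[][] sums = new double[k][dims]; // Numerators: one row per cluster, one column per attribute.
        int[] counts = new int[k];             // Denominators: points per cluster.
        for (int i = 0; i < points.length; i++) {
            for (int j = 0; j < dims; j++) {
                sums[labels[i]][j] += points[i][j];
            }
            counts[labels[i]]++;
        }
        for (int c = 0; c < k; c++) {
            if (counts[c] == 0) {
                sums[c] = oldCenters[c].clone(); // Nothing to average: keep the old center.
                continue;
            }
            for (int j = 0; j < dims; j++) {
                sums[c][j] /= counts[c];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 2}, {3, 4}, {10, 10}};
        int[] labels = {0, 0, 1}; // Cluster 2 receives no points.
        double[][] oldCenters = {{0, 0}, {0, 0}, {7, 7}};
        System.out.println(Arrays.deepToString(updateCenters(points, labels, oldCenters)));
        // [[2.0, 3.0], [10.0, 10.0], [7.0, 7.0]]
    }
}
```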
Step 3. Form clusters.
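Step 3 turns the flat label array (one cluster index per instance) into one index list per cluster. A minimal sketch of that conversion follows, assuming a plain label array as input; the names FormClustersSketch and formClusters are hypothetical. As in clustering(), the sizes are counted first so that each row can be allocated exactly, and a per-cluster counter tracks the next free slot.

```java
import java.util.Arrays;

class FormClustersSketch {
    static int[][] formClusters(int[] labels, int k) {
        // First pass: how many instances does each cluster hold?
        int[] lengths = new int[k];
        for (int label : labels) {
            lengths[label]++;
        }
        int[][] clusters = new int[k][];
        for (int c = 0; c < k; c++) {
            clusters[c] = new int[lengths[c]];
        }
        // Second pass: write each instance index into the next free slot of its cluster.
        int[] counters = new int[k];
        for (int i = 0; i < labels.length; i++) {
            clusters[labels[i]][counters[labels[i]]] = i;
            counters[labels[i]]++;
        }
        return clusters;
    }

    public static void main(String[] args) {
        int[] labels = {0, 1, 0, 1, 1};
        System.out.println(Arrays.deepToString(formClusters(labels, 2))); // [[0, 2], [1, 3, 4]]
    }
}
```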
Original: https://blog.csdn.net/qq_44647576/article/details/124583287
Author: 贾思乐
Title: 日撸 Java 三百行学习笔记day5657