How to write cluster labels in Python: how do I generate output labels for clusters from the k-means clustering algorithm?

June 2, 2023, 2:50 PM • Artificial Intelligence • 70 reads

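My input feature set is in the form of a CSV file. I used

http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_silhouette_analysis.html

to choose the number of clusters for my unlabeled data, and I got the highest score for a cluster size of 3.

So my number of clusters is 3. Is there a way to generate an output label for every input row with the k-means clustering algorithm?

Basically, I want to print all the input features together with the result of the k-means algorithm (the cluster label).

That is, I want to print all the input features and the cluster label assigned to each row of the input CSV file.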
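The selection step described above (trying several cluster counts and keeping the one with the highest average silhouette score) can be sketched without any plotting. This is only an illustration under stated assumptions: the helper name `best_n_clusters` and the synthetic `X_demo` array are hypothetical and stand in for the real features loaded from the CSV file; `n_init=10` is set explicitly only to keep the behaviour the same across scikit-learn versions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def best_n_clusters(X, candidates=(2, 3, 4, 5, 6), random_state=10):
    """Fit KMeans for each candidate count; return the best count and all scores."""
    scores = {}
    for k in candidates:
        clusterer = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = clusterer.fit_predict(X)
        # Average silhouette coefficient over all samples for this k.
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic stand-in data (assumption): three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X_demo = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(30, 2))
    for center in (0.0, 5.0, 10.0)
])

best_k, scores = best_n_clusters(X_demo)
print("best n_clusters:", best_k)
for k, score in scores.items():
    print(k, round(score, 3))
```

On real data the scores may be close to each other, so the highest average alone is a rough guide; the per-cluster silhouette plots in the linked example give more context.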

Python code:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from pandas import read_csv
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score

names = ["Variable1", "Variable2", "Variable3", "Variable4", "Variable5", "Variable6", "Variable7", "Variable8", "Variable9", "Variable10", "Variable11", "Variable12", "InputClass"]
filename = 'C:/Users/svx/d_features.csv'
dataframe = read_csv(filename, names=names)
array = dataframe.values
# Column indices 2..10, i.e. Variable3 through Variable11
X = array[:, 2:11]

range_n_clusters = [2, 3, 4, 5, 6]

for n_clusters in range_n_clusters:
    # Create a subplot with 1 row and 2 columns
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.set_size_inches(18, 7)

    # The 1st subplot is the silhouette plot
    # The silhouette coefficient can range from -1, 1 but in this example all
    # lie within [-0.1, 1]
    ax1.set_xlim([-0.1, 1])
    # The (n_clusters+1)*10 is for inserting blank space between silhouette
    # plots of individual clusters, to demarcate them clearly.
    ax1.set_ylim([0, len(X) + (n_clusters + 1) * 10])

    # Initialize the clusterer with n_clusters value and a random generator
    # seed of 10 for reproducibility.
    clusterer = KMeans(n_clusters=n_clusters, random_state=10)
    cluster_labels = clusterer.fit_predict(X)

    # The silhouette_score gives the average value for all the samples.
    # This gives a perspective into the density and separation of the formed
    # clusters
    silhouette_avg = silhouette_score(X, cluster_labels)
    print("For n_clusters =", n_clusters,
          "The average silhouette_score is :", silhouette_avg)

    # Compute the silhouette scores for each sample
    sample_silhouette_values = silhouette_samples(X, cluster_labels)

    y_lower = 10
    for i in range(n_clusters):
        # Aggregate the silhouette scores for samples belonging to
        # cluster i, and sort them
        ith_cluster_silhouette_values = \
            sample_silhouette_values[cluster_labels == i]
        ith_cluster_silhouette_values.sort()

        size_cluster_i = ith_cluster_silhouette_values.shape[0]
        y_upper = y_lower + size_cluster_i

        # cm.spectral was removed from matplotlib; nipy_spectral replaces it
        color = cm.nipy_spectral(float(i) / n_clusters)
        ax1.fill_betweenx(np.arange(y_lower, y_upper),
                          0, ith_cluster_silhouette_values,
                          facecolor=color, edgecolor=color, alpha=0.7)

        # Label the silhouette plots with their cluster numbers at the middle
        ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))

        # Compute the new y_lower for next plot
        y_lower = y_upper + 10  # 10 for the 0 samples

    ax1.set_title("The silhouette plot for the various clusters.")
    ax1.set_xlabel("The silhouette coefficient values")
    ax1.set_ylabel("Cluster label")

    # The vertical line for average silhouette score of all the values
    ax1.axvline(x=silhouette_avg, color="red", linestyle="--")

    ax1.set_yticks([])  # Clear the yaxis labels / ticks
    ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])

    # 2nd Plot showing the actual clusters formed
    colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
    ax2.scatter(X[:, 0], X[:, 1], marker='.', s=30, lw=0, alpha=0.7,
                c=colors)

    # Labeling the clusters
    centers = clusterer.cluster_centers_
    # Draw white circles at cluster centers
    ax2.scatter(centers[:, 0], centers[:, 1],
                marker='o', c="white", alpha=1, s=200)

    for i, c in enumerate(centers):
        ax2.scatter(c[0], c[1], marker='$%d$' % i, alpha=1, s=50)

    ax2.set_title("The visualization of the clustered data.")
    ax2.set_xlabel("Feature space for the 1st feature")
    ax2.set_ylabel("Feature space for the 2nd feature")

    plt.suptitle(("Silhouette analysis for KMeans clustering on sample data "
                  "with n_clusters = %d" % n_clusters),
                 fontsize=14, fontweight='bold')

    plt.show()
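The array returned by `fit_predict` in the code above has one entry per row of `X`, in the same order, so the labels can be attached to the input table as an extra column. Below is a minimal sketch of that idea, not a definitive answer: the helper name `add_cluster_labels`, the column name `Cluster`, the synthetic `demo` table, and the output file name in the final comment are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def add_cluster_labels(dataframe, feature_columns, n_clusters=3, random_state=10):
    """Return a copy of dataframe with a 'Cluster' column holding KMeans labels."""
    X = dataframe[feature_columns].to_numpy(dtype=float)
    clusterer = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = clusterer.fit_predict(X)  # one label per row, same order as X
    labeled = dataframe.copy()
    labeled["Cluster"] = labels
    return labeled


# Synthetic stand-in for the CSV contents (assumption): three separated blobs.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(loc=center, scale=0.3, size=(20, 2))
    for center in (0.0, 5.0, 10.0)
])
demo = pd.DataFrame(blobs, columns=["Variable1", "Variable2"])

labeled = add_cluster_labels(demo, ["Variable1", "Variable2"], n_clusters=3)
print(labeled.head())  # input features plus the assigned cluster label

# To save the result (hypothetical file name):
# labeled.to_csv("labeled_features.csv", index=False)
```

With the real data, `demo` would be the `dataframe` read from the CSV file and `feature_columns` the columns used to build `X`. The label numbers themselves are arbitrary: cluster 0 in one run need not correspond to cluster 0 in another unless `random_state` is fixed.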

Original: https://blog.csdn.net/weixin_39618730/article/details/113981471
Author: weixin_39618730
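Title: How to write cluster labels in Python: how do I generate output labels for clusters from the k-means clustering algorithm?

Original articles are protected by copyright. When reposting, please credit the source: https://www.johngo689.com/561228/

Reposted articles are protected by the original author's copyright. When reposting, please credit the original author!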
