The K-Means Algorithm

Python example:

import pandas as pd

df = pd.read_csv('./datasets/ch1ex1.csv')
points = df.values
df.head()

# Inspect the data with a scatter plot
import matplotlib.pyplot as plt
xs = points[:,0]
ys = points[:,1]
plt.scatter(xs, ys)
plt.show()

# Run the clustering
df = pd.read_csv('./datasets/ch1ex1.csv')
points = df.values

from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(points)
labels = model.predict(points)
labels
array([2, 0, 1, 1, 0, 0, 1, 2, 0, 0, 1, 2, 0, 1, 0, 2, 1, 1, 2, 1, 0, 2, 0,
       2, 2, 0, 2, 2, 2, 0, 1, 1, 1, 0, 2, 0, 2, 2, 0, 2, 2, 1, 0, 0, 0, 2,
       2, 1, 2, 1, 1, 1, 2, 2, 2, 0, 2, 2, 0, 1, 0, 2, 2, 1, 1, 0, 1, 0, 0,
       2, 1, 0, 1, 2, 1, 0, 2, 2, 2, 1, 2, 0, 1, 0, 0, 0, 0, 2, 2, 1, 0, 1,
       0, 2, 2, 2, 1, 0, 0, 1, 0, 2, 0, 1, 2, 1, 1, 1, 0, 0, 2, 0, 1, 0, 0,
       0, 2, 0, 1, 1, 2, 2, 2, 2, 2, 0, 1, 2, 0, 0, 1, 1, 0, 2, 0, 2, 1, 0,
       1, 2, 1, 1, 2, 1, 1, 2, 1, 0, 2, 2, 2, 1, 1, 0, 1, 0, 2, 2, 1, 0, 1,
       1, 1, 0, 2, 2, 0, 1, 1, 2, 2, 1, 2, 2, 0, 2, 1, 1, 1, 2, 2, 1, 2, 1,
       1, 2, 0, 1, 2, 2, 2, 2, 0, 1, 2, 0, 0, 0, 2, 0, 2, 2, 0, 1, 1, 2, 1,
       2, 2, 0, 0, 2, 1, 0, 1, 2, 1, 0, 2, 0, 0, 0, 0, 1, 1, 1, 2, 2, 0, 2,
       1, 0, 2, 2, 0, 2, 1, 1, 1, 1, 1, 0, 2, 2, 1, 1, 2, 0, 1, 0, 0, 2, 2,
       0, 0, 0, 2, 1, 2, 0, 2, 1, 1, 1, 1, 1, 2, 2, 0, 2, 2, 0, 1, 1, 0, 2,
       1, 1, 0, 0, 2, 2, 2, 0, 0, 2, 1, 0, 0, 1, 2, 2, 2, 0, 2, 2, 2, 0, 0,
       0])
# Cluster centers
centroids = model.cluster_centers_
centroids_x = centroids[:,0]
centroids_y = centroids[:,1]

plt.scatter(xs, ys, c=labels)
plt.scatter(centroids_x, centroids_y, marker='X', s=200)
plt.show()
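
To make the fit/predict step above less of a black box, here is a minimal NumPy sketch of the classic assign-then-update loop (Lloyd's algorithm). It is only an illustration: `lloyd_kmeans` and the synthetic `demo_points` are hypothetical names invented for this sketch, and scikit-learn's `KMeans` uses a more careful initialization (k-means++) and multiple restarts, so results may differ.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=100, seed=0):
    """Illustrative Lloyd's-algorithm sketch (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points picked at random
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: find the nearest centroid for every point
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Synthetic stand-in for the CSV data: three 2-D blobs
rng = np.random.default_rng(42)
demo_points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(50, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(50, 2)),
    rng.normal(loc=(0, 5), scale=0.3, size=(50, 2)),
])
demo_labels, demo_centroids = lloyd_kmeans(demo_points, k=3)
print(demo_centroids)
```

With a plain random initialization like this one, the loop can settle in a poor local optimum, which is one reason library implementations restart several times.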

How the value of K affects the result

import pandas as pd

seeds_df = pd.read_csv('./datasets/seeds.csv')

varieties = list(seeds_df['grain_variety'])

del seeds_df['grain_variety']

seeds_df.head()

samples = seeds_df.values

from sklearn.cluster import KMeans

ks = range(1, 6)
inertias = []

for k in ks:
    # Create a KMeans instance with k clusters: model
    model = KMeans(n_clusters=k)

    # Fit model to samples
    model.fit(samples)

    # Append the inertia to the list of inertias
    inertias.append(model.inertia_)
import matplotlib.pyplot as plt

# Plot ks vs inertias
plt.plot(ks, inertias, '-o')
plt.xlabel('number of clusters, k')
plt.ylabel('inertia')
plt.xticks(ks)
plt.show()
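
The quantity plotted above, `inertia_`, is the sum of squared distances from each sample to the center of its assigned cluster. The sketch below recomputes it by hand to make that concrete. It runs on synthetic data (`demo_samples` is an assumption standing in for the seeds dataset), and `n_init` and `random_state` are set only to keep the run reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in data: 200 samples with 3 features
rng = np.random.default_rng(0)
demo_samples = rng.normal(size=(200, 3))

demo_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(demo_samples)

# Look up the center of the cluster each sample was assigned to
assigned_centers = demo_model.cluster_centers_[demo_model.labels_]

# Sum of squared distances to those centers
manual_inertia = np.sum((demo_samples - assigned_centers) ** 2)

print(manual_inertia, demo_model.inertia_)
```

Because adding clusters can only bring samples closer to some center, inertia keeps falling as k grows; the usual heuristic is to pick the k where the curve bends (the "elbow") rather than the minimum.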

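K-means results are partly random: the initial centroids are chosen at random, so the cluster label numbers (and occasionally the clusters themselves) can differ from run to run.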

model = KMeans(n_clusters=3)
labels = model.fit_predict(samples)
df = pd.DataFrame({'labels': labels, 'varieties': varieties})
ct = pd.crosstab(df['labels'], df['varieties'])
ct
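
One way to tame this randomness is to fix `random_state`, as in the sketch below. The data here is synthetic (`demo_samples` is an assumption, not the seeds dataset). Because label numbers are arbitrary, two runs are compared with `adjusted_rand_score`, which ignores how the clusters are numbered.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in data: three well-separated 2-D blobs
rng = np.random.default_rng(1)
demo_samples = np.vstack([
    rng.normal(loc=c, scale=0.5, size=(60, 2)) for c in [(0, 0), (6, 0), (3, 6)]
])

# Same seed twice: the label arrays are identical
run_a = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(demo_samples)
run_b = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(demo_samples)
print(np.array_equal(run_a, run_b))

# Different seed: label numbers may be permuted, so compare the groupings
# with the adjusted Rand index instead of element-wise equality
run_c = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(demo_samples)
print(adjusted_rand_score(run_a, run_c))
```

This is also why the cross-tabulation above should be read row by row: which variety dominates each label matters, not the label's number.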

make_pipeline is more convenient

import pandas as pd

df = pd.read_csv('./datasets/fish.csv')

species = list(df['species'])

del df['species']

df.head()

samples = df.values
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
scaler = StandardScaler()
kmeans = KMeans(n_clusters=4)

pipeline = make_pipeline(scaler, kmeans)
pipeline.fit(samples)

labels = pipeline.predict(samples)
df = pd.DataFrame({'labels': labels, 'species': species})
ct = pd.crosstab(df['labels'], df['species'])
ct
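
To see what the pipeline saves, the sketch below compares it with doing the two steps by hand: standardize first, then cluster the scaled data. It uses synthetic data with features on very different scales (`demo_samples` is an assumption standing in for the fish measurements) and a fixed `random_state` so that both routes can be compared directly.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: the first feature is ~1000x larger than the second
rng = np.random.default_rng(2)
demo_samples = np.column_stack([
    rng.normal(size=120) * 1000.0,
    rng.normal(size=120),
])

# Route 1: scaler and KMeans chained in a pipeline
demo_pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=4, n_init=10, random_state=0),
)
pipeline_labels = demo_pipeline.fit_predict(demo_samples)

# Route 2: the same two steps written out manually
scaled = StandardScaler().fit_transform(demo_samples)
manual_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(scaled)

print(np.array_equal(pipeline_labels, manual_labels))
```

The pipeline also reapplies the fitted scaler automatically in `predict`, so new samples go through the same transformation without extra bookkeeping.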

Original: https://blog.csdn.net/weixin_53660567/article/details/122963458
Author: 长沙有肥鱼
Title: K-MEANS算法

