Clustering: The DBSCAN Algorithm

Model:
sklearn.cluster.DBSCAN(eps=0.5, min_samples=5, metric='euclidean', metric_params=None, algorithm='auto', leaf_size=30, p=None, n_jobs=None)
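The two parameters used throughout this post are eps (the neighborhood radius) and min_samples (how many points, the point itself included, must lie within eps for a point to count as a core sample). The following is a minimal sketch of their effect on a hypothetical toy dataset; the array pts and its coordinates are made up for illustration and are not part of the original post.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight 2x2 squares and one isolated point
pts = np.array([
    [0.0, 0.0], [0.0, 0.3], [0.3, 0.0], [0.3, 0.3],
    [5.0, 5.0], [5.0, 5.3], [5.3, 5.0], [5.3, 5.3],
    [10.0, 10.0],
])

# Inside a square every point has 4 points within eps=0.5 (itself included),
# so min_samples=4 makes all of them core samples
labels_4 = DBSCAN(eps=0.5, min_samples=4).fit_predict(pts)

# Raising the threshold to 5 leaves no core samples at all
labels_5 = DBSCAN(eps=0.5, min_samples=5).fit_predict(pts)

print(labels_4)
print(labels_5)
```

With min_samples=4 each square becomes one cluster and the isolated point is labeled -1; with min_samples=5 every point is labeled -1, because no neighborhood reaches five points.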

First, let's look at how the dataset is distributed.


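import numpy as np
import matplotlib.pyplot as plt
import scipy.io as sio
from sklearn.cluster import DBSCAN
# Jupyter magic: render interactive figures inside the notebook
%matplotlib notebook
# Raw string so the backslashes in the Windows path are not treated as escapes
path = r'D:\code\python\database\ex7data2'
data = sio.loadmat(path)  # loadmat appends the .mat extension if it is missing
X = data['X']
plt.scatter(X[:, 0], X[:, 1])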
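The .mat file above lives on the author's machine, so readers without it may want a substitute to follow along. A possible stand-in, assuming that three roughly Gaussian groups of 2-D points are close enough for experimenting, can be generated with make_blobs. The centers, spread, and random seed below are illustrative choices, not values taken from ex7data2, so the resulting plots and labels will differ from the ones in this post.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Hypothetical stand-in data: 300 two-dimensional points around three centers.
# Centers, cluster_std, and random_state are arbitrary illustrative values.
X, _ = make_blobs(
    n_samples=300,
    centers=[[2.0, 1.0], [6.0, 3.0], [3.0, 5.0]],
    cluster_std=0.5,
    random_state=0,
)
print(X.shape)
```

The resulting X can be passed to the same plt.scatter and DBSCAN calls used in the rest of the post.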

Clustering with the sklearn model

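# min_samples is keyword-only in recent scikit-learn releases, so pass both by name
model = DBSCAN(eps=0.5, min_samples=4)
model.fit(X)
print(set(model.labels_))  # -1 is the label sklearn gives to noise
plt.scatter(X[:, 0], X[:, 1], c=model.labels_, cmap='rainbow')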
{0, 1, 2, -1}


The result shows three clusters (labels 0, 1, and 2); the blue points, labeled -1, are noise.
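Reading the cluster count off a printed set works for a quick look, but the same information can be computed from the fitted model. The sketch below uses the labels_ and core_sample_indices_ attributes of a fitted DBSCAN model to count clusters, noise points, core samples, and border points. The data (X_demo, with three hand-placed outliers) is hypothetical and only serves to make the snippet runnable on its own.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Hypothetical data: three blobs plus three far-away points
X_demo, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [4, 4], [0, 4]],
    cluster_std=0.4,
    random_state=1,
)
outliers = np.array([[10.0, 10.0], [-6.0, 9.0], [9.0, -5.0]])
X_demo = np.vstack([X_demo, outliers])

demo_model = DBSCAN(eps=0.5, min_samples=4).fit(X_demo)
labels = demo_model.labels_

# Noise is labeled -1 and does not count as a cluster
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))

# Core samples are listed by index; clustered points that are not core are border points
core_mask = np.zeros(labels.shape[0], dtype=bool)
core_mask[demo_model.core_sample_indices_] = True
n_core = int(core_mask.sum())
n_border = int(np.sum(~core_mask & (labels != -1)))

print(n_clusters, n_noise, n_core, n_border)
```

Every point falls into exactly one of the three groups (core, border, noise), which gives a quick sanity check on any DBSCAN result.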

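# From-scratch DBSCAN, written to mirror the sklearn model used above
class DBSCAN_my:
    def __init__(self, eps, min_samples):
        self.eps = eps                  # neighborhood radius
        self.min_samples = min_samples  # neighborhood size (point itself included) needed for a core sample

    def calCoreSamples(self, X):
        """Map each core sample's index to the indices of its eps-neighbors."""
        m = X.shape[0]
        core_samples = {}
        for i in range(m):
            # Euclidean distance from sample i to every sample, itself included
            dist = np.sqrt(np.sum((X - X[i])**2, axis=1))
            samples = [j for j in range(m) if dist[j] <= self.eps]
            # Like sklearn, count the point itself and compare with >=
            if len(samples) >= self.min_samples:
                samples.remove(i)
                core_samples[i] = samples
        return core_samples

    def cluster(self, value, core_samples, k):
        """Label everything density-reachable from the neighbor list `value` as cluster k."""
        # Explicit stack instead of recursion, so large clusters cannot hit the recursion limit
        stack = list(value)
        while stack:
            i = stack.pop()
            if self.labels[i] == -1:
                self.labels[i] = k
                # Only core samples extend the cluster; border points stop here
                if i in core_samples:
                    stack.extend(core_samples[i])

    def fit(self, X):
        core_samples = self.calCoreSamples(X)
        # -1 marks noise / not yet assigned
        self.labels = -np.ones(X.shape[0])
        k = 0
        for key, value in core_samples.items():
            # Each still-unlabeled core sample seeds a new cluster
            if self.labels[key] == -1:
                self.labels[key] = k
                self.cluster(value, core_samples, k)
                k += 1
        return self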

# Same parameters as the sklearn run above
model = DBSCAN_my(0.5, 4)
model.fit(X)
print(set(model.labels))
plt.scatter(X[:, 0], X[:, 1], c=model.labels, cmap='rainbow')
{0.0, 1.0, 2.0, -1.0}
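Matching label sets only show that both runs found the same number of clusters, not that they grouped the points the same way. One way to check the grouping is sketched below: a compact from-scratch DBSCAN (a separate illustrative function, dbscan_labels, not the class from this post) is compared with sklearn on hypothetical make_blobs data. It assumes sklearn's conventions, namely distance <= eps and a neighborhood count that includes the point itself compared with >= min_samples. The comparison is restricted to core samples and to the noise set, since a border point that lies near two clusters may legitimately be assigned to either one.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def dbscan_labels(X, eps, min_samples):
    """Compact DBSCAN sketch: returns integer labels, -1 for noise."""
    # Pairwise Euclidean distances via broadcasting, shape (m, m)
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    neighbors = dist <= eps  # each row includes the point itself
    is_core = neighbors.sum(axis=1) >= min_samples
    labels = np.full(X.shape[0], -1, dtype=int)
    k = 0
    for seed in np.flatnonzero(is_core):
        if labels[seed] != -1:
            continue  # already absorbed into an earlier cluster
        labels[seed] = k
        stack = [seed]  # holds core samples whose neighbors still need visiting
        while stack:
            i = stack.pop()
            for j in np.flatnonzero(neighbors[i]):
                if labels[j] == -1:
                    labels[j] = k
                    if is_core[j]:
                        stack.append(j)
        k += 1
    return labels


# Hypothetical data; any 2-D array works here
X_cmp, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
mine = dbscan_labels(X_cmp, eps=0.5, min_samples=4)
ref = DBSCAN(eps=0.5, min_samples=4).fit(X_cmp)

core = ref.core_sample_indices_
same_noise = np.array_equal(mine == -1, ref.labels_ == -1)
ari_core = adjusted_rand_score(ref.labels_[core], mine[core])
print(same_noise, ari_core)
```

An adjusted Rand index of 1.0 on the core samples, together with identical noise sets, indicates that the two implementations agree on everything DBSCAN determines uniquely.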


Original: https://blog.csdn.net/qq_45420034/article/details/123017968
Author: Let it go !
Title: Clustering: The DBSCAN Algorithm
