Machine Learning Notes, Part 1: The K-Nearest Neighbors Algorithm (KNN)

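Suppose each sample has M features. A data set of N samples can then be viewed as N points in M-dimensional space, stacked row by row into an \(N \times M\) matrix:

\[ X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1M} \\ x_{21} & x_{22} & \cdots & x_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{NM} \end{pmatrix} \]

We also have a label vector holding one label per sample:

\[ \vec{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix} \]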
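In NumPy terms, \(X\) is simply an \(N \times M\) array with one row per sample, and \(\vec{y}\) is a length-\(N\) array. A minimal sketch with a made-up toy data set (the values and the names `X` and `y` are illustrative assumptions, not taken from the original post):

```python
import numpy as np

# Hypothetical toy training set: N = 4 samples, M = 2 features each.
# Row j of X is the point (x_j1, x_j2).
X = np.array([[1.0, 1.1],
              [1.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.1]])

# One label per sample, so y has length N (the number of rows), not M.
y = np.array(["A", "A", "B", "B"])

N, M = X.shape
print(N, M)              # 4 2
print(y.shape[0] == N)   # True
```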

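Given a new, unlabeled data point

\[ \vec{x_z} = \begin{pmatrix} x_{z1} & x_{z2} & \cdots & x_{zM} \end{pmatrix} \]

we compute its Euclidean distance to each of the N training points. For the j-th point (\(j = 1, \dots, N\)):

\[ D_j = \sqrt{(x_{z1}-x_{j1})^2 + (x_{z2}-x_{j2})^2 + \cdots + (x_{zM}-x_{jM})^2} = \sqrt{\sum_{m=1}^{M} (x_{zm}-x_{jm})^2} \]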
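The distance formula can be checked numerically. The sketch below computes \(D_j\) for every training point at once using NumPy broadcasting (`X - x_z` subtracts the row vector from each row of `X`) and compares the result with a direct, point-by-point translation of the formula. The toy data and variable names are assumptions for illustration only.

```python
import math

import numpy as np

# Hypothetical training points (N = 4, M = 2) and a new point x_z.
X = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
x_z = np.array([0.0, 0.0])

# Vectorized: broadcasting subtracts x_z from every row of X,
# then we square, sum over the M features (axis=1) and take the square root.
D = np.sqrt(((X - x_z) ** 2).sum(axis=1))

# Direct translation of D_j = sqrt(sum_m (x_zm - x_jm)^2), one point at a time.
D_loop = [math.sqrt(sum((x_z[m] - X[j, m]) ** 2 for m in range(X.shape[1])))
          for j in range(X.shape[0])]

print(D)
print(np.allclose(D, D_loop))  # True
```

`np.linalg.norm(X - x_z, axis=1)` computes the same vector in a single call.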

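This gives a vector of N distances. We sort it in ascending order and re-index it so that \(D_1 \le D_2 \le \cdots \le D_N\) (each sorted entry still remembers which training point it came from):

\[ \vec{D} = \begin{pmatrix} D_1 \\ D_2 \\ \vdots \\ D_N \end{pmatrix} \]

The first k entries correspond to the k nearest neighbors:

\[ \vec{D_k} = \begin{pmatrix} D_1 \\ D_2 \\ \vdots \\ D_k \end{pmatrix} \]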
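In practice the sort is done on indices rather than on the distances themselves, so that each of the k smallest distances can be traced back to its training point and therefore to its label. A sketch using `np.argsort`, with made-up distances and an assumed k = 3:

```python
import numpy as np

# Hypothetical distances D_j from x_z to N = 5 training points, in original order.
D = np.array([2.5, 0.3, 1.7, 0.9, 4.0])
k = 3

# argsort returns the point indices ordered by increasing distance.
order = np.argsort(D)
nearest = order[:k]     # indices of the k nearest neighbours

print(nearest)      # [1 3 2]
print(D[nearest])   # [0.3 0.9 1.7]
```

For large N, `np.argpartition(D, k - 1)[:k]` also returns the indices of the k smallest distances without fully sorting the array, though not in ascending order.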

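Collect the labels of these k neighbors and count how often each distinct label occurs. If the distinct labels are \(c_1, c_2, \dots, c_m\) (\(m \le k\)) with counts \(n_1, n_2, \dots, n_m\), so that \(n_1 + n_2 + \cdots + n_m = k\), we get

\[ \vec{y'} = \begin{pmatrix} c_1 & n_1 \\ c_2 & n_2 \\ \vdots & \vdots \\ c_m & n_m \end{pmatrix} \]

The label with the largest count (a majority vote) is assigned to \(\vec{x_z}\):

\[ y_z = c_{i^*}, \quad i^* = \arg\max_{i} n_i \]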
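The counting and voting step maps directly onto `collections.Counter` from the Python standard library. A sketch with hypothetical neighbour labels for k = 5 (the label values are made up):

```python
from collections import Counter

# Hypothetical labels of the k = 5 nearest neighbours, nearest first.
neighbour_labels = ["B", "A", "B", "C", "B"]

# Count how often each distinct label occurs: the (label, count) pairs.
counts = Counter(neighbour_labels)
print(counts)  # Counter({'B': 3, 'A': 1, 'C': 1})

# Majority vote: the label with the largest count.
y_z, n_max = counts.most_common(1)[0]
print(y_z, n_max)  # B 3
```

When two labels tie, `most_common` lists them in the order they were first encountered, so with neighbours ordered nearest first the tie goes to the label of the nearer neighbour. Choosing an odd k avoids ties in two-class problems.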

Implementation (Python)

```python
import operator

import numpy as np


def KNNclassify(inX, dataset, labels, k):
    """
    K-Nearest Neighbour algorithm
    :param inX: Input vector X (length M)
    :param dataset: Training dataset (N x M numpy array)
    :param labels: Labels vector (length N)
    :param k: the number of nearest neighbours
    :return: The class of input
    """
    dataset_size = dataset.shape[0]
    # Repeat inX dataset_size times so it matches the shape of dataset, then subtract
    diffMat = np.tile(inX, (dataset_size, 1)) - dataset
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # Sum along each row: squared Euclidean distances
    distances = sqDistances ** 0.5
    sortedDistIndicies = distances.argsort()  # Indices of the distances in ascending order
    classCount = {}
    for i in range(k):  # Let the k nearest neighbours vote
        voteIlabel = labels[sortedDistIndicies[i]]
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # Sort (label, count) pairs by count, largest first (dict.iteritems() is Python 2 only)
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
```
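For comparison, here is a sketch of the same procedure in a more compact form, using NumPy broadcasting in place of `tile` and `collections.Counter` in place of the manual dictionary plus `sorted`. The function name `knn_classify` and the toy data are illustrative assumptions. Ties in the vote are resolved as in `KNNclassify` above (run under Python 3): in favour of the label encountered first, i.e. the one belonging to the nearer neighbour.

```python
from collections import Counter

import numpy as np


def knn_classify(x_new, X, labels, k):
    """Classify x_new by a majority vote among its k nearest neighbours in X."""
    # Euclidean distance from x_new to every row of X (broadcasting, no tile needed).
    distances = np.sqrt(((X - x_new) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first.
    nearest = np.argsort(distances)[:k]
    # Majority vote over the neighbours' labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two small clusters labelled "A" and "B".
X = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

print(knn_classify(np.array([0.0, 0.0]), X, labels, 3))  # B
print(knn_classify(np.array([0.9, 1.2]), X, labels, 3))  # A
```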

Advantages

Disadvantages

Original: https://www.cnblogs.com/ryuasuka/p/7368078.html
Author: 飞鸟_Asuka
Title: Machine Learning Notes, Part 1: The K-Nearest Neighbors Algorithm (KNN)
