k-Nearest Neighbors: Python Implementation and kd-Tree Construction and Search

k-nearest neighbors (k-NN) is a supervised learning method that requires no model training (lazy learning). The algorithm works as follows: for a test sample, use some distance metric (Minkowski, Euclidean, Manhattan, Chebyshev, etc.) to find the k training samples closest to it in the given training set, then predict from those k samples. For classification, use majority voting: the class that appears most often among the k neighbors is the prediction. For regression, use averaging: take the mean of the k neighbors' outputs, or a weighted average that gives larger weights to training samples closer to the test sample.
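The distance metrics listed above are all members of the Minkowski family, parameterized by an order p. A minimal sketch (the helper name `minkowski` is just for illustration):

```python
import numpy as np

# Minkowski distance of order p between two points:
# p=1 gives Manhattan distance, p=2 gives Euclidean distance,
# and p -> infinity approaches Chebyshev distance.
def minkowski(a, b, p):
    return np.sum(np.abs(np.array(a) - np.array(b)) ** p) ** (1 / p)

a, b = [1, 1], [4, 5]
print(minkowski(a, b, 1))  # Manhattan: 7.0
print(minkowski(a, b, 2))  # Euclidean: 5.0
```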

Classify the iris dataset, splitting it into a training set and a test set.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
from collections import Counter
iris_data = load_iris()
data = pd.DataFrame(iris_data.data,columns=iris_data.feature_names)
data['target'] = iris_data.target
X,y = data.iloc[:,:-1].values,data.iloc[:,-1].values
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.3,stratify=y)
data.head()

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  target
0                5.1               3.5                1.4               0.2       0
1                4.9               3.0                1.4               0.2       0
2                4.7               3.2                1.3               0.2       0
3                4.6               3.1                1.5               0.2       0
4                5.0               3.6                1.4               0.2       0

Build the model

class knn:
    def __init__(self,X_train,y_train,neighbors,ord):
        self.X = X_train
        self.y = y_train
        self.k = neighbors
        self.ord = ord   # order of the Minkowski norm (2 = Euclidean)

    def predict(self,x):
        # Seed the candidate list with the first k training samples.
        knn_list = []
        for i in range(self.k):
            dist = np.linalg.norm(x-self.X[i],ord=self.ord)
            knn_list.append((dist,self.y[i]))

        # Scan the remaining samples; whenever a closer one is found,
        # replace the current farthest candidate.
        for i in range(self.k,len(self.X)):
            dist = np.linalg.norm(x-self.X[i],ord=self.ord)
            max_index = knn_list.index(max(knn_list,key=lambda t:t[0]))
            if dist<knn_list[max_index][0]:
                knn_list[max_index] = (dist,self.y[i])

        # Majority vote over the k nearest labels.
        knn_y = [pair[-1] for pair in knn_list]
        counter_y = Counter(knn_y)
        return counter_y.most_common(1)[0][0]

    def score(self,X_test,y_test):
        count = 0
        for x,y in zip(X_test,y_test):
            if y==self.predict(x):
                count += 1
        return count/len(X_test)
knn_model = knn(X_train,y_train,15,2)
test_accuracy = knn_model.score(X_test,y_test)
print('Test-set accuracy:',test_accuracy)
Test-set accuracy: 0.9777777777777777
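For the regression case mentioned at the start, distance-weighted averaging can be sketched as follows; `weighted_knn_predict` is a hypothetical helper for illustration, not part of the class above:

```python
import numpy as np

# Distance-weighted kNN regression: closer neighbours get larger weights.
def weighted_knn_predict(X_train, y_train, x, k):
    dists = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(dists)[:k]        # indices of the k nearest neighbours
    w = 1.0 / (dists[idx] + 1e-8)      # inverse-distance weights (epsilon avoids /0)
    return np.sum(w * y_train[idx]) / np.sum(w)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
print(weighted_knn_predict(X, y, np.array([1.2]), k=2))  # ≈ 1.2
```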

sklearn implementation

from sklearn.neighbors import KNeighborsClassifier
knn_model = KNeighborsClassifier(n_neighbors=10)
knn_model.fit(X_train,y_train)
score = knn_model.score(X_test,y_test)
print(score)
0.977777777778
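Rather than fixing k by hand, a common refinement is to pick it by cross-validation; a sketch using `cross_val_score` (note that KNeighborsClassifier also accepts `weights='distance'` for inverse-distance voting and `algorithm='kd_tree'` to force kd-tree search):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Score each candidate k with 5-fold cross-validation on the iris data
# and keep the k with the best mean accuracy.
X, y = load_iris(return_X_y=True)
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in range(1, 16)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```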

A kd-tree is a binary tree representing a partition of k-dimensional space. It stores k-dimensional instance points for fast retrieval, and its structure is well suited to nearest-neighbor search and collision detection.
Construction:
First, build the root node. Choose one dimension of the instance points to split on, typically the dimension with the largest variance. Sort the instance points by their values along that dimension and take the point at the median as the root; points with smaller values along that dimension go to the left of the tree (the left child), and points with larger values go to the right (the right child). Then split the left and right children along the next dimension in the same way, repeating until no further split is possible, which yields the leaf nodes.
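The max-variance split rule mentioned above can be sketched as follows; the `choose_split` helper name is illustrative, and note that the KdTree class below cycles through dimensions in order instead:

```python
import numpy as np

# Pick the split axis with the largest variance, then take the point at
# the median along that axis as the splitting node.
def choose_split(points):
    pts = np.array(points, dtype=float)
    axis = int(np.argmax(pts.var(axis=0)))   # dimension with the largest spread
    order = np.argsort(pts[:, axis])
    median = len(points) // 2
    return axis, pts[order[median]].tolist()

data = [[7, 2], [4, 7], [5, 4], [2, 3], [8, 1], [9, 6]]
print(choose_split(data))  # → (0, [7.0, 2.0]): split on x, root is [7, 2]
```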
Search:
First, find the leaf node containing the target point x: starting from the root, descend the kd-tree recursively, moving to the left child if x is smaller than the current node's value along its split dimension, and to the right child otherwise. Take the leaf found as the current nearest point, then backtrack through its ancestors, updating the nearest point along the way; if the distance from x to an ancestor's splitting hyperplane is smaller than the current nearest distance, the ancestor's other subtree must also be searched. The search ends once the root has been processed.
Given the training set [[7,2],[4,7],[5,4],[2,3],[8,1],[9,6]], find the training sample nearest to the target point [3,4.5].

from collections import namedtuple
import numpy as np

class KdNode(object):
    def __init__(self,dom_elt,split,left,right):
        self.dom_elt = dom_elt   # the instance point stored at this node
        self.split = split       # the dimension (axis) this node splits on
        self.left = left         # left subtree: smaller values on the split axis
        self.right = right       # right subtree: larger values on the split axis

class KdTree(object):
    def __init__(self,data):
        self.k = len(data[0])
        self.root = self.createNode(data,0)

    def createNode(self,data,split):
        if not data:
            return None
        splitPos = len(data)>>1          # median index (>>1, i.e. //2; >>2 would not be the median)
        dataSort = sorted(data,key=lambda x:x[split])
        medianNode = dataSort[splitPos]
        splitNext = (split+1) % self.k   # cycle through the dimensions
        return KdNode(medianNode,split,
                      self.createNode(dataSort[:splitPos],splitNext),
                      self.createNode(dataSort[splitPos+1:],splitNext))

    def search_nearest(self,point):
        self.result = namedtuple('nearestInf','point distance')
        self.nearestPoint = None
        self.nearestDis = 0
        def travel(node,depth):
            if node is not None:
                axis = node.split
                # Descend to the subtree on the target point's side.
                if point[axis]<node.dom_elt[axis]:
                    travel(node.left,depth+1)
                else:
                    travel(node.right,depth+1)
                # Backtracking: update the current nearest point with this node.
                distance = np.sqrt(np.sum((np.array(point) - np.array(node.dom_elt))**2))
                if self.nearestPoint is None or self.nearestDis > distance:
                    self.nearestPoint = node.dom_elt
                    self.nearestDis = distance
                # If the splitting hyperplane is closer than the current nearest
                # distance, the other subtree may hold a closer point: search it too.
                if abs(point[axis]-node.dom_elt[axis]) < self.nearestDis:
                    if point[axis] < node.dom_elt[axis]:
                        travel(node.right, depth + 1)
                    else:
                        travel(node.left, depth + 1)
        travel(self.root,0)
        return self.result(self.nearestPoint,self.nearestDis)

data = [[7,2],[4,7],[5,4],[2,3],[8,1],[9,6]]
tree1 = KdTree(data)
result = tree1.search_nearest([3,4.5])
print(result)

nearestInf(point=[2, 3], distance=1.8027756377319946)

The result is the point (2, 3), at distance 1.803.
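As a cross-check, the same query can be run against scipy's kd-tree, assuming scipy is installed:

```python
import numpy as np
from scipy.spatial import cKDTree

# Verify the hand-rolled result with scipy's kd-tree implementation.
data = [[7, 2], [4, 7], [5, 4], [2, 3], [8, 1], [9, 6]]
tree = cKDTree(data)
dist, idx = tree.query([3, 4.5])     # nearest-neighbour query
print(data[idx], dist)               # → [2, 3] 1.8027756377319946
```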

Original: https://blog.csdn.net/qq_45420034/article/details/123022346
Author: Let it go !
Title: k-Nearest Neighbors: Python Implementation and kd-Tree Construction and Search

Original articles are protected by copyright. Please credit the source when reposting: https://www.johngo689.com/663589/
