K-Nearest Neighbors (KNN)
If the majority of the k nearest samples to a given sample in feature space belong to a certain class, then that sample is assigned to that class as well.
- Distance metrics
  - Euclidean distance
    $d=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2}$
  - Manhattan distance
    $d=|x_1-x_2|+|y_1-y_2|$
  - Minkowski distance
    $d=\left(\sum_{i=1}^{n}|x_i-y_i|^p\right)^{1/p}$
    - p = 2 gives the Euclidean distance
    - p = 1 gives the Manhattan distance
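The three formulas above can be checked with a small NumPy sketch: the Minkowski distance reduces to Euclidean at p = 2 and Manhattan at p = 1 (the example points are made up for illustration):

```python
import numpy as np

def minkowski(x, y, p):
    # Minkowski distance: (sum_i |x_i - y_i|^p)^(1/p)
    return np.sum(np.abs(np.asarray(x) - np.asarray(y)) ** p) ** (1.0 / p)

a, b = (1.0, 2.0), (4.0, 6.0)
print(minkowski(a, b, 2))  # p=2, Euclidean: sqrt(3^2 + 4^2) = 5.0
print(minkowski(a, b, 1))  # p=1, Manhattan: 3 + 4 = 7.0
```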
- Algorithm analysis
  - If k is too large, the result is easily skewed by class imbalance in the samples
  - If k is too small, the result is easily affected by outliers
  - Advantages:
    - Simple, easy to understand and implement, no training phase required
  - Disadvantages:
    - Classifying test samples is computationally expensive and memory-intensive
    - k must be specified, and a poor choice of k degrades classification accuracy
  - Use case: small-scale datasets
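The majority-vote rule defined at the top can be sketched in a few lines of plain Python; this is a minimal illustration rather than the scikit-learn implementation, and the toy points and labels are invented for the example:

```python
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    # rank training points by squared Euclidean distance to the query
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), label)
        for x, label in zip(train_x, train_y)
    )
    # majority vote among the k nearest neighbors
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(X, y, (1, 1), 3))  # → "A"
```

Trying k = 6 on this toy set would tie the vote between the two classes, which is one concrete reason a poorly chosen k hurts accuracy.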
Code example: iris species prediction

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

# load the iris dataset and split into training/test sets
iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)

# standardize features (fit on the training data only)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# train a KNN classifier with k=3 and evaluate on the test set
estimator = KNeighborsClassifier(n_neighbors=3)
estimator.fit(x_train, y_train)
y_predict = estimator.predict(x_test)
score = estimator.score(x_test, y_test)
print("score=", score)
```
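Since one of the drawbacks listed above is that k must be chosen by hand, cross-validation can search candidate values automatically. This sketch reuses the same iris pipeline; the candidate k list and 5-fold setting are arbitrary choices for illustration, not part of the original post:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=6)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)

# search candidate k values with 5-fold cross-validation on the training set
param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(x_train, y_train)
print("best k:", search.best_params_["n_neighbors"])
print("test score:", search.score(x_test, y_test))
```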
Original: https://blog.csdn.net/qq_26192391/article/details/128328564
Author: console
Title: 机器学习之分类-K-近邻算法(KNN) (Machine Learning Classification: K-Nearest Neighbors (KNN))