I. A brief introduction to the principle
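Logistic regression is a classification algorithm built on the linear regression model. It is widely used in data mining, automated disease diagnosis, economic forecasting, and similar fields, for example to identify the risk factors behind a disease and to predict the probability of the disease from those factors. Take stomach cancer as an example. We select two groups of people, one with stomach cancer and one without; the two groups can be expected to differ in physical signs and lifestyle. The dependent variable is whether a person has stomach cancer, with the value "yes" or "no". The independent variables can include many things, such as age, sex, dietary habits, and Helicobacter pylori infection, and they may be continuous or categorical. Logistic regression analysis then yields a weight for each independent variable, which gives a rough picture of which factors are risk factors for stomach cancer. The same weights can be used to predict, from a person's risk factors, how likely that person is to have the disease.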
So how can a linear regression model be used for classification?
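Here we introduce the sigmoid function:
$$g(z) = \frac{1}{1 + e^{-z}}$$
(Figure: graph of the sigmoid function.)
Let $z = w_{1}x_{1} + w_{2}x_{2} + \dots + w_{n}x_{n} + b$. We adopt the rule that if the final output satisfies $g(z) > 0.5$, the sample is labeled 1; otherwise it is labeled 0.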
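To make the decision rule concrete, here is a minimal sketch that uses only Python's standard library. The weights, the bias, the two sample points, and the helper name `classify` are made-up values chosen for illustration; they do not come from any dataset in this post.

```python
import math

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z)), always strictly between 0 and 1
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, w, b):
    # z is the linear part w1*x1 + ... + wn*xn + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    prob = sigmoid(z)
    # Label 1 when g(z) > 0.5, which is the same condition as z > 0
    return prob, (1 if prob > 0.5 else 0)

# Hypothetical parameters and inputs, for illustration only
w = [2.0, -1.0]
b = 0.5

prob_a, label_a = classify([1.0, 0.5], w, b)   # z = 2.0 - 0.5 + 0.5 = 2.0
prob_b, label_b = classify([-1.0, 1.0], w, b)  # z = -2.0 - 1.0 + 0.5 = -2.5
print(prob_a, label_a)
print(prob_b, label_b)
```

Because g(0) = 0.5 and g is strictly increasing, the condition g(z) > 0.5 is equivalent to z > 0. The decision boundary is therefore the line (or hyperplane) z = 0, which is how a linear model ends up separating two classes.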
II. Implementation with NumPy
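① Define the sigmoid function and the parameter initialization function
import numpy as np

def sigmoid(x):
    # Map any real input to the interval (0, 1)
    z = 1 / (1 + np.exp(-x))
    return z

def initialize_params(dims):
    # Weights start as a (dims, 1) column of zeros; the bias starts at 0
    w = np.zeros((dims, 1))
    b = 0
    return w, b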
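② Define the main body of the logistic regression model
def logistic(X, y, w, b):
    num_train = X.shape[0]
    # Forward pass: predicted probabilities, shape (num_train, 1)
    y_hat = sigmoid(np.dot(X, w) + b)
    # Mean cross-entropy loss over the training samples
    loss = -1 / num_train * np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
    # Gradients of the loss with respect to w and b
    dw = np.dot(X.T, (y_hat - y)) / num_train
    db = np.sum(y_hat - y) / num_train
    loss = np.squeeze(loss)
    return y_hat, loss, dw, db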
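The expressions for dw and db in the logistic function come from differentiating the cross-entropy loss. One way to gain confidence in them is a finite-difference check. The sketch below restates sigmoid and the loss so that it runs on its own; the random data, the seeds, and the helper names `loss_and_grads` and `numerical_grad` are assumptions made for this illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss_and_grads(X, y, w, b):
    # Same loss and gradient formulas as in the logistic function
    m = X.shape[0]
    y_hat = sigmoid(np.dot(X, w) + b)
    loss = -np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)) / m
    dw = np.dot(X.T, (y_hat - y)) / m
    db = np.sum(y_hat - y) / m
    return loss, dw, db

def numerical_grad(X, y, w, b, eps=1e-6):
    # Central differences: perturb one parameter at a time
    dw_num = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j, 0] = eps
        loss_plus, _, _ = loss_and_grads(X, y, w + step, b)
        loss_minus, _, _ = loss_and_grads(X, y, w - step, b)
        dw_num[j, 0] = (loss_plus - loss_minus) / (2 * eps)
    loss_plus, _, _ = loss_and_grads(X, y, w, b + eps)
    loss_minus, _, _ = loss_and_grads(X, y, w, b - eps)
    db_num = (loss_plus - loss_minus) / (2 * eps)
    return dw_num, db_num

# Synthetic data, for illustration only
rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = rng.randint(0, 2, size=(20, 1)).astype(float)
w = rng.normal(size=(3, 1)) * 0.1
b = 0.05

_, dw, db = loss_and_grads(X, y, w, b)
dw_num, db_num = numerical_grad(X, y, w, b)
# Both differences should be very small if the analytic gradients are right
print(np.max(np.abs(dw - dw_num)), abs(db - db_num))
```

If the analytic and numerical gradients disagree by more than a small tolerance, the usual suspects are a sign error or a missing division by the number of samples.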
③ Define the training procedure (gradient descent)
def logistic_train(X, y, learning_rate, epochs):
    w, b = initialize_params(X.shape[1])
    loss_list = []
    for i in range(epochs):
        # Compute the current loss and gradients
        y_hat, loss, dw, db = logistic(X, y, w, b)
        # Gradient descent update
        w = w - learning_rate * dw
        b = b - learning_rate * db
        # Record and print the loss every 100 epochs
        if i % 100 == 0:
            loss_list.append(loss)
            print('epoch %d loss %f' % (i, loss))
    params = {'w': w, 'b': b}
    grads = {'dw': dw, 'db': db}
    return loss_list, params, grads
④ Define the prediction function
def predict(X, params):
    # Predicted probabilities for each row of X
    y_prob = sigmoid(np.dot(X, params['w']) + params['b'])
    # Threshold at 0.5: probabilities above 0.5 become class 1, the rest class 0
    y_pred = (y_prob > 0.5).astype(int)
    return y_pred
⑤ Generate a simulated binary classification dataset
import numpy as np
from sklearn.datasets import make_classification

X, labels = make_classification(n_samples=100,
                                n_features=2,
                                n_redundant=0,
                                n_informative=2,
                                random_state=1,
                                n_clusters_per_class=2)
# Add uniform noise in [0, 2) to every feature
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
Any other classification dataset can be used here instead.
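⑥ Split the data into training and test sets
# First 80% of the rows for training, the remaining 20% for testing
offset = int(X.shape[0] * 0.8)
X_train, y_train = X[:offset], labels[:offset]
X_test, y_test = X[offset:], labels[offset:]
# Reshape the labels into column vectors to match the shape of y_hat
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))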
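The slice-based split takes the first 80% of the rows in their original order. A common alternative is scikit-learn's `train_test_split`, which shuffles before splitting. The sketch below regenerates a dataset with `make_classification` so that it is self-contained (it leaves out the uniform noise added in step ⑤); the values `test_size=0.2`, `random_state=0`, and `stratify=labels` are choices made for this illustration, not settings from the original post.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, labels = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                n_informative=2, random_state=1,
                                n_clusters_per_class=2)

# stratify=labels keeps the class proportions similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0, stratify=labels)

# Column vectors, matching the shape the NumPy implementation expects
y_train = y_train.reshape((-1, 1))
y_test = y_test.reshape((-1, 1))
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```

Shuffling matters most when the rows of a dataset are ordered, for example sorted by class, since a plain slice could then leave one class out of the test set.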
⑦ Train the model and predict
# Learning rate 0.01, 1000 epochs
loss_list, params, grads = logistic_train(X_train, y_train, 0.01, 1000)
print(params)
y_pred = predict(X_test, params)
print(y_pred)
⑧ Evaluate the model
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
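`classification_report` summarizes precision, recall, and F1 for each class. To see what those numbers mean, the sketch below computes accuracy, precision, recall, and F1 for class 1 directly with NumPy. The two label arrays are made-up values for illustration, not output from the model trained in step ⑦.

```python
import numpy as np

# Hypothetical labels, for illustration only
y_true_demo = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred_demo = np.array([1, 0, 0, 1, 0, 1, 1, 0])

tp = np.sum((y_pred_demo == 1) & (y_true_demo == 1))  # true positives
fp = np.sum((y_pred_demo == 1) & (y_true_demo == 0))  # false positives
fn = np.sum((y_pred_demo == 0) & (y_true_demo == 1))  # false negatives
tn = np.sum((y_pred_demo == 0) & (y_true_demo == 0))  # true negatives

accuracy = (tp + tn) / len(y_true_demo)
precision = tp / (tp + fp)   # of the predicted positives, how many are right
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```

In a setting like the disease example from section I, recall for the positive class is often the number to watch, because a false negative means a missed case.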
III. Implementation with sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# fit expects a 1-D label array, so flatten the column vector
clf = LogisticRegression(random_state=0).fit(X_train, y_train.ravel())
y_pred = clf.predict(X_test)
print(y_pred)
print(classification_report(y_test, y_pred))
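After fitting, the scikit-learn model exposes its learned parameters and class probabilities. The sketch below is self-contained, so it rebuilds a dataset with `make_classification` and uses its own variable names (`X_demo`, `y_demo`, `clf_demo`), which are not from the original post. Note that `LogisticRegression` applies L2 regularization by default (`C=1.0`), so its coefficients need not match those found by the plain gradient descent in section II.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=100, n_features=2, n_redundant=0,
                                     n_informative=2, random_state=1,
                                     n_clusters_per_class=2)

clf_demo = LogisticRegression(random_state=0).fit(X_demo, y_demo)

# coef_ has shape (1, n_features) for a binary problem; intercept_ has shape (1,)
print(clf_demo.coef_, clf_demo.intercept_)

# predict_proba returns one column per class; column 1 is P(y = 1)
proba = clf_demo.predict_proba(X_demo[:5])
print(proba)

# Recompute P(y = 1) by hand as the sigmoid of the linear score
z = X_demo[:5] @ clf_demo.coef_.T + clf_demo.intercept_
manual = (1 / (1 + np.exp(-z))).ravel()
print(np.allclose(manual, proba[:, 1]))
```

The entries of `coef_` play the same role as the weights w in section II, and `intercept_` plays the role of b.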
Original: https://blog.csdn.net/weixin_46943790/article/details/122806885
Author: 阳阳养羊羊
Title: Logistic regression: the principle of the algorithm and two code implementations