# Logistic Regression

## The logistic distribution

$$F(x)=P(X \le x)=\frac 1 {1+e^{-(x-\mu)/\gamma}}, \quad f(x)=F'(x)=\frac {e^{-(x-\mu)/\gamma}} {\gamma(1+e^{-(x-\mu)/\gamma})^2}$$

The density $f(x)$ has a bell shape and the distribution function $F(x)$ an S shape. The distribution function is symmetric about the center $(\mu, \frac 1 2)$, and the smaller $\gamma$ is, the faster the curve changes near the center.
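A small numerical check of these two properties, written as a sketch with NumPy (the parameter values here are made up for illustration):

```python
import numpy as np

def logistic_cdf(x, mu=0.0, gamma=1.0):
    # F(x) = 1 / (1 + exp(-(x - mu) / gamma))
    return 1.0 / (1.0 + np.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    # f(x) = F'(x) = e^{-(x-mu)/gamma} / (gamma * (1 + e^{-(x-mu)/gamma})^2)
    t = np.exp(-(x - mu) / gamma)
    return t / (gamma * (1.0 + t) ** 2)

# the curve passes through the center (mu, 1/2) ...
print(logistic_cdf(2.0, mu=2.0, gamma=0.5))   # 0.5
# ... and is symmetric about it: F(mu + t) + F(mu - t) = 1
print(logistic_cdf(3.0, 2.0, 0.5) + logistic_cdf(1.0, 2.0, 0.5))
```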

## The logistic regression model

$$P(Y=1|x)=\frac {\exp(w \cdot x + b)} {1 + \exp(w \cdot x + b)}, \quad P(Y=0|x)=\frac {1} {1 + \exp(w \cdot x + b)}$$
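Computing both conditional probabilities directly from this definition (a sketch; the values of $w$, $x$, and $b$ below are made up):

```python
import numpy as np

def posterior(x, w, b):
    # P(Y=1|x) = exp(w·x + b) / (1 + exp(w·x + b)); P(Y=0|x) is the complement
    z = np.dot(w, x) + b
    p1 = np.exp(z) / (1.0 + np.exp(z))
    return p1, 1.0 - p1

p1, p0 = posterior(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1)
print(p1 + p0)  # the two class probabilities sum to 1
```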

### Parameter estimation

Write

$$P(Y=1|x)=\pi (x), \quad P(Y=0|x)=1-\pi (x)$$

Then the likelihood function is

$$\prod_{i=1}^N [\pi (x_i)]^{y_i} [1 - \pi(x_i)]^{1-y_i}$$

and the log-likelihood is

$$L(w) = \sum_{i=1}^N [y_i \log{\pi(x_i)} + (1-y_i) \log{(1 - \pi(x_i))}] = \sum_{i=1}^N \left[ y_i \log{\frac {\pi (x_i)} {1 - \pi(x_i)}} + \log{(1 - \pi(x_i))} \right]$$

The estimate of $w$ is obtained by maximizing $L(w)$.
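A quick numerical check that the two forms of the log-likelihood agree (a sketch with made-up data; the bias is absorbed into $w$ here):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # made-up feature matrix
y = rng.integers(0, 2, size=5)         # made-up 0/1 labels
w = np.array([0.2, -0.1, 0.4])         # arbitrary weights

pi = 1.0 / (1.0 + np.exp(-X @ w))      # pi(x_i) = P(Y=1 | x_i)

# first form: y*log(pi) + (1-y)*log(1-pi)
L1 = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
# second form: y*log(pi/(1-pi)) + log(1-pi)
L2 = np.sum(y * np.log(pi / (1 - pi)) + np.log(1 - pi))
print(np.isclose(L1, L2))  # True
```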

## Determining the regression coefficients by gradient ascent

Logistic regression uses the sigmoid function:

$$\sigma (z) = \frac 1 {1 + e^{-z}}$$

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

or, in vector form,

$$y = w_0 + w^T x$$

Gradient ascent climbs toward a maximum by repeatedly stepping in the direction of the gradient, with step size $\alpha$:

$$w = w + \alpha \nabla_w f(w)$$
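To apply this rule we need the gradient of the log-likelihood. Writing $\pi(x_i)=\sigma(w \cdot x_i)$ (with the bias $b$ absorbed into $w$ through a constant feature of 1) and differentiating $L(w)$ term by term gives

$$\nabla_w L(w) = \sum_{i=1}^N \left( y_i - \pi(x_i) \right) x_i$$

so each sample pushes $w$ along its own feature vector, scaled by the prediction error $y_i - \pi(x_i)$.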

### Training algorithm

- Initialize each regression coefficient to 1
- Repeat N times:
  - Compute the gradient over the entire dataset
  - Update the weight vector $w$ by $\alpha \cdot \nabla_w f(w)$
- Return the regression coefficients
```python
#!/usr/bin/env python
# encoding: utf-8

import numpy
import time
import matplotlib.pyplot as plt


def sigmoid(x):
    return 1.0 / (1 + numpy.exp(-x))


def loadDataSet():
    """load samples: each line is "x1 x2 label"; prepend 1.0 for the bias term"""
    dataMat = []
    laberMat = []
    with open("test.txt", 'r') as f:
        for line in f:
            arry = line.strip().split()
            dataMat.append([1.0, float(arry[0]), float(arry[1])])
            laberMat.append(float(arry[2]))
    return numpy.mat(dataMat), numpy.mat(laberMat).transpose()


def gradAscent(dataMat, laberMat, alpha=0.001, maxCycle=500):
    """batch gradient ascent: every update uses the whole dataset"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(maxCycle):
        h = sigmoid(dataMat * weights)
        error = laberMat - h
        weights += alpha * dataMat.transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def stocGradAscent0(dataMat, laberMat, alpha=0.01):
    """stochastic gradient ascent: one sample per update, a single pass"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(m):
        h = sigmoid(dataMat[i] * weights)
        error = laberMat[i] - h
        weights += alpha * dataMat[i].transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def stocGradAscent1(dataMat, laberMat, numIter=150):
    """better one, use a dynamic alpha that decays over the iterations"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for j in range(numIter):
        for i in range(m):
            alpha = 4 / (1 + j + i) + 0.01
            h = sigmoid(dataMat[i] * weights)
            error = laberMat[i] - h
            weights += alpha * dataMat[i].transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def show(dataMat, laberMat, weights):
    """scatter the two classes and draw the boundary w0 + w1*x1 + w2*x2 = 0"""
    m, n = numpy.shape(dataMat)
    min_x = dataMat[:, 1].min()
    max_x = dataMat[:, 1].max()
    xcoord1 = []; ycoord1 = []
    xcoord2 = []; ycoord2 = []
    for i in range(m):
        if int(laberMat[i, 0]) == 0:
            xcoord1.append(dataMat[i, 1]); ycoord1.append(dataMat[i, 2])
        elif int(laberMat[i, 0]) == 1:
            xcoord2.append(dataMat[i, 1]); ycoord2.append(dataMat[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcoord1, ycoord1, s=30, c="red", marker="s")
    ax.scatter(xcoord2, ycoord2, s=30, c="green")
    w = numpy.asarray(weights).ravel()  # flatten the n*1 weight matrix for plotting
    x = numpy.arange(min_x, max_x, 0.1)
    y = (-w[0] - w[1] * x) / w[2]
    ax.plot(x, y)
    plt.xlabel("x1"); plt.ylabel("x2")
    plt.show()


if __name__ == "__main__":
    dataMat, laberMat = loadDataSet()
    weights = stocGradAscent1(dataMat, laberMat)
    show(dataMat, laberMat, weights)
```


Original: https://www.cnblogs.com/coder2012/p/4598913.html
Author: cococo点点
Title: logistic回归
