Regression


For a better reading experience, visit littlefish.top.
The goal of regression is to predict a numeric target value with a model of the form $y = w_1 \cdot x_1 + w_2 \cdot x_2$, where the values $w$ are called regression coefficients. Once $w$ is determined, a prediction can be computed for any input $x$.

Determining the Regression Coefficients via Squared Error

Assume the input is x and the output is y. The squared error can then be expressed as:

$$\sum_{i=1}^m (y_i - x_i^T w)^2$$

To minimize the squared error, set its derivative with respect to w to zero; the optimal regression coefficients are then

$$\hat{w} = (X^T X)^{-1} X^T y$$
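
This is the normal-equation solution: writing the squared error in matrix form and setting its gradient to zero gives

$$\frac{\partial}{\partial w}\,(y - Xw)^T(y - Xw) = -2X^T(y - Xw) = 0 \;\Rightarrow\; X^T X w = X^T y,$$

which has a unique solution whenever $X^T X$ is invertible. This is why the code below checks the determinant before inverting.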

The algorithm is implemented as follows:

from numpy import *
import matplotlib.pyplot as plt

def loadDataSet(fileName):
    # Parse a tab-separated file: every column but the last is a feature,
    # the last column is the target value.
    numFeat = len(open(fileName).readline().split('\t')) - 1
    dataMat = []; labelMat = []
    fr = open(fileName)
    for line in fr.readlines():
        lineArr =[]
        curLine = line.strip().split('\t')
        for i in range(numFeat):
            lineArr.append(float(curLine[i]))
        dataMat.append(lineArr)
        labelMat.append(float(curLine[-1]))
    return dataMat,labelMat

def standRegres(xArr,yArr):
    # Ordinary least squares: w = (X^T X)^{-1} X^T y
    xMat = mat(xArr); yMat = mat(yArr).T
    xTx = xMat.T*xMat
    if linalg.det(xTx) == 0.0:
        print("This matrix is singular, cannot do inverse")
        return
    ws = xTx.I * (xMat.T*yMat)
    return ws

def regression1():
    xArr, yArr = loadDataSet("Ch08/ex0.txt")
    xMat = mat(xArr)
    yMat = mat(yArr)
    ws = standRegres(xArr, yArr)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    print(xMat[:, 1].flatten())
    print(yMat.T[:, 0].flatten())
    ax.scatter(xMat[:, 1].flatten().A[0], yMat.T[:, 0].flatten().A[0])
    xCopy = xMat.copy()
    xCopy.sort(0)
    yHat = xCopy * ws
    ax.plot(xCopy[:, 1], yHat)
    plt.show()

if __name__ == "__main__":
    regression1()
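
As a cross-check (not part of the original post), the same coefficients can be obtained with NumPy's built-in least-squares solver, which works via SVD and therefore also copes with a singular $X^T X$. A minimal sketch, assuming the same tab-separated ex0.txt layout as above:

import numpy as np

# Load the tab-separated data: every column but the last is a feature,
# the last column is the target (the layout loadDataSet assumes).
data = np.loadtxt("Ch08/ex0.txt", delimiter="\t")
X, y = data[:, :-1], data[:, -1]

# lstsq minimizes ||X w - y||^2 using SVD, so it succeeds even where
# standRegres would bail out on a singular X^T X.
ws, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(ws)  # should match the ws returned by standRegres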

The results are as follows:

[Figure: scatter plot of the ex0.txt data with the fitted regression line]

Locally Weighted Linear Regression

Plain linear regression is an unbiased estimator of the minimum mean squared error, so it tends to underfit. Predictions can be improved with locally weighted linear regression (LWLR), whose regression coefficients are:

$$\hat{w} = (X^T W X)^{-1} X^T W y$$

Here W is a matrix that assigns a weight to each data point; it plays the role of a "kernel". The most commonly used kernel is the Gaussian kernel:

$$w(i, i) = \exp\left(\frac{\left\|x^{(i)} - x\right\|^2}{-2k^2}\right)$$

Here k controls the weighting: the smaller k is, the faster the weights fall off with distance from the query point.
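
To make the effect of k concrete, here is a small illustrative sketch (the numbers are mine, not from the original post) evaluating the Gaussian weight at two distances:

import numpy as np

def gaussian_weight(dist, k):
    # Weight given to a training point at distance `dist` from the
    # query point, matching the kernel used in lwlr() below.
    return np.exp(-dist**2 / (2.0 * k**2))

for k in (1.0, 0.1, 0.01):
    # k=1.0 weights nearly all points equally (close to ordinary regression);
    # k=0.01 effectively uses only the immediate neighbors.
    print(k, gaussian_weight(0.1, k), gaussian_weight(0.5, k))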

Implementation

Adjusting the weights through the kernel function gives points near the query point a higher weight:

def lwlr(testPoint,xArr,yArr,k=1.0):
    xMat = mat(xArr); yMat = mat(yArr).T
    m = shape(xMat)[0]
    weights = mat(eye((m)))
    for j in range(m):
        # Build the diagonal weights matrix: each training point is
        # weighted by its distance to the query point via a Gaussian kernel
        diffMat = testPoint - xMat[j,:]
        weights[j,j] = exp(diffMat*diffMat.T/(-2.0*k**2))
    xTx = xMat.T * (weights * xMat)
    if linalg.det(xTx) == 0.0:
        print "This matrix is singular, cannot do inverse"
        return
    ws = xTx.I * (xMat.T * (weights * yMat))
    return testPoint * ws

def lwlrTest(testArr,xArr,yArr,k=1.0):  #loops over all the data points and applies lwlr to each one
    m = shape(testArr)[0]
    yHat = zeros(m)
    for i in range(m):
        yHat[i] = lwlr(testArr[i],xArr,yArr,k)
    return yHat

def regression2():
    xArr, yArr = loadDataSet("Ch08/ex0.txt")
    yhat = lwlrTest(xArr, xArr, yArr, 0.01)
    fig = plt.figure()
    ax = fig.add_subplot(111)
    xMat = mat(xArr)
    srtInd = xMat[:, 1].argsort(0)
    xSort = xMat[srtInd][:, 0, :]
    ax.plot(xSort[:, 1], yhat[srtInd])
    ax.scatter(xMat[:, 1].flatten().A[0], mat(yArr).T.flatten().A[0], s=2, c="red")
    plt.show()
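
Since the original figures are not reproduced here, the comparison across k values can be regenerated with a sweep. A sketch reusing loadDataSet and lwlrTest from above (the k values 1.0 / 0.01 / 0.003 are an assumption on my part):

def compareK(ks=(1.0, 0.01, 0.003)):
    # One subplot per k, to contrast under- and over-fitting.
    xArr, yArr = loadDataSet("Ch08/ex0.txt")
    xMat = mat(xArr)
    srtInd = xMat[:, 1].argsort(0)
    xSort = xMat[srtInd][:, 0, :]
    fig, axes = plt.subplots(len(ks), 1, sharex=True)
    for ax, k in zip(axes, ks):
        yHat = lwlrTest(xArr, xArr, yArr, k)
        ax.plot(xSort[:, 1], yHat[srtInd])
        ax.scatter(xMat[:, 1].flatten().A[0], mat(yArr).T.flatten().A[0],
                   s=2, c="red")
        ax.set_title("k = %s" % k)
    plt.show()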

The results are as follows:

[Figures: LWLR fits on the same data for different values of k]

Therefore, if k is too small, too much noise is taken into account and the model overfits; the optimal result is obtained by choosing an appropriate value of k.
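
A quantitative way to pick k (not covered in the original post) is to compare the residual sum of squares on data the model was not fitted to. A sketch; the file name and train/test split below are assumptions for illustration:

def rssError(yArr, yHatArr):
    # Residual sum of squares; compare models on *held-out* points,
    # since on the training points a tiny k always looks best.
    return ((yArr - yHatArr)**2).sum()

# Example usage (hypothetical data file and split):
# xArr, yArr = loadDataSet("Ch08/abalone.txt")
# for k in (0.1, 1.0, 10.0):
#     yHat = lwlrTest(xArr[100:200], xArr[0:99], yArr[0:99], k)
#     print(k, rssError(array(yArr[100:200]), yHat))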

Original: https://www.cnblogs.com/coder2012/p/4601295.html
Author: cococo点点
Title: 回归
