Polynomial Regression: Principles and a Manual Implementation

The previous post fitted the function $Y = 2x_1 - 3x_2 + 4x_3 - 5$. What if the target is $Y = 2t^4 - 3t^3 + 4t^2 - 5$? Making the substitution

$$x_1 = t^4, \qquad x_2 = t^3, \qquad x_3 = t^2$$

recovers $Y = 2x_1 - 3x_2 + 4x_3 - 5$. A univariate (in $t$) higher-degree polynomial is thereby converted into a multivariate ($x_1$, $x_2$, $x_3$) first-degree polynomial, so a linear regression model can still be used for the fit.
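
As a minimal sketch of this substitution (with illustrative sample points of $t$), the design matrix for the linear model is built by stacking powers of $t$ as columns:

import torch

t = torch.linspace(-1., 1., steps=5).reshape(-1, 1)  # illustrative samples of t
X = torch.cat([t**4, t**3, t**2], dim=1)             # columns: x1 = t^4, x2 = t^3, x3 = t^2
# X can now be fed to an ordinary linear regression model in (x1, x2, x3)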

Suppose the training set contains 1000 samples, and each sample $x^{(i)}$ is 3-dimensional, i.e. has three features $x^{(i)}_1$, $x^{(i)}_2$, $x^{(i)}_3$. The model to fit is $Y = w_1x_1 + w_2x_2 + w_3x_3 + b = W^\mathrm{T}X + b$. Assuming $Y = 2x_1 - 3x_2 + 4x_3 - 5$, how do we use a linear regression model and gradient descent to iterate toward the parameters?

1. Generate the dataset

For $Y = 2x_1 - 3x_2 + 4x_3 - 5$, the training set has $X \in \mathbb{R}^{1000\times3}$ and $Y \in \mathbb{R}^{1000\times1}$.

import torch
import random

def generate_data(w, b, num_examples):
    """Generate y = Xw + b + noise, with X drawn from N(0, 1)."""
    X = torch.normal(0, 1, (num_examples, len(w)))
    y = torch.matmul(X, w) + b
    y += torch.normal(0, 0.01, y.shape)  # small Gaussian noise
    return X, y.reshape((-1, 1))

true_w = torch.tensor([2., -3., 4.])
true_b = 5.

features, labels = generate_data(true_w, true_b, 1000)

2. Read the dataset in minibatches

With 1000 samples and batch_size = 8, integer division gives 125 batches: batch_X has shape [125, 8, 3] and batch_y has shape [125, 8, 1].
The 1000 samples are first shuffled, then drawn sequentially without replacement, 8 at a time, to form each batch. During training, every epoch therefore iterates over all 125 batches, visiting every sample once.

def data_iterater(X, y, batch_size):
    num = len(X)
    indices = list(range(num))
    random.shuffle(indices)  # shuffle once, then draw without replacement
    # assumes num is divisible by batch_size
    batch_X = torch.zeros([num // batch_size, batch_size, X.shape[1]])
    batch_y = torch.zeros([num // batch_size, batch_size, 1])
    for id, i in enumerate(range(0, num, batch_size)):
        batch_indices = torch.tensor(indices[i: min(i + batch_size, num)])
        batch_X[id, :, :] = X[batch_indices]
        batch_y[id, :, :] = y[batch_indices]
    return batch_X, batch_y

batch_size = 8
batch_X, batch_y = data_iterater(features, labels, batch_size)
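
A quick sanity check of the resulting shapes (1000 samples, batch_size = 8):

print(batch_X.shape)  # torch.Size([125, 8, 3])
print(batch_y.shape)  # torch.Size([125, 8, 1])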

3. Define the model


def Linear_Model(X, w, b):
    return torch.matmul(X, w) + b

4. Define the loss function and the optimizer


def loss_sq(predict, y):
    """Element-wise squared error, halved so its gradient is (predict - y)."""
    return (predict - y) ** 2 / 2

def sgd(params, lr, batch_size):
    """Minibatch SGD step: backward() below accumulates the *sum* of
    per-sample gradients, so divide by batch_size to step with the mean."""
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()
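
Since the training loop below backpropagates loss.sum(), param.grad holds the sum of per-sample gradients, and dividing by batch_size in sgd yields the mean gradient. A tiny standalone check of that equivalence:

import torch
p = torch.tensor([1.0], requires_grad=True)
x = torch.tensor([2.0, 4.0])
(p * x).sum().backward()
print(p.grad / 2)  # tensor([3.]), the mean gradient: (2 + 4) / 2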

5. Train

Before training, initialize the model parameters $w$ and $b$, as well as the hyperparameters: learning rate $lr$, number of iterations $epochs$, and so on.

w = torch.normal(0, 0.01, size=(3,1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 1e-3
epochs = 100
for epoch in range(epochs):
    """在每个epoch中,小批量梯度遍历整个数据集,也就是每次batch_size大小取样本
    并将训练数据集中所有样本都使用一次(假设样本数能够被批量大小整除)。"""
    for (X, y) in zip(batch_X, batch_y):
        predict = Linear_Model(X, w, b)
        loss = loss_sq(predict, y)
        loss.sum().backward()
        sgd([w, b], lr, batch_size)
    with torch.no_grad():
        train_l = loss_sq(Linear_Model(features, w, b), labels)
        print(f'epoch {epoch + 1}, loss {float(train_l.mean()):f}')
print(f'true_w {true_w}, true_b {true_b}')
print(f'pred_w {w}, pred_b {b}')
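
To quantify how close the fit is, the learned parameters can also be compared against the ground truth (a small addition to the original script):

with torch.no_grad():
    print(f'error in w: {true_w - w.reshape(true_w.shape)}')
    print(f'error in b: {true_b - b}')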

6. Results

epoch 1, loss 21.021652
epoch 2, loss 16.366686
epoch 3, loss 12.744938
epoch 4, loss 9.926495
epoch 5, loss 7.732749
.......................

epoch 99, loss 0.000052
epoch 100, loss 0.000052
true_w tensor([ 2., -3.,  4.]), true_b 5.0
pred_w tensor([[ 2.0000],
        [-2.9998],
        [ 3.9995]], requires_grad=True), pred_b tensor([5.0002], requires_grad=True)

Next, fit the function

$$y = 5 + 1.2x - 3.4\frac{x^2}{2!} + 5.6\frac{x^3}{3!} + noise, \qquad noise \in N(0,\ 0.1^2)$$

Suppose the polynomial to fit has highest-degree term $x^3$, but we do not know in advance which degree fits best, so we set max_degree = 20. The substitution

$$x_0 = x^0/0!, \qquad x_1 = x^1/1!, \qquad x_2 = x^2/2!, \qquad x_3 = x^3/3!$$

turns this into a linear regression problem.
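
The factorial division in the data-generation code below relies on math.gamma, since $\Gamma(n+1) = n!$; a quick check:

import math
print([math.gamma(i + 1) for i in range(4)])  # [1.0, 1.0, 2.0, 6.0] = [0!, 1!, 2!, 3!]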

1. Generate and split the dataset

Set max_degree = 20: since we do not know beforehand that a degree-3 polynomial gives the best fit, $w$ is given dimension 20 with all entries initialized to 0, and only the coefficients for $x^0$, $x^1$, $x^2$, $x^3$ are set to 5, 1.2, -3.4, 5.6. The code below generates 1000 training samples and 1000 test samples. Within each set, the raw x values have shape (1000, 1); poly_features holds the polynomial terms $\frac{x^n}{n!}$ with shape (1000, 20); the weight matrix has shape (20, 1); and the output has shape (1000, 1).

import math
import torch
import random
import numpy as np

max_degree = 20

def generate_data(w, num_examples):
    """Generate y = poly_features @ w + noise, where poly_features[:, i] = x^i / i!."""
    X = torch.normal(0, 1, (num_examples, 1))
    # Raise x to the powers 0 .. max_degree-1 (numpy promotes the result to float64)
    poly_features = torch.from_numpy(
        np.power(X.numpy(), np.arange(max_degree).reshape(1, -1)))
    for i in range(max_degree):
        poly_features[:, i] /= math.gamma(i + 1)  # gamma(i + 1) = i!
    labels = torch.matmul(poly_features, w)
    labels += torch.normal(0, 0.01, labels.shape)
    return poly_features, labels

def data_iterater(X, y, batch_size, poly):
    num = len(X)
    indices = list(range(num))
    random.shuffle(indices)
    batch_X = torch.zeros([num//batch_size, batch_size, poly], dtype=torch.float64)
    batch_y = torch.zeros([num//batch_size, batch_size, 1], dtype=torch.float64)
    for id, i in enumerate(range(0, num, batch_size)):
        batch_indices = torch.tensor(indices[i: min(i + batch_size, num)])
        batch_X[id,:,:] = (X[batch_indices])[:,0:poly]
        batch_y[id,:,:] = y[batch_indices]
    return batch_X, batch_y

true_w = torch.zeros((20,1), dtype=torch.float64)
true_w[0:4,0] = torch.tensor([5., 1.2, -3.4, 5.6], dtype=torch.float64)
features, labels = generate_data(true_w,  1000)
test_features, test_labels = generate_data(true_w,  1000)
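
A quick check of the generated shapes (1000 samples per set, max_degree = 20):

print(features.shape, labels.shape)  # torch.Size([1000, 20]) torch.Size([1000, 1])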

2. Define the model, loss function, and optimizer

def Linear_Model(X, w):
    # No separate bias term: the constant x^0/0! = 1 column absorbs the intercept.
    return torch.matmul(X, w)

def loss_sq(predict, y):
    return (predict - y) ** 2 / 2

def sgd(params, lr, batch_size):
    with torch.no_grad():
        for param in params:
            param -= lr * param.grad / batch_size
            param.grad.zero_()

3. Train

poly = 2
batch_size = 10
batch_X, batch_y = data_iterater(features, labels, batch_size, poly)
w = torch.normal(0, 0.01, size=(poly, 1), requires_grad=True, dtype=torch.float64)
params = [w]
lr = 0.1

for epoch in range(10):
    """在每个epoch中,小批量梯度遍历整个数据集,也就是每次batch_size大小取样本
    并将训练数据集中所有样本都使用一次(假设样本数能够被批量大小整除)。"""
    for (X, y) in zip(batch_X, batch_y):
        predict = Linear_Model(X, w)
        loss = loss_sq(predict, y)
        loss.sum().backward()
        sgd(params, lr, batch_size)
    with torch.no_grad():
        train_l = loss_sq(Linear_Model(features[:,0:poly], w), labels)
        test_l = loss_sq(Linear_Model(test_features[:, 0:poly], w), test_labels)
        print(f'epoch {epoch + 1}, train_loss {float(train_l.mean()):f}, test_loss {float(test_l.mean()):f}')
print(f'poly is {poly}')
print(f'true_w {true_w[0:poly]}')
print(f'pred_w {w}')

4. Vary the hyperparameter poly = 2, 4, 20 and compare the results

poly = 2: underfitting

epoch 1, train_loss 8.802216, test_loss 5.805620
epoch 2, train_loss 8.802248, test_loss 5.805681
epoch 3, train_loss 8.802248, test_loss 5.805681
epoch 4, train_loss 8.802248, test_loss 5.805681
epoch 5, train_loss 8.802248, test_loss 5.805681
epoch 6, train_loss 8.802248, test_loss 5.805681
epoch 7, train_loss 8.802248, test_loss 5.805681
epoch 8, train_loss 8.802248, test_loss 5.805681
epoch 9, train_loss 8.802248, test_loss 5.805681
epoch 10, train_loss 8.802248, test_loss 5.805681
poly is 2
true_w tensor([[5.0000],
        [1.2000]], dtype=torch.float64)
pred_w tensor([[3.2970],
        [4.8789]], dtype=torch.float64, requires_grad=True)

poly = 4: lowest loss, a good fit

epoch 1, train_loss 0.004260, test_loss 0.003319
epoch 2, train_loss 0.000179, test_loss 0.000157
epoch 3, train_loss 0.000055, test_loss 0.000056
epoch 4, train_loss 0.000050, test_loss 0.000052
epoch 5, train_loss 0.000049, test_loss 0.000051
epoch 6, train_loss 0.000049, test_loss 0.000051
epoch 7, train_loss 0.000049, test_loss 0.000051
epoch 8, train_loss 0.000049, test_loss 0.000051
epoch 9, train_loss 0.000049, test_loss 0.000051
epoch 10, train_loss 0.000049, test_loss 0.000051
poly is 4
true_w tensor([[ 5.0000],
        [ 1.2000],
        [-3.4000],
        [ 5.6000]], dtype=torch.float64)
pred_w tensor([[ 5.0013],
        [ 1.2005],
        [-3.3989],
        [ 5.6006]], dtype=torch.float64, requires_grad=True)

poly = 20: the fit is worse than with poly = 4

epoch 1, train_loss 0.124580, test_loss 0.101282
epoch 2, train_loss 0.025603, test_loss 0.021297
epoch 3, train_loss 0.017807, test_loss 0.013880
epoch 4, train_loss 0.013682, test_loss 0.010144
epoch 5, train_loss 0.011170, test_loss 0.007961
epoch 6, train_loss 0.009459, test_loss 0.006544
epoch 7, train_loss 0.008171, test_loss 0.005534
epoch 8, train_loss 0.007135, test_loss 0.004761
epoch 9, train_loss 0.006267, test_loss 0.004141
epoch 10, train_loss 0.005523, test_loss 0.003625
poly is 20
true_w tensor([[ 5.0000],
        [ 1.2000],
        [-3.4000],
        [ 5.6000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000],
        [ 0.0000]], dtype=torch.float64)
pred_w tensor([[ 4.9736e+00],
        [ 1.3664e+00],
        [-3.3188e+00],
        [ 4.9433e+00],
        [-1.5672e-01],
        [ 1.1981e+00],
        [ 3.7415e-01],
        [ 9.2125e-02],
        [ 1.0790e-01],
        [ 1.9839e-03],
        [ 1.9713e-02],
        [-4.7299e-05],
        [-4.6686e-03],
        [ 4.3889e-04],
        [-1.2832e-02],
        [ 1.0800e-02],
        [-3.3636e-03],
        [ 2.1030e-03],
        [-2.8149e-03],
        [-1.4131e-03]], dtype=torch.float64, requires_grad=True)

Original: https://blog.csdn.net/jump882/article/details/119645863
Author: PuJiang-
Title: 多项式回归原理及手工实现 (Polynomial Regression: Principles and a Manual Implementation)
