【Machine Learning】5. Feature Engineering and Polynomial Regression

Feature Engineering and Polynomial Regression

With feature engineering, we can use the linear regression machinery to fit very complex, even nonlinear functions (ones containing $x^n$ terms).

1. Imports
import numpy as np
import matplotlib.pyplot as plt
from lab_utils_multi import zscore_normalize_features, run_gradient_descent_feng
np.set_printoptions(precision=2)
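
lab_utils_multi.py ships with the course lab. If you don't have it, here is a minimal sketch of the two helpers used in this post, assuming they do plain z-score normalization and batch gradient descent with periodic cost printing; the real lab file may differ in details such as output formatting.

def zscore_normalize_features(X):
    # z-score normalize each column: subtract the mean, divide by the std
    mu = np.mean(X, axis=0)
    sigma = np.std(X, axis=0)
    return (X - mu) / sigma

def run_gradient_descent_feng(X, y, iterations=1000, alpha=1e-6):
    # batch gradient descent for linear regression y ≈ X @ w + b,
    # printing the cost roughly ten times over the run
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for i in range(iterations):
        err = X @ w + b - y                    # prediction error, shape (m,)
        if i % max(1, iterations // 10) == 0:
            print(f"Iteration {i:9d}, Cost: {np.sum(err**2)/(2*m):0.5e}")
        w = w - alpha * (X.T @ err) / m        # gradient of the cost w.r.t. w
        b = b - alpha * np.sum(err) / m        # gradient of the cost w.r.t. b
    print(f"w,b found by gradient descent: w: {w}, b: {b:0.4f}")
    return w, b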

2. Polynomial Features

This is the model used by linear regression:

$$f_{\mathbf{w},b} = w_0x_0 + w_1x_1 + \dots + w_{n-1}x_{n-1} + b \tag{1}$$

First, let's see how well the plain linear regression approach from before performs.


# training data: a quadratic function of x
x = np.arange(0, 20, 1)
y = 1 + x**2
X = x.reshape(-1, 1)    # a single feature column

model_w,model_b = run_gradient_descent_feng(X,y,iterations=1000, alpha = 1e-2)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("no feature engineering")
plt.plot(x,X@model_w + model_b, label="Predicted Value");  plt.xlabel("X"); plt.ylabel("y"); plt.legend(); plt.show()

[Figure: "no feature engineering", actual vs. predicted values; the straight-line fit cannot match the quadratic data]
Clearly this doesn't work. We need polynomial features, so we apply feature engineering and raise x to a higher power.

x = np.arange(0, 20, 1)
y = 1 + x**2

X = x**2       # engineered feature: x squared replaces x

X = X.reshape(-1, 1)
model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha = 1e-5)

Iteration         0, Cost: 7.32922e+03
Iteration      1000, Cost: 2.24844e-01
Iteration      2000, Cost: 2.22795e-01
Iteration      3000, Cost: 2.20764e-01
Iteration      4000, Cost: 2.18752e-01
Iteration      5000, Cost: 2.16758e-01
Iteration      6000, Cost: 2.14782e-01
Iteration      7000, Cost: 2.12824e-01
Iteration      8000, Cost: 2.10884e-01
Iteration      9000, Cost: 2.08962e-01
w,b found by gradient descent: w: [1.], b: 0.0490

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Added x**2 feature")
plt.plot(x, np.dot(X,model_w) + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

The fitted model is $y = 1 \cdot x_0^2 + 0.049$.
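
As a quick sanity check (an illustrative addition, not part of the original lab), the learned parameters can be used to predict a new point:

# predict at x = 25 with the learned w and b
x_new = 25.0
y_pred = model_w[0] * x_new**2 + model_b
print(y_pred)   # roughly 625.05; the true value is 1 + 25**2 = 626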

[Figure: "Added x**2 feature", actual vs. predicted values, now closely matched]

3. Selecting Features

Above, we knew that an $x^2$ term was required. It may not always be obvious which features are required. One could add a variety of potential features to try and find the most useful. For example, what if we had instead tried $y = w_0x_0 + w_1x_1^2 + w_2x_2^3 + b$?
Let's try other features and see whether the fit improves.


x = np.arange(0, 20, 1)
y = x**2

X = np.c_[x, x**2, x**3]    # three candidate features

model_w,model_b = run_gradient_descent_feng(X, y, iterations=10000, alpha=1e-7)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("x, x**2, x**3 features")
plt.plot(x, X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Iteration         0, Cost: 1.14029e+03
Iteration      1000, Cost: 3.28539e+02
Iteration      2000, Cost: 2.80443e+02
Iteration      3000, Cost: 2.39389e+02
Iteration      4000, Cost: 2.04344e+02
Iteration      5000, Cost: 1.74430e+02
Iteration      6000, Cost: 1.48896e+02
Iteration      7000, Cost: 1.27100e+02
Iteration      8000, Cost: 1.08495e+02
Iteration      9000, Cost: 9.26132e+01
w,b found by gradient descent: w: [0.08 0.54 0.03], b: 0.0106

[Figure: "x, x**2, x**3 features", actual vs. predicted values]
The fitted model is $y = 0.08x + 0.54x^2 + 0.03x^3 + 0.0106$.

Gradient descent effectively selects the "correct" features for us by emphasizing their associated parameters: a smaller weight value means a less important/less correct feature. Here the weight on $x^2$ (0.54) is by far the largest, matching the data generated by $y = x^2$.
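
A quick, illustrative way to inspect this (not in the original lab) is to print each learned weight next to its feature name:

for name, w_i in zip(['x', 'x^2', 'x^3'], model_w):
    # larger magnitude suggests a more useful feature for this fit
    print(f"{name:>4}: w = {w_i:0.2f}")   # x^2 gets the largest weight, 0.54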

4. Polynomial Features and Their Linear Relationship to the Target

Doing polynomial regression is also a matter of selecting the feature that is most linearly related to y.


x = np.arange(0, 20, 1)
y = x**2

X = np.c_[x, x**2, x**3]
X_features = ['x','x^2','x^3']
fig,ax=plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for i in range(len(ax)):
    ax[i].scatter(X[:,i],y)
    ax[i].set_xlabel(X_features[i])
ax[0].set_ylabel("y")
plt.show()

[Figure: scatter plots of y against x, x^2, and x^3; the y vs. x^2 panel is a straight line]
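
To make the visual impression quantitative (an illustrative addition, not in the original lab), we can compute the Pearson correlation of each feature with y; since y = x**2, the x^2 column is perfectly correlated:

for i, name in enumerate(X_features):
    # correlation coefficient between feature column i and the target
    r = np.corrcoef(X[:, i], y)[0, 1]
    print(f"corr({name}, y) = {r:0.3f}")   # x^2 yields 1.000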
5. Scaling Features

x = np.arange(0,20,1)
X = np.c_[x, x**2, x**3]
print(f"Peak to Peak range by column in Raw        X:{np.ptp(X,axis=0)}")

X = zscore_normalize_features(X)
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X,axis=0)}")

Peak to Peak range by column in Raw        X:[  19  361 6859]
Peak to Peak range by column in Normalized X:[3.3  3.18 3.28]

x = np.arange(0,20,1)
y = x**2

X = np.c_[x, x**2, x**3]
X = zscore_normalize_features(X)

model_w, model_b = run_gradient_descent_feng(X, y, iterations=100000, alpha=1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized x x**2, x**3 feature")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Iteration         0, Cost: 9.42147e+03
Iteration     10000, Cost: 3.90938e-01
Iteration     20000, Cost: 2.78389e-02
Iteration     30000, Cost: 1.98242e-03
Iteration     40000, Cost: 1.41169e-04
Iteration     50000, Cost: 1.00527e-05
Iteration     60000, Cost: 7.15855e-07
Iteration     70000, Cost: 5.09763e-08
Iteration     80000, Cost: 3.63004e-09
Iteration     90000, Cost: 2.58497e-10
w,b found by gradient descent: w: [5.27e-05 1.13e+02 8.43e-05], b: 123.5000
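
Notice what scaling buys us: with z-score normalized features, a learning rate of 0.1 works (versus 1e-5 on the raw x**2 feature earlier) and the cost drops to essentially zero. Once again the weight on the x^2 feature dominates.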

[Figure: "Normalized x x**2, x**3 feature", actual vs. predicted values; the fit is essentially perfect]

6. Fitting Complex Functions

x = np.arange(0,20,1)
y = np.cos(x/2)

# polynomial features up to degree 13
X = np.c_[x, x**2, x**3, x**4, x**5, x**6, x**7, x**8, x**9, x**10, x**11, x**12, x**13]
X = zscore_normalize_features(X)

model_w,model_b = run_gradient_descent_feng(X, y, iterations=1000000, alpha = 1e-1)

plt.scatter(x, y, marker='x', c='r', label="Actual Value"); plt.title("Normalized polynomial features up to x**13")
plt.plot(x,X@model_w + model_b, label="Predicted Value"); plt.xlabel("x"); plt.ylabel("y"); plt.legend(); plt.show()

Iteration         0, Cost: 2.24887e-01
Iteration    100000, Cost: 2.31061e-02
Iteration    200000, Cost: 1.83619e-02
Iteration    300000, Cost: 1.47950e-02
Iteration    400000, Cost: 1.21114e-02
Iteration    500000, Cost: 1.00914e-02
Iteration    600000, Cost: 8.57025e-03
Iteration    700000, Cost: 7.42385e-03
Iteration    800000, Cost: 6.55908e-03
Iteration    900000, Cost: 5.90594e-03
w,b found by gradient descent: w: [-1.61e+00 -1.01e+01  3.00e+01 -6.92e-01 -2.37e+01 -1.51e+01  2.09e+01
 -2.29e-03 -4.69e-03  5.51e-02  1.07e-01 -2.53e-02  6.49e-02], b: -0.0073
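
Even a distinctly non-polynomial target like cos(x/2) can be fit well over the training range once enough normalized polynomial features are supplied, though high-degree polynomial fits like this one typically extrapolate poorly beyond that range.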

[Figure: polynomial fit of cos(x/2), actual vs. predicted values]

7. Review Questions

  1. When scaling features, subtract the mean (then divide by the standard deviation for z-score normalization).
  2. If the cost decreases too slowly, the learning rate is likely too small; if the cost increases instead, the learning rate is too large.
  3. Feature scaling is needed when the numeric ranges of the features differ greatly.
  4. Quantity multiplied by price gives the total sale price (an example of engineering a new feature from existing ones).
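
For the last point, engineering such a feature in code would look like this (a hypothetical illustration; the values are made up):

# combine two raw columns into one engineered feature
quantity = np.array([2, 5, 1])
price    = np.array([3.0, 1.5, 10.0])
total    = quantity * price   # total sale price per row
print(total)                  # [ 6.   7.5 10. ]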


Original: https://blog.csdn.net/m0_51371693/article/details/126992846
Author: KiraFenvy
Title: 【Machine Learning】5.特征工程和多项式回归

Original articles are protected by copyright. Please cite the source when republishing: https://www.johngo689.com/630345/

Reposted articles are protected by the original author's copyright. Please credit the original author when republishing!
