Does the error backpropagation algorithm suffer from overfitting, and how can it be addressed?

January 4, 2024, 8:43 PM • Artificial Intelligence

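Question: Does the error backpropagation algorithm suffer from overfitting, and how can this problem be solved?

Introduction

Error backpropagation is a key algorithm for training neural networks. During training, we compute the difference between the network's output and the true value (the error) and propagate that error backward through the layers in order to update the network's parameters. A network trained this way can, however, run into a common problem: overfitting. Overfitting means that the model performs well on the training data but poorly on test data it has not seen before.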

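How the algorithm works

Backpropagation uses the chain rule to compute the gradients needed to update the model parameters. In each training iteration, a loss function measures the error between the model's predictions for the training samples and the true values, which yields a single number describing how large the error is. We then compute the partial derivatives of the loss with respect to the network parameters (the gradient) and pass them to an optimization algorithm such as gradient descent, which updates the parameters to reduce the error as far as possible.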
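To make the chain rule concrete, here is a minimal sketch for a model with a single weight, assuming a sigmoid activation and the squared loss. The values of `w`, `x`, `y` and `alpha` are illustrative choices, not taken from the article.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, x, y):
    """Return L = 0.5 * (y - sigmoid(w * x))**2 and dL/dw via the chain rule."""
    z = w * x
    a = sigmoid(z)
    loss = 0.5 * (y - a) ** 2
    dL_da = a - y            # derivative of the squared loss w.r.t. the output
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dz_dw = x                # derivative of z = w * x w.r.t. w
    return loss, dL_da * da_dz * dz_dw

# Illustrative values only (assumptions, not from the article).
w, x, y, alpha = 0.0, 1.5, 0.8, 0.5
losses = []
for _ in range(50):
    loss, grad = loss_and_grad(w, x, y)
    losses.append(loss)
    w -= alpha * grad        # plain gradient descent step

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.6f}")
```

Each factor in the product corresponds to one link of the chain from the loss back to the weight. The full network applies the same idea layer by layer.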

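Deriving the formulas

Let us derive the backpropagation formulas with a simple example. Suppose we have a feedforward neural network with a single hidden layer. The input is x, the hidden layer has h neurons, and the output layer has o neurons. The input-to-hidden weights are W1, the hidden-to-output weights are W2, the hidden-layer bias is b1, and the output-layer bias is b2. We use the squared loss, $L(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$, where y is the true value and $\hat{y}$ is the network's prediction.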

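Step 1: Forward pass
First, we compute the hidden-layer activation $a_1$ from the input x:
$$a_1 = \sigma(W_1 \cdot x + b_1)$$
where $\sigma$ is the activation function, typically ReLU or Sigmoid.

Then we use the hidden-layer activation to compute the output-layer activation $a_2$, which is the network's prediction $\hat{y}$:
$$a_2 = \sigma(W_2 \cdot a_1 + b_2)$$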
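The forward pass can be sketched in NumPy as follows. This is a minimal illustration that assumes a sigmoid activation, column-vector inputs, and hypothetical layer sizes (2 inputs, 3 hidden units, 1 output). The helper name `forward` is introduced here for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Compute a1 = sigmoid(W1 x + b1) and a2 = sigmoid(W2 a1 + b2)."""
    a1 = sigmoid(W1 @ x + b1)    # hidden-layer activation, shape (h, 1)
    a2 = sigmoid(W2 @ a1 + b2)   # output-layer activation, shape (o, 1)
    return a1, a2

# Hypothetical sizes: 2 inputs, 3 hidden units, 1 output.
x = np.array([[1.0], [2.0]])
W1 = np.zeros((3, 2))
b1 = np.zeros((3, 1))
W2 = np.zeros((1, 3))
b2 = np.zeros((1, 1))

a1, a2 = forward(x, W1, b1, W2, b2)
print(a1.shape, a2.shape)  # (3, 1) (1, 1)
```

With all-zero parameters every pre-activation is 0, so every activation equals sigmoid(0) = 0.5, which makes the shapes and values easy to check by hand.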

Step 2: Compute the loss
Next, we evaluate the loss and compute its partial derivative with respect to the output-layer activation, $\frac{\partial L}{\partial a_2}$:
$$\frac{\partial L}{\partial a_2} = \frac{\partial}{\partial a_2} \frac{1}{2}(y - a_2)^2 = (a_2 - y)$$

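Step 3: Backward pass
Write $z_1 = W_1 \cdot x + b_1$ and $z_2 = W_2 \cdot a_1 + b_2$ for the pre-activation values of the hidden and output layers, and let $\odot$ denote element-wise multiplication. Using the chain rule, we first compute the partial derivative of the loss with respect to the hidden-to-output weights W2, $\frac{\partial L}{\partial W_2}$:
$$\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} \cdot \frac{\partial z_2}{\partial W_2} = \left[ (a_2 - y) \odot \sigma'(z_2) \right] \cdot a_1^T$$
where $\sigma'$ is the derivative of the activation function.

Next, we compute the partial derivative of the loss with respect to the output-layer bias b2, $\frac{\partial L}{\partial b_2}$:
$$\frac{\partial L}{\partial b_2} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial z_2} = (a_2 - y) \odot \sigma'(z_2)$$

Similarly, we compute the partial derivative of the loss with respect to the input-to-hidden weights W1, $\frac{\partial L}{\partial W_1}$. The error first passes back through W2 and then through the hidden-layer activation:
$$\frac{\partial L}{\partial W_1} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} \cdot \frac{\partial z_1}{\partial W_1} = \left[ \left( W_2^T \left( (a_2 - y) \odot \sigma'(z_2) \right) \right) \odot \sigma'(z_1) \right] \cdot x^T$$

Finally, we compute the partial derivative of the loss with respect to the hidden-layer bias b1, $\frac{\partial L}{\partial b_1}$:
$$\frac{\partial L}{\partial b_1} = \frac{\partial L}{\partial a_2} \cdot \frac{\partial a_2}{\partial a_1} \cdot \frac{\partial a_1}{\partial z_1} = \left( W_2^T \left( (a_2 - y) \odot \sigma'(z_2) \right) \right) \odot \sigma'(z_1)$$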
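One way to gain confidence in gradient formulas like these is to compare them against central finite differences. The sketch below does this for a single sample, assuming sigmoid activations, the squared loss, and hypothetical layer sizes. The helper names `backprop` and `numerical_grad` are introduced here for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_fn(params, x, y):
    W1, b1, W2, b2 = params
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return 0.5 * np.sum((y - a2) ** 2)

def backprop(params, x, y):
    """Analytic gradients in the order [dW1, db1, dW2, db2]."""
    W1, b1, W2, b2 = params
    a1 = sigmoid(W1 @ x + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    delta2 = (a2 - y) * a2 * (1.0 - a2)          # sigmoid'(z) = a * (1 - a)
    delta1 = (W2.T @ delta2) * a1 * (1.0 - a1)   # error pushed back through W2
    return [delta1 @ x.T, delta1, delta2 @ a1.T, delta2]

def numerical_grad(params, x, y, eps=1e-6):
    """Central finite differences, one parameter entry at a time."""
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            plus = loss_fn(params, x, y)
            p[idx] = old - eps
            minus = loss_fn(params, x, y)
            p[idx] = old                         # restore the original value
            g[idx] = (plus - minus) / (2 * eps)
        grads.append(g)
    return grads

# Hypothetical sizes: 2 inputs, 3 hidden units, 1 output.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 1))
y = np.array([[0.3]])
params = [rng.normal(size=(3, 2)), rng.normal(size=(3, 1)),
          rng.normal(size=(1, 3)), rng.normal(size=(1, 1))]

analytic = backprop(params, x, y)
numeric = numerical_grad(params, x, y)
max_diff = max(np.max(np.abs(a - n)) for a, n in zip(analytic, numeric))
print("largest difference between analytic and numeric gradients:", max_diff)
```

If the analytic formulas are implemented correctly, the largest difference should be tiny compared with the gradients themselves.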

Step 4: Parameter update
With an optimization algorithm such as gradient descent, we can now update the model parameters. Given a learning rate $\alpha$, the update rules are:
$$W_2 = W_2 - \alpha \cdot \frac{\partial L}{\partial W_2}$$
$$b_2 = b_2 - \alpha \cdot \frac{\partial L}{\partial b_2}$$
$$W_1 = W_1 - \alpha \cdot \frac{\partial L}{\partial W_1}$$
$$b_1 = b_1 - \alpha \cdot \frac{\partial L}{\partial b_1}$$
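The role of the learning rate in this update rule is easiest to see on a one-dimensional toy problem. The sketch below is an illustration, not part of the network above: it minimizes f(w) = w², whose gradient is 2w, with two hypothetical values of $\alpha$.

```python
def gradient_descent(alpha, steps=50, w=1.0):
    """Minimize f(w) = w**2 with the update w = w - alpha * f'(w)."""
    for _ in range(steps):
        w = w - alpha * 2.0 * w   # f'(w) = 2 * w
    return w

w_small = gradient_descent(alpha=0.1)   # each step multiplies w by 0.8
w_large = gradient_descent(alpha=1.1)   # each step multiplies w by -1.2
print(w_small, w_large)
```

With the small step size the iterate shrinks toward the minimum at 0. With the large one it overshoots further on every step and diverges, which is why the learning rate usually has to be tuned.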

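Computation steps

1. Initialize the input-to-hidden weights W1, the hidden-to-output weights W2, the hidden-layer bias b1, and the output-layer bias b2.
2. Run a forward pass on the input x to compute the network's prediction.
3. Compute the loss from the prediction and the true value.
4. Run a backward pass from the loss to compute the gradients.
5. Use an optimization algorithm to update the model parameters from the gradients.
6. Repeat steps 2-5 until a convergence criterion is met or the maximum number of iterations is reached.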

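Python code example

Below is a simple Python implementation of error backpropagation. It uses a small toy dataset and the NumPy library for the matrix operations.

First, we import the required libraries: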

import numpy as np
import matplotlib.pyplot as plt

Then we define the activation function and its derivative:

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))
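As a quick sanity check, the derivative can be compared against a central finite difference. The sketch below repeats both functions so that it runs on its own. The test points and step size `eps` are arbitrary choices.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)          # evaluate the sigmoid once and reuse it
    return s * (1 - s)

xs = np.linspace(-5.0, 5.0, 11)   # arbitrary test points
eps = 1e-6
numeric = (sigmoid(xs + eps) - sigmoid(xs - eps)) / (2 * eps)
max_err = np.max(np.abs(numeric - sigmoid_derivative(xs)))

print(sigmoid_derivative(0.0))    # 0.25, the largest value the derivative takes
print(max_err)
```

Because the derivative never exceeds 0.25, each sigmoid layer scales the backpropagated error down, which is one reason training can be slow with this activation.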

Next, we set the dimensions of the input, hidden, and output layers, along with the learning rate:

input_dim = 1
hidden_dim = 3
output_dim = 1
learning_rate = 0.01

Then we create the toy dataset:

X = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
y = np.array([[0], [1], [4], [9], [16], [25], [36], [49], [64], [81]])

Next, we randomly initialize the weights W1 and W2 and set the biases b1 and b2 to zero:

np.random.seed(1)
W1 = np.random.randn(hidden_dim, input_dim)
b1 = np.zeros((hidden_dim, 1))
W2 = np.random.randn(output_dim, hidden_dim)
b2 = np.zeros((output_dim, 1))

Then we train the model iteratively. X and y store one sample per row, while W1 has shape (hidden_dim, input_dim), so the samples are first transposed into columns. Both are also rescaled to [0, 1], because a sigmoid output can only produce values between 0 and 1:

# Arrange samples as columns and rescale inputs and targets to [0, 1]
X_in = X.T / 9.0         # shape (input_dim, n_samples)
y_target = y.T / 81.0    # shape (output_dim, n_samples)
n_samples = X_in.shape[1]

losses = []
for i in range(10000):
    # Forward pass
    z1 = np.dot(W1, X_in) + b1
    hidden_layer_output = sigmoid(z1)
    z2 = np.dot(W2, hidden_layer_output) + b2
    y_pred = sigmoid(z2)

    # Loss: half the mean squared error over all samples
    loss = 0.5 * np.mean((y_pred - y_target) ** 2)
    losses.append(loss)

    # Backward pass (divide by n_samples so the gradient matches the mean loss)
    dL_dypred = (y_pred - y_target) / n_samples
    dypred_dz2 = sigmoid_derivative(z2)
    delta2 = dL_dypred * dypred_dz2
    dL_dW2 = np.dot(delta2, hidden_layer_output.T)
    dL_db2 = np.sum(delta2, axis=1, keepdims=True)

    dhidden_layer_dz1 = sigmoid_derivative(z1)
    delta1 = np.dot(W2.T, delta2) * dhidden_layer_dz1
    dL_dW1 = np.dot(delta1, X_in.T)
    dL_db1 = np.sum(delta1, axis=1, keepdims=True)

    # Parameter update
    W2 -= learning_rate * dL_dW2
    b2 -= learning_rate * dL_db2
    W1 -= learning_rate * dL_dW1
    b1 -= learning_rate * dL_db1

Finally, we plot the loss curve to see how training progressed:

plt.plot(losses)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.show()

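Code details explained

The hidden-layer and output-layer activations are computed with the sigmoid activation function.
The loss is the squared loss.
The weights are initialized randomly so that the hidden units do not all start from the same values. With identical initial weights every hidden unit would receive the same gradient and could never learn a different feature. The biases start at zero.
The backward pass needs the derivative of the activation function to compute the partial derivatives. Here that is the derivative of the sigmoid function.
Each parameter update is scaled by the learning rate.
The loss is recorded at every iteration so that the loss curve can be plotted and analyzed afterwards.

Running the code above, we can watch the loss gradually decrease as backpropagation repeatedly updates the parameters and improves the model.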
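The example above only tracks the training loss, so it does not by itself show or prevent overfitting. One common remedy, given here as an illustration rather than as the article's own method, is L2 regularization (weight decay): adding the penalty $\frac{\lambda}{2}\lVert W \rVert^2$ to the loss adds $\lambda W$ to the gradient of each weight matrix. The names `l2_penalty`, `l2_gradient`, and `lam`, and the numeric values, are hypothetical.

```python
import numpy as np

def l2_penalty(W, lam):
    """Penalty term (lam / 2) * ||W||^2 added to the loss."""
    return 0.5 * lam * np.sum(W ** 2)

def l2_gradient(W, lam):
    """Gradient of the penalty with respect to W."""
    return lam * W

# Hypothetical values for illustration.
W = np.array([[3.0, 4.0]])
lam = 0.1
learning_rate = 0.01
data_grad = np.zeros_like(W)   # stands in for dL/dW from backpropagation

penalty = l2_penalty(W, lam)
W_new = W - learning_rate * (data_grad + l2_gradient(W, lam))
print(penalty, W_new)
```

Even when the data gradient is zero, the penalty pulls the weights slightly toward zero on every step, which discourages the large weights that often accompany overfitting. Holding out part of the data and monitoring the loss on it is the usual way to tell whether such a remedy is helping.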

This original article is protected by copyright. Please credit the source when reposting: https://www.johngo689.com/823879/
