Why Is Backpropagation Usually Used Together with Gradient Descent?

January 5, 2024, 5:42 PM • Artificial Intelligence


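Backpropagation and gradient descent are usually used together because they handle complementary parts of training: backpropagation computes the gradient of the loss function with respect to every parameter of a neural network, and gradient descent uses those gradients to update the parameters so as to minimize the loss. This article explains how backpropagation and gradient descent work, walks through the steps of the algorithm, and gives a complete Python example.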
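As a minimal illustration of this division of labor, the sketch below uses a one-parameter toy loss $C(w) = (w - 3)^2$, an assumption chosen for illustration rather than anything from the original. The function `grad` plays the role that backpropagation plays in a real network by supplying $\partial C / \partial w$ (derived by hand here), and `gradient_descent` repeatedly steps against it.

```python
def loss(w):
    # Toy one-parameter loss (assumed for illustration), minimized at w = 3
    return (w - 3.0) ** 2

def grad(w):
    # Stand-in for backpropagation: the derivative dC/dw = 2(w - 3)
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, num_steps=100):
    # Gradient descent: repeatedly move the parameter against its gradient
    for _ in range(num_steps):
        w -= learning_rate * grad(w)
    return w

w_final = gradient_descent(0.0)
print(w_final, loss(w_final))
```

Each step multiplies the distance to the minimum by $1 - 2\eta = 0.8$, so $w$ approaches 3 geometrically.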

How Backpropagation Works

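Backpropagation is an efficient way to compute gradients: it carries the gradient of the loss function from the output layer back towards the input layer. The basic idea is to apply the chain rule so that the partial derivative of the loss with respect to each parameter is built from quantities already computed for later layers. Training with backpropagation consists of two main steps: forward propagation and backward propagation.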

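In forward propagation, the outputs of the layers are computed in turn, starting from the input layer. For each neuron, compute the weighted sum of its inputs plus its bias, then pass it through the activation function to obtain the neuron's output. These outputs are passed on to the next layer, and the process repeats until the output layer produces the final output.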
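The forward pass described above can be sketched in NumPy as follows. The column-vector layout (each `W` has shape `(n_out, n_in)`), the sigmoid activation and the helper name `forward` are assumptions for this sketch; the function keeps every weighted input and activation because the backward pass needs them.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation through a fully connected network.

    weights[l] has shape (n_out, n_in) and biases[l] has shape (n_out,),
    so each layer computes z = W a + b and a = sigmoid(z).
    Returns the weighted inputs zs and the activations (input included).
    """
    activation = x
    activations = [x]
    zs = []
    for W, b in zip(weights, biases):
        z = W @ activation + b      # weighted sum for every neuron in the layer
        activation = sigmoid(z)     # apply the activation function
        zs.append(z)
        activations.append(activation)
    return zs, activations

# Hypothetical 2-3-1 network with all-zero parameters: every z is 0, every a is 0.5
weights = [np.zeros((3, 2)), np.zeros((1, 3))]
biases = [np.zeros(3), np.zeros(1)]
zs, activations = forward(np.array([1.0, 2.0]), weights, biases)
print(activations[-1])  # → [0.5]
```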

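In backward propagation, the algorithm starts at the output layer and computes the gradient of the loss with respect to every parameter. It first computes the error of the output layer, then propagates that error backwards through the network to each preceding layer. For each weight, multiplying the error of the neuron the weight feeds into by the output (activation) of the neuron it comes from in the previous layer gives the partial derivative of the loss with respect to that weight, i.e. its gradient; the gradient of a bias is simply the error of its neuron. These gradients are then used in the update step of gradient descent.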
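To make the "error times previous output" rule concrete, the sketch below applies it to a single sigmoid neuron with input $x$ and an assumed quadratic cost $C = \frac{1}{2}(a - y)^2$, then checks the result against central finite differences. All numbers and function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    # Quadratic cost of a single sigmoid neuron (assumed for illustration)
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def backprop_single(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    delta = (a - y) * a * (1.0 - a)   # error of the neuron: dC/dz
    return delta * x, delta           # dC/dw = delta * input, dC/db = delta

w, b, x, y = 0.5, -0.2, 1.5, 1.0
grad_w, grad_b = backprop_single(w, b, x, y)

# Central finite differences as an independent check of the chain rule
eps = 1e-6
num_w = (cost(w + eps, b, x, y) - cost(w - eps, b, x, y)) / (2 * eps)
num_b = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
print(grad_w, num_w)
print(grad_b, num_b)
```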

Deriving the Backpropagation Formulas

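Let $C$ denote the loss function, $w_{ji}^l$ the weight connecting neuron $i$ in layer $l-1$ to neuron $j$ in layer $l$, and $b_j^l$ the bias of neuron $j$ in layer $l$. Neuron $j$ in layer $l$ then has weighted input $z_j^l = \sum_i w_{ji}^l a_i^{l-1} + b_j^l$ and activation $a_j^l = \sigma(z_j^l)$. Defining the error of a neuron as $\delta_j^l = \partial C / \partial z_j^l$, for each neuron $j$ in the output layer $L$ the error $\delta_j^L$ is given by: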

$$
\delta_j^L = \frac{\partial C}{\partial a_j^L} \sigma'(z_j^L)
$$

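where $a_j^L$ is the activation of neuron $j$ in the output layer $L$, $z_j^L$ is its weighted input, and $\sigma'(z)$ is the derivative of the activation function $\sigma$.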
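For the quadratic cost $C = \frac{1}{2}\sum_j (a_j^L - y_j)^2$ (an assumption here; the formula above holds for any cost), $\partial C / \partial a_j^L = a_j^L - y_j$, so the output error can be computed element-wise as in this NumPy sketch with illustrative inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def output_error(z_L, y):
    """delta^L = dC/da^L * sigma'(z^L) for the quadratic cost C = 0.5 * ||a^L - y||^2."""
    a_L = sigmoid(z_L)
    dC_da = a_L - y                      # derivative of the quadratic cost w.r.t. a^L
    return dC_da * sigmoid_prime(z_L)    # element-wise, one entry per output neuron

delta_L = output_error(np.array([0.0, 2.0]), np.array([1.0, 1.0]))
print(delta_L)
```

For $z = 0$ the activation is 0.5 and $\sigma'(0) = 0.25$, so with target 1 the first entry is $(0.5 - 1) \times 0.25 = -0.125$.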

For a neuron $j$ in any other layer $l$, the error $\delta_j^l$ is computed from the errors of the next layer:

$$
\delta_j^l = \sum_{k} w_{kj}^{l+1} \delta_k^{l+1} \sigma'(z_j^l)
$$

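where $w_{kj}^{l+1}$ is the weight connecting neuron $j$ in layer $l$ to neuron $k$ in layer $l+1$. Because each $\delta_j^l$ depends only on the errors of layer $l+1$, the errors can be computed layer by layer, from the output backwards.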
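In matrix form, the sum over $k$ is a product with the transposed weight matrix of layer $l+1$, followed by an element-wise product with $\sigma'(z^l)$. The sketch below assumes the storage convention `W_next[k, j]` $= w_{kj}^{l+1}$ and uses random test values; it checks the vectorized expression against the explicit sum.

```python
import numpy as np

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def hidden_error(W_next, delta_next, z_l):
    """delta^l_j = sum_k w^{l+1}_{kj} delta^{l+1}_k * sigma'(z^l_j), in matrix form.

    W_next[k, j] is the weight from neuron j in layer l to neuron k in layer l+1,
    so summing over k is a product with the transposed matrix.
    """
    return (W_next.T @ delta_next) * sigmoid_prime(z_l)

rng = np.random.default_rng(0)
W_next = rng.standard_normal((2, 3))   # layer l has 3 neurons, layer l+1 has 2
delta_next = rng.standard_normal(2)
z_l = rng.standard_normal(3)

vectorized = hidden_error(W_next, delta_next, z_l)
# The same quantity written as the explicit sum from the formula
explicit = np.array([
    sum(W_next[k, j] * delta_next[k] for k in range(2)) * sigmoid_prime(z_l[j])
    for j in range(3)
])
print(np.allclose(vectorized, explicit))  # → True
```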

Steps of the Backpropagation Algorithm

The backpropagation algorithm proceeds as follows:

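1. Feed in the features and targets of the training samples, and initialize the network's weights and biases.
2. Run forward propagation to compute the output of every neuron.
3. Compute the output-layer error $\delta_j^L$.
4. Starting from $\delta_j^L$, use $\delta_j^l = \sum_{k} w_{kj}^{l+1} \delta_k^{l+1} \sigma'(z_j^l)$ to compute the errors $\delta_j^l$ of all other layers.
5. Compute the gradients from the errors (the gradient of a weight is the error of the neuron it feeds into times the activation of the neuron it comes from; the gradient of a bias is the error of its neuron), then update the weights and biases with the gradient descent step:

$$
\begin{aligned}
w_{ji}^l &\leftarrow w_{ji}^l - \eta \frac{\partial C}{\partial w_{ji}^l} \\
b_j^l &\leftarrow b_j^l - \eta \frac{\partial C}{\partial b_j^l}
\end{aligned}
$$

where $\eta$ is the learning rate.

6. Repeat steps 2 to 5 until a convergence criterion is met or the maximum number of iterations is reached.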
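The following sketch carries out steps 2 to 5 for a single training sample on a small 2-2-1 sigmoid network and checks one backpropagated weight gradient against a finite difference. The quadratic cost, the network size, the random initialization and the learning rate are all assumptions for illustration; weights are stored as `(n_out, n_in)` matrices so that `W @ a + b` gives the weighted input of the next layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def cost(weights, biases, x, y):
    # Quadratic cost C = 0.5 * ||a^L - y||^2 (assumed for this sketch)
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return 0.5 * np.sum((a - y) ** 2)

def backprop(weights, biases, x, y):
    # Step 2: forward pass, storing every weighted input z and activation a
    activations, zs = [x], []
    for W, b in zip(weights, biases):
        zs.append(W @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))
    # Step 3: output-layer error delta^L = (a^L - y) * sigma'(z^L)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grads_W = [None] * len(weights)
    grads_b = [None] * len(biases)
    # Step 4 plus the gradients for step 5, from the last layer to the first
    for l in range(len(weights) - 1, -1, -1):
        grads_W[l] = np.outer(delta, activations[l])  # dC/dw_ji = delta_j * a_i (previous layer)
        grads_b[l] = delta                            # dC/db_j = delta_j
        if l > 0:
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W, grads_b

rng = np.random.default_rng(1)
weights = [rng.standard_normal((2, 2)), rng.standard_normal((1, 2))]
biases = [rng.standard_normal(2), rng.standard_normal(1)]
x, y = np.array([0.5, -1.0]), np.array([1.0])

grads_W, grads_b = backprop(weights, biases, x, y)
analytic = grads_W[0][1, 0]

# Check one hidden-layer weight with a central finite difference
eps = 1e-6
W_plus = [W.copy() for W in weights]
W_plus[0][1, 0] += eps
W_minus = [W.copy() for W in weights]
W_minus[0][1, 0] -= eps
numeric = (cost(W_plus, biases, x, y) - cost(W_minus, biases, x, y)) / (2 * eps)
print(analytic, numeric)

# Step 5: one gradient descent update with learning rate eta
eta = 0.1
before = cost(weights, biases, x, y)
weights = [W - eta * gW for W, gW in zip(weights, grads_W)]
biases = [b - eta * gb for b, gb in zip(biases, grads_b)]
after = cost(weights, biases, x, y)
print(before, after)
```

A finite-difference comparison like this is a common way to catch indexing mistakes in a hand-written backward pass; it is far too slow to use for training itself.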

Backpropagation Example Code

The following Python example uses backpropagation to train a multi-layer perceptron (MLP):

```python
import numpy as np

# Sigmoid activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, evaluated at the weighted input x (before activation)
def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

# Mean squared error (MSE) loss
def mse_loss(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

# Train a one-hidden-layer network with backpropagation and gradient descent
def backpropagation(X, y_true, learning_rate, num_iterations):
    # Initialize weights and biases (4 hidden neurons)
    weights = [np.random.randn(X.shape[1], 4), np.random.randn(4, 1)]
    biases = [np.zeros((1, 4)), np.zeros((1, 1))]

    for i in range(num_iterations):
        # Forward propagation: keep the weighted inputs for the backward pass
        hidden_input = np.dot(X, weights[0]) + biases[0]
        hidden_layer_output = sigmoid(hidden_input)
        output_input = np.dot(hidden_layer_output, weights[1]) + biases[1]
        y_pred = sigmoid(output_input)

        # Output-layer error: dC/da * sigma'(z). (y_pred - y_true) is the derivative of
        # 0.5 * sum of squared errors; it differs from the MSE derivative only by the
        # constant factor 2/n, which is absorbed into the learning rate.
        output_error = (y_pred - y_true) * sigmoid_derivative(output_input)

        # Hidden-layer error: sum_k w_kj * delta_k * sigma'(z_j), as a matrix product
        hidden_error = np.dot(output_error, weights[1].T) * sigmoid_derivative(hidden_input)

        # Gradients of the weights and biases (summed over all samples)
        gradient_weights_1 = np.dot(hidden_layer_output.T, output_error)
        gradient_biases_1 = np.sum(output_error, axis=0, keepdims=True)
        gradient_weights_0 = np.dot(X.T, hidden_error)
        gradient_biases_0 = np.sum(hidden_error, axis=0, keepdims=True)

        # Gradient descent update of the weights and biases
        weights[1] -= learning_rate * gradient_weights_1
        biases[1] -= learning_rate * gradient_biases_1
        weights[0] -= learning_rate * gradient_weights_0
        biases[0] -= learning_rate * gradient_biases_0

        # Compute and print the loss every 1000 iterations
        if i % 1000 == 0:
            loss = mse_loss(y_true, y_pred)
            print(f"Iteration {i}, Loss: {loss}")

    return weights, biases

# Toy dataset (the XOR problem)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Run backpropagation
weights, biases = backpropagation(X, y, learning_rate=0.1, num_iterations=10000)
```

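The code above implements a multi-layer perceptron with one hidden layer. sigmoid and sigmoid_derivative implement the sigmoid activation function and its derivative, and mse_loss implements the mean squared error loss. backpropagation implements the complete training loop: given a learning rate and a number of iterations, it trains the network on the toy dataset and prints the iteration number and the current loss every 1000 iterations.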

Code Explanation

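- In forward propagation, we compute the hidden-layer output hidden_layer_output and the final output y_pred.
- In backward propagation, we compute the output-layer error output_error and the hidden-layer error hidden_error.
- From these errors we compute the gradients of the weights and biases: gradient_weights_1, gradient_biases_1, gradient_weights_0 and gradient_biases_0.
- Following the gradient descent step, we update the weights and biases stored in weights and biases.
- Every 1000 iterations, we compute the current loss and print it.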
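In the example code, the line `gradient_weights_0 = np.dot(X.T, hidden_error)` handles all training samples at once. The sketch below uses random values with the same shapes as the XOR example (4 samples, 2 inputs, 4 hidden neurons) to check that this batched product equals the per-sample gradients (outer products of input and error) summed over the batch, which is also why the bias gradients are computed with `np.sum(..., axis=0)`.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((4, 2))             # 4 samples, 2 input features
hidden_error = rng.standard_normal((4, 4))  # per-sample error of the 4 hidden neurons

# Batched form, as in the example code
batched = np.dot(X.T, hidden_error)         # shape (2, 4)

# The same gradient accumulated one sample at a time
per_sample_sum = sum(np.outer(X[n], hidden_error[n]) for n in range(X.shape[0]))

# Bias gradient: the error summed over the batch axis
bias_grad = np.sum(hidden_error, axis=0, keepdims=True)  # shape (1, 4)

print(np.allclose(batched, per_sample_sum))  # → True
```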

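This example shows the general flow of backpropagation. Real applications often modify it, for example by adding a regularization term or by using momentum.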
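As one possible variant of the update step, the sketch below combines classical (heavy-ball) momentum with an L2 regularization term. The function name `sgd_momentum_step` and its default hyperparameters are hypothetical choices for illustration. In the example code, a line such as `weights[1] -= learning_rate * gradient_weights_1` could be replaced by a call like this, keeping one velocity array per parameter.

```python
import numpy as np

def sgd_momentum_step(param, grad, velocity, learning_rate=0.1, momentum=0.9, weight_decay=0.0):
    """One gradient descent step with classical momentum and optional L2 regularization.

    The L2 penalty 0.5 * weight_decay * ||param||^2 adds weight_decay * param to the gradient.
    Returns the updated parameter and velocity as new arrays (inputs are not modified).
    """
    g = grad + weight_decay * param                  # gradient of loss plus L2 penalty
    velocity = momentum * velocity - learning_rate * g
    return param + velocity, velocity

# Usage on a toy loss C(w) = (w - 3)^2, whose gradient is 2(w - 3)
w = np.array([0.0])
v = np.zeros_like(w)
for _ in range(300):
    w, v = sgd_momentum_step(w, 2.0 * (w - 3.0), v, learning_rate=0.05, momentum=0.9)
print(w)
```

With `momentum=0.9` the velocity accumulates past gradients, which tends to speed up progress along consistent directions; `weight_decay` corresponds to the penalty $\frac{\lambda}{2}\lVert w \rVert^2$ with $\lambda$ equal to `weight_decay`.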

Original article, protected by copyright. Please credit the source when reposting: https://www.johngo689.com/824077/
