What Is the Vanishing Gradient Problem in Deep Learning?

1. What is the vanishing gradient problem in deep learning?

In deep learning, the vanishing gradient problem refers to the following phenomenon: during backpropagation in a network with many layers, the gradients that reach the layers close to the input become extremely small, so those early layers receive almost no useful error signal. Training then becomes difficult and may even fail to converge, the network performs poorly during training, and it cannot learn useful features of the data.

2. Principles of deep learning algorithms

Deep learning algorithms use neural networks, loosely modeled on how the human brain works, to process complex input data through multiple layers of neurons. During training, the backpropagation algorithm is used to update the network's weights so that the network gradually learns to extract useful features from the input data.
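For reference, the weight update that backpropagation feeds into is ordinary gradient descent (this is the rule implemented in the code below, stated here for completeness): with loss $$J$$ and learning rate $$\eta$$, each weight is updated as $$w \leftarrow w - \eta \frac{\partial J}{\partial w}$$.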

3. How the vanishing gradient problem arises

The vanishing gradient problem arises during backpropagation. In backpropagation, the gradient measures how sensitive the error is to each network parameter and is used to update the weights. In a deep neural network, the gradient is propagated backwards from the output layer to the input layer via the chain rule, and at every layer it is multiplied by the derivative of that layer's activation function.

If the sigmoid is chosen as the activation function, then as the gradient propagates from the output layer toward the input layer it is repeatedly multiplied by the sigmoid's derivative. That derivative is at most 0.25 and approaches 0 once a unit saturates, so the product of many such factors shrinks toward 0. As a result, the weights close to the input layer are updated extremely slowly and may hardly change at all, and the network cannot learn useful features of the data.
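To see this numerically, here is a small standalone sketch (separate from the full training example in section 5) that evaluates the sigmoid derivative and the best-case per-layer scaling factor for several depths:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

print(sigmoid_derivative(0.0))   # 0.25, the largest value the derivative can take
print(sigmoid_derivative(5.0))   # ~0.0066, the saturated regime

# Even in the best case (a factor of 0.25 per layer), the scaling applied to the
# gradient reaching the first layer shrinks geometrically with depth.
for depth in (2, 5, 10, 20):
    print(depth, 0.25 ** depth)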

4. Mathematical formulation and computation steps of the vanishing gradient problem

Mathematical formulation:

Consider a neural network with L layers in which layer l uses the sigmoid activation function, with input (pre-activation) $$z^l$$ and output $$a^l = \sigma(z^l)$$. During backpropagation, the error term of layer l is $$\delta^{l} = \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})$$, where $$\sigma'$$ denotes the derivative of the sigmoid function and $$\odot$$ denotes element-wise multiplication. Each backward step therefore multiplies the error term by another factor of $$\sigma'(z^{l})$$.
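As a rough back-of-the-envelope bound (an illustrative assumption, not something computed in the code below): if the weights have magnitude on the order of 1, then since $$\sigma'(z) \le 0.25$$ for every input, $$\lVert \delta^{1} \rVert \lesssim 0.25^{\,L-1} \lVert \delta^{L} \rVert$$. For $$L = 11$$, the error term reaching the first layer is smaller than the output layer's by a factor of roughly $$0.25^{10} \approx 10^{-6}$$.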

Computation steps:

  1. Initialize the network's weights and biases.
  2. Run forward propagation to compute the predictions and evaluate the loss function.
  3. Use backpropagation to compute the gradient at the output layer.
  4. Propagate the gradient backwards layer by layer, computing each layer's gradient.

5. Python code example of the vanishing gradient problem

The following Python code uses a neural network with several layers to demonstrate the vanishing gradient problem.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def initialize_weights(layers):
    weights = {}
    for i in range(1, len(layers)):
        weights['W' + str(i)] = np.random.randn(layers[i], layers[i-1]) * 0.01
        weights['b' + str(i)] = np.zeros((layers[i], 1))
    return weights

def forward_propagation(X, weights):
    cache = {'A0': X}
    L = len(weights) // 2
    for i in range(1, L + 1):
        Z = np.dot(weights['W' + str(i)], cache['A' + str(i-1)]) + weights['b' + str(i)]
        cache['A' + str(i)] = sigmoid(Z)
    return cache['A' + str(L)], cache

def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -1/m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    return cost

def backward_propagation(AL, Y, cache, weights):
    grads = {}
    m = Y.shape[1]
    L = len(weights) // 2
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # Since a = sigmoid(z), the derivative sigma'(z) can be written as a * (1 - a).
    dZ = dAL * cache['A' + str(L)] * (1 - cache['A' + str(L)])
    grads['dW' + str(L)] = 1/m * np.dot(dZ, cache['A' + str(L-1)].T)
    grads['db' + str(L)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    for i in range(L-1, 0, -1):
        dA_prev = np.dot(weights['W' + str(i+1)].T, dZ)
        dZ = dA_prev * cache['A' + str(i)] * (1 - cache['A' + str(i)])
        grads['dW' + str(i)] = 1/m * np.dot(dZ, cache['A' + str(i-1)].T)
        grads['db' + str(i)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    return grads

def update_weights(weights, grads, learning_rate):
    L = len(weights) // 2
    for i in range(L):
        weights['W' + str(i+1)] -= learning_rate * grads['dW' + str(i+1)]
        weights['b' + str(i+1)] -= learning_rate * grads['db' + str(i+1)]
    return weights

def train(X, Y, layers, learning_rate, num_iterations):
    costs = []
    weights = initialize_weights(layers)
    for i in range(num_iterations):
        AL, cache = forward_propagation(X, weights)
        cost = compute_cost(AL, Y)
        grads = backward_propagation(AL, Y, cache, weights)
        weights = update_weights(weights, grads, learning_rate)
        if i % 100 == 0:
            costs.append(cost)
            print("Cost after iteration {}: {:.4f}".format(i, cost))
    plt.plot(costs)
    plt.xlabel("Iterations (per 100s)")
    plt.ylabel("Cost")
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return weights

# Generate a synthetic dataset
np.random.seed(0)
m = 1000
N = int(m / 2)
D = 2
X = np.zeros((m, D))
Y = np.zeros((m, 1))
a = 4
for j in range(2):
    ix = range(N * j, N * (j + 1))
    t = np.linspace(j * 3.12, (j + 1) * 3.12, N) + np.random.randn(N) * 0.2
    r = a * np.sin(4 * t) + np.random.randn(N) * 0.1
    X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
    Y[ix] = j

layers = [2, 10, 10, 10, 1]  # number of layers and neurons per layer
learning_rate = 0.01
num_iterations = 1000

weights = train(X.T, Y.T, layers, learning_rate, num_iterations)

This code trains a network with four weight layers (three hidden layers and one output layer), computing the cost at each iteration and plotting how the cost changes with the number of iterations. When gradients vanish, we observe that the cost barely changes and the network fails to learn the features of the input data.
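To observe the vanishing gradient directly, rather than only through the cost curve, one can also print the per-layer gradient norms after a single forward/backward pass. The following is a small sketch that reuses the functions and data defined above (the helper name inspect_gradient_norms is introduced here for illustration and is not part of the original code):

def inspect_gradient_norms(X, Y, layers):
    weights = initialize_weights(layers)
    AL, cache = forward_propagation(X, weights)
    grads = backward_propagation(AL, Y, cache, weights)
    # With sigmoid activations, the gradient norms typically shrink sharply
    # from the output layer (largest index) toward the input layer (index 1).
    for i in range(1, len(layers)):
        print("||dW{}|| = {:.2e}".format(i, np.linalg.norm(grads['dW' + str(i)])))

inspect_gradient_norms(X.T, Y.T, layers)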

6. Explanation of the code details

  • sigmoid and sigmoid_derivative: compute the sigmoid function and its derivative.

  • initialize_weights: initialize the network's weight matrices and bias vectors.

  • forward_propagation: run the forward pass and compute the network's output.

  • compute_cost: compute the value of the cost function.

  • backward_propagation: run the backward pass and compute the gradients of every layer.

  • update_weights: update the network's weights using the computed gradients.

  • train: train the network and plot the cost as a function of the number of iterations.

By calling these functions, the code above implements the training of a simple neural network and demonstrates the vanishing gradient phenomenon.
