What Is the Vanishing Gradient Problem in Deep Learning?

1. What is the vanishing gradient problem in deep learning?

In deep learning, the vanishing gradient problem refers to the following phenomenon: during backpropagation in a network with many layers, the gradients that reach the layers close to the input become extremely small, so those early layers receive almost no useful error signal. Training then becomes difficult and may even fail to converge, the network performs poorly during training, and it cannot learn useful features of the data.

2. Principles of deep learning algorithms

Deep learning algorithms use neural networks, loosely modeled on how the human brain works, to process complex input data through multiple layers of neurons. During training, the backpropagation algorithm is used to update the network's weights so that the network gradually learns to extract useful features from the input data.
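For reference, the weight update that backpropagation feeds into is ordinary gradient descent (this is the rule implemented in the code below, stated here for completeness): with loss $$J$$ and learning rate $$\eta$$, each weight is updated as $$w \leftarrow w - \eta \frac{\partial J}{\partial w}$$.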

3. How the vanishing gradient problem arises

The vanishing gradient problem arises during backpropagation. In backpropagation, the gradient measures how sensitive the error is to each network parameter and is used to update the weights. In a deep neural network, the gradient is propagated backwards from the output layer to the input layer via the chain rule, and at every layer it is multiplied by the derivative of that layer's activation function.

If the sigmoid is chosen as the activation function, then as the gradient propagates from the output layer toward the input layer it is repeatedly multiplied by the sigmoid's derivative. That derivative is at most 0.25 and approaches 0 once a unit saturates, so the product of many such factors shrinks toward 0. As a result, the weights close to the input layer are updated extremely slowly and may hardly change at all, and the network cannot learn useful features of the data.
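To see this numerically, here is a small standalone sketch (separate from the full training example in section 5) that evaluates the sigmoid derivative and the best-case per-layer scaling factor for several depths:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

print(sigmoid_derivative(0.0))   # 0.25, the largest value the derivative can take
print(sigmoid_derivative(5.0))   # ~0.0066, the saturated regime

# Even in the best case (a factor of 0.25 per layer), the scaling applied to the
# gradient reaching the first layer shrinks geometrically with depth.
for depth in (2, 5, 10, 20):
    print(depth, 0.25 ** depth)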

4. Mathematical formulation and computation steps of the vanishing gradient problem

Mathematical formulation:

Consider a neural network with L layers in which layer l uses the sigmoid activation function, with input (pre-activation) $$z^l$$ and output $$a^l = \sigma(z^l)$$. During backpropagation, the error term of layer l is $$\delta^{l} = \left( (w^{l+1})^{T} \delta^{l+1} \right) \odot \sigma'(z^{l})$$, where $$\sigma'$$ denotes the derivative of the sigmoid function and $$\odot$$ denotes element-wise multiplication. Each backward step therefore multiplies the error term by another factor of $$\sigma'(z^{l})$$.
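As a rough back-of-the-envelope bound (an illustrative assumption, not something computed in the code below): if the weights have magnitude on the order of 1, then since $$\sigma'(z) \le 0.25$$ for every input, $$\lVert \delta^{1} \rVert \lesssim 0.25^{\,L-1} \lVert \delta^{L} \rVert$$. For $$L = 11$$, the error term reaching the first layer is smaller than the output layer's by a factor of roughly $$0.25^{10} \approx 10^{-6}$$.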

Computation steps:

  1. Initialize the network's weights and biases.
  2. Run forward propagation to compute the predictions and evaluate the loss function.
  3. Use backpropagation to compute the gradient at the output layer.
  4. Propagate the gradient backwards layer by layer, computing each layer's gradient.

5. Python code example of the vanishing gradient problem

The following Python code uses a neural network with several layers to demonstrate the vanishing gradient problem.

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def initialize_weights(layers):
    weights = {}
    for i in range(1, len(layers)):
        weights['W' + str(i)] = np.random.randn(layers[i], layers[i-1]) * 0.01
        weights['b' + str(i)] = np.zeros((layers[i], 1))
    return weights

def forward_propagation(X, weights):
    cache = {'A0': X}
    L = len(weights) // 2
    for i in range(1, L + 1):
        Z = np.dot(weights['W' + str(i)], cache['A' + str(i-1)]) + weights['b' + str(i)]
        cache['A' + str(i)] = sigmoid(Z)
    return cache['A' + str(L)], cache

def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -1/m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    return cost

def backward_propagation(AL, Y, cache, weights):
    grads = {}
    m = Y.shape[1]
    L = len(weights) // 2
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # Since a = sigmoid(z), the derivative sigma'(z) can be written as a * (1 - a).
    dZ = dAL * cache['A' + str(L)] * (1 - cache['A' + str(L)])
    grads['dW' + str(L)] = 1/m * np.dot(dZ, cache['A' + str(L-1)].T)
    grads['db' + str(L)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    for i in range(L-1, 0, -1):
        dA_prev = np.dot(weights['W' + str(i+1)].T, dZ)
        dZ = dA_prev * cache['A' + str(i)] * (1 - cache['A' + str(i)])
        grads['dW' + str(i)] = 1/m * np.dot(dZ, cache['A' + str(i-1)].T)
        grads['db' + str(i)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    return grads

def update_weights(weights, grads, learning_rate):
    L = len(weights) // 2
    for i in range(L):
        weights['W' + str(i+1)] -= learning_rate * grads['dW' + str(i+1)]
        weights['b' + str(i+1)] -= learning_rate * grads['db' + str(i+1)]
    return weights

def train(X, Y, layers, learning_rate, num_iterations):
    costs = []
    weights = initialize_weights(layers)
    for i in range(num_iterations):
        AL, cache = forward_propagation(X, weights)
        cost = compute_cost(AL, Y)
        grads = backward_propagation(AL, Y, cache, weights)
        weights = update_weights(weights, grads, learning_rate)
        if i % 100 == 0:
            costs.append(cost)
            print("Cost after iteration {}: {:.4f}".format(i, cost))
    plt.plot(costs)
    plt.xlabel("Iterations (per 100s)")
    plt.ylabel("Cost")
    plt.title("Learning rate =" + str(learning_rate))
    plt.show()
    return weights

# Generate a synthetic dataset
np.random.seed(0)
m = 1000
N = int(m / 2)
D = 2
X = np.zeros((m, D))
Y = np.zeros((m, 1))
a = 4
for j in range(2):
    ix = range(N * j, N * (j + 1))
    t = np.linspace(j * 3.12, (j + 1) * 3.12, N) + np.random.randn(N) * 0.2
    r = a * np.sin(4 * t) + np.random.randn(N) * 0.1
    X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
    Y[ix] = j

layers = [2, 10, 10, 10, 1]  # number of layers and neurons per layer
learning_rate = 0.01
num_iterations = 1000

weights = train(X.T, Y.T, layers, learning_rate, num_iterations)

This code trains a network with four weight layers (three hidden layers and one output layer), computing the cost at each iteration and plotting how the cost changes with the number of iterations. When gradients vanish, we observe that the cost barely changes and the network fails to learn the features of the input data.
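To observe the vanishing gradient directly, rather than only through the cost curve, one can also print the per-layer gradient norms after a single forward/backward pass. The following is a small sketch that reuses the functions and data defined above (the helper name inspect_gradient_norms is introduced here for illustration and is not part of the original code):

def inspect_gradient_norms(X, Y, layers):
    weights = initialize_weights(layers)
    AL, cache = forward_propagation(X, weights)
    grads = backward_propagation(AL, Y, cache, weights)
    # With sigmoid activations, the gradient norms typically shrink sharply
    # from the output layer (largest index) toward the input layer (index 1).
    for i in range(1, len(layers)):
        print("||dW{}|| = {:.2e}".format(i, np.linalg.norm(grads['dW' + str(i)])))

inspect_gradient_norms(X.T, Y.T, layers)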

6. Explanation of the code details

  • sigmoid and sigmoid_derivative: compute the sigmoid function and its derivative.

  • initialize_weights: initialize the network's weight matrices and bias vectors.

  • forward_propagation: run the forward pass and compute the network's output.

  • compute_cost: compute the value of the cost function.

  • backward_propagation: run the backward pass and compute the gradients of every layer.

  • update_weights: update the network's weights using the computed gradients.

  • train: train the network and plot the cost as a function of the number of iterations.

By calling these functions, the code above implements the training of a simple neural network and demonstrates the vanishing gradient phenomenon.
