1. What Is the Vanishing Gradient Problem in Deep Learning?
In deep learning, the vanishing gradient problem arises during backpropagation in networks with many layers: the layers close to the input receive almost no effective gradient signal from the layers above them, which makes training difficult or prevents convergence altogether. As a result, the network performs poorly during training and fails to learn useful high-level features.
2. How Deep Learning Algorithms Work
Deep learning algorithms use neural networks, loosely modeled on the way the human brain works, to process complex input data through multiple layers of neurons. During training, the backpropagation algorithm updates the network's weights so that the network gradually learns to extract useful features from the input data.
3. Why the Vanishing Gradient Problem Occurs
The vanishing gradient problem arises during backpropagation. The gradient measures how sensitive the error is to each network parameter and is what drives the weight updates. In a deep network, the gradient is propagated from the output layer back to the input layer via the chain rule, and every layer contributes a factor equal to the derivative of its activation function.
If the sigmoid is chosen as the activation function, the gradient shrinks as it travels from the output layer toward the input layer: the derivative of the sigmoid never exceeds 0.25 and decays toward 0 for inputs of large magnitude, so each layer multiplies the gradient by a small factor. Consequently, the weights near the input layer are updated extremely slowly, often without any substantive change, and the model fails to learn good features.
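To make the shrinkage concrete, here is a back-of-the-envelope bound (our addition, not part of the original text):

$$\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr) \le \frac{1}{4},$$

so every layer the gradient passes through multiplies it by a factor of at most $|w|/4$. With the small random weights typical at initialization this factor is well below 1, and after $L$ layers the gradient has shrunk roughly like $(1/4)^{L}$.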
4. Mathematical Formulation and Computation Steps
Mathematical formulation:
Consider a neural network with $L$ layers in which layer $l$ uses the sigmoid activation function, with output $a^l$ and pre-activation input $z^l$. During backpropagation, the error term of layer $l$ satisfies $$\delta^{l} = \left( (W^{l+1})^{\top} \delta^{l+1} \right) \odot \sigma'(z^{l}),$$ where $\sigma'$ denotes the derivative of the sigmoid and $\odot$ denotes element-wise multiplication. Each backward step thus multiplies the error by the next layer's weights and by $\sigma'(z^l)$.
Computation steps:
- Initialize the network's weights and biases.
- Run a forward pass to compute the predictions and the value of the loss function.
- Use backpropagation to compute the gradient at the output layer.
- Working backward toward the input, compute each layer's gradient in turn (a toy numerical walk-through of these steps follows this list).
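The following sketch (our addition, not part of the original example; the chain of scalar "layers" and names such as `depth` are illustrative) applies these steps to a network with a single sigmoid unit per layer, so the gradient shrinkage is easy to watch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A toy chain of 10 scalar sigmoid "layers", each with weight 0.5.
depth = 10
w = 0.5
activations = [1.0]  # forward pass starts from input x = 1.0
zs = []
for _ in range(depth):
    z = w * activations[-1]
    zs.append(z)
    activations.append(sigmoid(z))

# Backward pass: each layer multiplies the gradient by w * sigmoid'(z) <= w / 4.
grad = 1.0  # gradient at the output, taken as 1 for illustration
for layer, z in reversed(list(enumerate(zs, start=1))):
    s = sigmoid(z)
    grad *= w * s * (1.0 - s)
    print("layer {:2d}: gradient magnitude = {:.3e}".format(layer, grad))
```

With weight 0.5 each per-layer factor is below 0.125, so after ten layers the gradient is around $10^{-9}$.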
5. A Python Example of the Vanishing Gradient Problem
The following Python code demonstrates the vanishing gradient problem on a neural network with several layers.
```python
import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def initialize_weights(layers):
    weights = {}
    for i in range(1, len(layers)):
        weights['W' + str(i)] = np.random.randn(layers[i], layers[i-1]) * 0.01
        weights['b' + str(i)] = np.zeros((layers[i], 1))
    return weights

def forward_propagation(X, weights):
    # Cache both pre-activations Z and activations A for backpropagation.
    cache = {'A0': X}
    L = len(weights) // 2
    for i in range(1, L + 1):
        Z = np.dot(weights['W' + str(i)], cache['A' + str(i-1)]) + weights['b' + str(i)]
        cache['Z' + str(i)] = Z
        cache['A' + str(i)] = sigmoid(Z)
    return cache['A' + str(L)], cache

def compute_cost(AL, Y):
    m = Y.shape[1]
    cost = -1/m * np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL))
    return cost

def backward_propagation(AL, Y, cache, weights):
    grads = {}
    m = Y.shape[1]
    L = len(weights) // 2
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # The sigmoid derivative must be evaluated at the pre-activation Z.
    dZ = dAL * sigmoid_derivative(cache['Z' + str(L)])
    grads['dW' + str(L)] = 1/m * np.dot(dZ, cache['A' + str(L-1)].T)
    grads['db' + str(L)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    for i in range(L-1, 0, -1):
        dA_prev = np.dot(weights['W' + str(i+1)].T, dZ)
        dZ = dA_prev * sigmoid_derivative(cache['Z' + str(i)])
        grads['dW' + str(i)] = 1/m * np.dot(dZ, cache['A' + str(i-1)].T)
        grads['db' + str(i)] = 1/m * np.sum(dZ, axis=1, keepdims=True)
    return grads

def update_weights(weights, grads, learning_rate):
    L = len(weights) // 2
    for i in range(L):
        weights['W' + str(i+1)] -= learning_rate * grads['dW' + str(i+1)]
        weights['b' + str(i+1)] -= learning_rate * grads['db' + str(i+1)]
    return weights

def train(X, Y, layers, learning_rate, num_iterations):
    costs = []
    weights = initialize_weights(layers)
    for i in range(num_iterations):
        AL, cache = forward_propagation(X, weights)
        cost = compute_cost(AL, Y)
        grads = backward_propagation(AL, Y, cache, weights)
        weights = update_weights(weights, grads, learning_rate)
        if i % 100 == 0:
            costs.append(cost)
            print("Cost after iteration {}: {:.4f}".format(i, cost))
    plt.plot(costs)
    plt.xlabel("Iterations (per 100s)")
    plt.ylabel("Cost")
    plt.title("Learning rate = " + str(learning_rate))
    plt.show()
    return weights

# Generate a synthetic "flower" dataset with two interleaved classes.
np.random.seed(0)
m = 1000
N = int(m / 2)   # samples per class
D = 2            # input dimensionality
X = np.zeros((m, D))
Y = np.zeros((m, 1))
a = 4            # maximum radius
for j in range(2):
    ix = range(N * j, N * (j + 1))
    t = np.linspace(j * 3.12, (j + 1) * 3.12, N) + np.random.randn(N) * 0.2  # angle
    r = a * np.sin(4 * t) + np.random.randn(N) * 0.1                          # radius
    X[ix] = np.c_[r * np.sin(t), r * np.cos(t)]
    Y[ix] = j

layers = [2, 10, 10, 10, 1]  # layer sizes: input, three hidden layers, output
learning_rate = 0.01
num_iterations = 1000
weights = train(X.T, Y.T, layers, learning_rate, num_iterations)
```
This code trains a 4-layer neural network (three hidden layers plus the output layer), computing the cost at every iteration and recording it every 100 iterations to plot how it evolves. With vanishing gradients, the cost barely changes over the course of training, and the network fails to learn the structure of the input data.
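To see the vanishing gradients directly rather than only through the flat cost curve, a small diagnostic (our addition; it reuses the functions and variables defined above) prints the norm of each layer's weight gradient after training. The gradients of the layers closest to the input are typically orders of magnitude smaller than those of the output layer:

```python
# Diagnostic (our addition): compare gradient magnitudes across layers.
AL, cache = forward_propagation(X.T, weights)
grads = backward_propagation(AL, Y.T, cache, weights)
for i in range(1, len(layers)):
    print("||dW{}|| = {:.3e}".format(i, np.linalg.norm(grads['dW' + str(i)])))
```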
6. Code Walkthrough
- `sigmoid` and `sigmoid_derivative`: compute the sigmoid activation and its derivative.
- `initialize_weights`: initialize the network's weight matrices and bias vectors.
- `forward_propagation`: run the forward pass and return the network's output along with a cache of intermediate values.
- `compute_cost`: compute the value of the cost function.
- `backward_propagation`: run the backward pass and return the gradients for every layer.
- `update_weights`: apply the gradient-descent update to the weights and biases.
- `train`: train the network and plot the cost as a function of the number of iterations.
Together, these functions implement the training loop of a simple neural network and make the vanishing gradient phenomenon observable.