What gradient descent algorithms are used in deep learning, and how do they differ?

January 1, 2024, 5:23 AM • Artificial Intelligence

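Introduction to gradient descent

Gradient descent is one of the most important optimization algorithms in deep learning and is widely used to train neural networks. It optimizes a model's parameters by repeatedly stepping in the direction opposite to the gradient of the loss function, which gradually lowers the loss toward a minimum. This article introduces three variants of gradient descent: Batch Gradient Descent, Stochastic Gradient Descent, and Mini-batch Gradient Descent.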
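As a minimal illustration of the update rule shared by all three variants, the sketch below runs plain gradient descent on a one-parameter toy loss $ J(\theta) = (\theta - 3)^2 $. The loss, the starting point $ \theta_0 = 0 $, the learning rate $ \alpha = 0.1 $ and the step count are illustrative assumptions, not values taken from a real model.

```python
def grad_J(theta: float) -> float:
    """Derivative of the toy loss J(theta) = (theta - 3)^2 (an assumed example loss)."""
    return 2.0 * (theta - 3.0)


def gradient_descent(theta0: float, alpha: float, steps: int) -> float:
    theta = theta0
    for _ in range(steps):
        # Step against the gradient: theta <- theta - alpha * dJ/dtheta
        theta = theta - alpha * grad_J(theta)
    return theta


theta_star = gradient_descent(theta0=0.0, alpha=0.1, steps=100)
print(theta_star)  # very close to the minimizer theta = 3
```

For this particular loss, each step shrinks the distance to the minimizer by a factor of 1 - 2α = 0.8; a learning rate above 1 would make the iterates diverge instead.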

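Batch Gradient Descent

Batch gradient descent is the most basic form of gradient descent. In every iteration it uses all training samples to compute the gradient of the loss function with respect to the parameters, and then updates the parameters. The algorithm works as follows: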

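Initialize the parameters: $ \theta = \theta_{0} $
Repeat the following steps until convergence:

(a) For each training sample $ (x^{(i)}, y^{(i)}) $, compute the prediction $ \hat{y}^{(i)} $

(b) Using all training samples, compute the partial derivative of the loss with respect to the parameters, $ \dfrac{\partial J}{\partial \theta} $

(c) Update the parameters: $ \theta = \theta - \alpha \dfrac{\partial J}{\partial \theta} $, where $ \alpha $ is the learning rate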

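Because batch gradient descent has to traverse the entire training set to compute each gradient, every iteration is expensive. Its advantage is that each update uses the exact gradient of the training loss, so with a suitable learning rate the loss decreases steadily and convergence typically takes relatively few iterations. On large datasets, however, each of those iterations consumes a great deal of computation, which makes batch gradient descent hard to apply.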
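A minimal sketch of batch gradient descent, assuming a linear regression model $ \hat{y} = Xw + b $ with mean squared error as the loss $ J $ and noise-free synthetic data (true weights [2, -1], bias 0.5). The data, learning rate and epoch count are illustrative choices; the point to notice is that every single update touches all m samples.

```python
import numpy as np

# Synthetic, noise-free linear regression data (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5


def batch_gradient_descent(X, y, alpha=0.1, epochs=500):
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        # One update per pass, computed from ALL m samples
        error = X @ w + b - y              # residuals, shape (m,)
        grad_w = 2.0 / m * (X.T @ error)   # dJ/dw for J = mean(error ** 2)
        grad_b = 2.0 / m * error.sum()     # dJ/db
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b


w, b = batch_gradient_descent(X, y)
print(w, b)  # approaches [2, -1] and 0.5
```

Since the targets carry no noise and the loss is convex, the weights approach the true values almost exactly.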

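Stochastic Gradient Descent

Stochastic gradient descent is a modification of batch gradient descent. Instead of using all training samples in each iteration, it computes the gradient from a single sample and immediately updates the parameters. The algorithm works as follows: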

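Initialize the parameters: $ \theta = \theta_{0} $
Repeat the following steps until convergence:

(a) Randomly select one training sample $ (x^{(i)}, y^{(i)}) $

(b) Compute the prediction $ \hat{y}^{(i)} $

(c) Compute the gradient of that sample's loss, $ \dfrac{\partial J^{(i)}}{\partial \theta} $

(d) Update the parameters: $ \theta = \theta - \alpha \dfrac{\partial J^{(i)}}{\partial \theta} $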

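Unlike batch gradient descent, stochastic gradient descent processes only one sample per iteration, so each iteration is cheap. However, because each update is based on a single sample, the gradient estimate has high variance: the loss fluctuates from step to step, and with a fixed learning rate the parameters tend to keep oscillating around a minimum rather than settling on it. Like the other variants, it is not guaranteed to reach the global optimum of a non-convex loss and may end up in a local optimum.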
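A minimal SGD sketch on the same kind of linear regression problem, this time with a little Gaussian noise added to the targets (an assumption for illustration). Each update uses the gradient of a single sample's squared error; visiting the samples in a freshly shuffled order every epoch is one common way to implement "randomly select a sample", chosen here as an assumption.

```python
import numpy as np

# Synthetic linear regression data with small target noise (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=200)


def stochastic_gradient_descent(X, y, alpha=0.01, epochs=20, seed=0):
    shuffle_rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        # Visit the samples in a new random order each epoch
        for i in shuffle_rng.permutation(m):
            # Gradient of the single-sample loss (x_i . w + b - y_i)^2
            error = X[i] @ w + b - y[i]
            w -= alpha * 2.0 * error * X[i]
            b -= alpha * 2.0 * error
    return w, b


w, b = stochastic_gradient_descent(X, y)
print(w, b)
```

With a constant learning rate the final weights land close to, but not exactly on, the true values, which reflects the per-sample noise described above.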

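Mini-batch Gradient Descent

Mini-batch gradient descent combines batch gradient descent and stochastic gradient descent. In each iteration it computes the gradient from a small subset of the training samples (of size batch_size). The algorithm works as follows: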

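Initialize the parameters: $ \theta = \theta_{0} $
Repeat the following steps until convergence:

(a) Randomly select a batch of batch_size training samples $ \{(x^{(i)}, y^{(i)})\} $

(b) Compute the prediction $ \hat{y}^{(i)} $ for each sample in the batch

(c) Compute the gradient of the loss averaged over the batch, $ \dfrac{\partial J}{\partial \theta} $

(d) Update the parameters: $ \theta = \theta - \alpha \dfrac{\partial J}{\partial \theta} $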

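Because each iteration computes the gradient on only a small batch, mini-batch gradient descent keeps the cost per iteration low, while averaging over several samples reduces the variance of the gradient estimate compared with single-sample SGD. For this reason it is the variant most widely used in practice.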
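A minimal mini-batch sketch on the same assumed linear regression setup: each epoch shuffles the indices once, slices them into batches of batch_size (16 here, an illustrative choice), and updates with the gradient averaged over each batch. In this sketch, setting batch_size to the dataset size recovers batch gradient descent, and setting it to 1 recovers SGD.

```python
import numpy as np

# Synthetic linear regression data with small target noise (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.1 * rng.normal(size=200)


def minibatch_gradient_descent(X, y, batch_size=16, alpha=0.05, epochs=50, seed=0):
    shuffle_rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        order = shuffle_rng.permutation(m)
        for start in range(0, m, batch_size):
            idx = order[start:start + batch_size]  # the last batch may be smaller
            Xb, yb = X[idx], y[idx]
            error = Xb @ w + b - yb
            # Gradient of the MSE averaged over the current mini-batch
            w -= alpha * 2.0 / len(idx) * (Xb.T @ error)
            b -= alpha * 2.0 / len(idx) * error.sum()
    return w, b


w, b = minibatch_gradient_descent(X, y)
print(w, b)
```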

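Differences between the gradient descent algorithms

Batch gradient descent, stochastic gradient descent, and mini-batch gradient descent are three variants of the same algorithm. They differ in how much data is used to compute each gradient: batch gradient descent uses all training samples, stochastic gradient descent uses a single sample, and mini-batch gradient descent uses a small batch of samples.

Batch gradient descent is expensive per iteration on large datasets but converges in relatively few iterations. Stochastic gradient descent is cheap per iteration, but its noisy updates can make convergence slower and less stable, and it may also end up in a local optimum. Mini-batch gradient descent strikes a balance between cost per iteration and convergence behavior, which is why it is so widely used.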
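To make the variance trade-off concrete, the sketch below evaluates the gradient of a linear regression MSE loss at one fixed parameter point and measures how far sampled-batch estimates deviate, on average, from the exact full-batch gradient. The dataset, the fixed point w = 0, b = 0, the trial count and the batch sizes 1, 8 and 64 are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic linear regression data (illustrative assumption)
rng = np.random.default_rng(0)
m = 1000
X = rng.normal(size=(m, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(size=m)

# Fixed parameter point at which all gradient estimates are evaluated
w = np.zeros(2)
b = 0.0


def grad_w(idx):
    """MSE gradient with respect to w, averaged over the samples in idx."""
    error = X[idx] @ w + b - y[idx]
    return 2.0 / len(idx) * (X[idx].T @ error)


full_grad = grad_w(np.arange(m))  # exact (batch) gradient


def mean_sq_deviation(batch_size, trials=2000):
    """Average squared distance between a sampled-batch gradient and full_grad."""
    total = 0.0
    for _ in range(trials):
        idx = rng.choice(m, size=batch_size, replace=False)
        total += np.sum((grad_w(idx) - full_grad) ** 2)
    return total / trials


results = {bs: mean_sq_deviation(bs) for bs in (1, 8, 64)}
for bs, dev in results.items():
    print(f"batch_size={bs:3d}  mean squared deviation={dev:.4f}")
```

The average squared deviation shrinks roughly in proportion to 1 / batch_size; sampling without replacement from a finite dataset makes it slightly smaller still.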

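On convex problems (for example, linear regression with squared loss), batch gradient descent with a suitable learning rate converges to the optimal solution, at the cost of long computation times. For the non-convex losses of neural networks, none of the three variants guarantees the global optimum. Stochastic and mini-batch gradient descent take less time per update, but the final results may differ slightly. In practice, choose the variant that suits the size of the dataset and the available computing resources.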

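Python code example

Below is a Python example that trains a small neural network with mini-batch gradient descent. The network is a binary classifier with one hidden layer of 4 sigmoid units, trained with mean squared error on a toy dataset of 10 random 2-D points.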

import numpy as np
import matplotlib.pyplot as plt

# Network structure and hyperparameters
input_size = 2
hidden_size = 4
output_size = 1
batch_size = 4
learning_rate = 0.1

# Generate a toy dataset: 10 random 2-D points with alternating labels
np.random.seed(0)
X = np.random.rand(10, 2)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1]).reshape(-1, 1)

# Initialize the parameters
W1 = np.random.randn(input_size, hidden_size)
b1 = np.zeros((1, hidden_size))
W2 = np.random.randn(hidden_size, output_size)
b2 = np.zeros((1, output_size))


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Training loop
for epoch in range(1000):
    # Shuffle the dataset at the start of every epoch
    permutation = np.random.permutation(X.shape[0])
    X = X[permutation]
    y = y[permutation]

    for i in range(0, X.shape[0], batch_size):
        # Forward pass on the current mini-batch (the last batch may be smaller)
        X_batch = X[i:i+batch_size]
        y_batch = y[i:i+batch_size]
        m = X_batch.shape[0]

        hidden = np.dot(X_batch, W1) + b1
        hidden_activation = sigmoid(hidden)
        output = np.dot(hidden_activation, W2) + b2
        output_activation = sigmoid(output)

        # Mean squared error loss over the mini-batch
        loss = np.mean((output_activation - y_batch) ** 2)

        # Backward pass (chain rule); divide by m because the loss is a mean
        dLdoutput_activation = 2 * (output_activation - y_batch) / m
        doutput_activation_doutput = output_activation * (1 - output_activation)
        dLdoutput = dLdoutput_activation * doutput_activation_doutput

        dLdW2 = np.dot(hidden_activation.T, dLdoutput)
        dLdb2 = np.sum(dLdoutput, axis=0, keepdims=True)

        dLdhidden_activation = np.dot(dLdoutput, W2.T)
        dhidden_activation_dhidden = hidden_activation * (1 - hidden_activation)
        dLdhidden = dLdhidden_activation * dhidden_activation_dhidden

        dLdW1 = np.dot(X_batch.T, dLdhidden)
        dLdb1 = np.sum(dLdhidden, axis=0, keepdims=True)

        # Mini-batch gradient descent update
        W2 -= learning_rate * dLdW2
        b2 -= learning_rate * dLdb2
        W1 -= learning_rate * dLdW1
        b1 -= learning_rate * dLdb1

    # Print the loss of the last mini-batch every 100 epochs
    if (epoch + 1) % 100 == 0:
        print(f"Epoch: {epoch+1}, Loss: {loss:.4f}")

# Plot the decision regions on a grid that covers the data
h = 0.02
x_min, x_max = X[:, 0].min() - 0.1, X[:, 0].max() + 0.1
y_min, y_max = X[:, 1].min() - 0.1, X[:, 1].max() + 0.1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
grid = np.c_[xx.ravel(), yy.ravel()]
# Apply the same sigmoid hidden layer that was used during training
Z = sigmoid(np.dot(sigmoid(np.dot(grid, W1) + b1), W2) + b2)
Z = np.round(Z).reshape(xx.shape)

plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y.ravel(), cmap=plt.cm.Spectral)
plt.xlabel('x1')
plt.ylabel('x2')
plt.title('Classification Boundary')
plt.show()

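The code above uses mini-batch gradient descent to train a simple binary classification network and plots the resulting decision regions. NumPy handles the matrix operations and matplotlib handles the plotting. As the parameters are adjusted over the iterations, the network gradually fits the training points; because the labels in this toy dataset are assigned without regard to the coordinates, the learned boundary reflects those 10 points rather than a meaningful pattern. The parameter updates in the inner loop correspond to the mini-batch gradient descent steps described earlier.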
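The hand-written backward pass is the part of such code that is easiest to get wrong. As a sketch that is not part of the original example, the code below re-implements the same two-layer sigmoid network with mean squared error and compares its analytic gradients against central finite differences. The helper names loss_and_grads and numerical_grad, the 4-sample batch and eps = 1e-6 are hypothetical choices for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def loss_and_grads(params, X, y):
    """Forward and backward pass of a 2-layer sigmoid network with MSE loss."""
    W1, b1, W2, b2 = params
    m = X.shape[0]
    h = sigmoid(X @ W1 + b1)                   # hidden activations, shape (m, hidden)
    a = sigmoid(h @ W2 + b2)                   # output activations, shape (m, 1)
    loss = np.mean((a - y) ** 2)
    d_out = 2.0 * (a - y) / m * a * (1 - a)    # dL / d(output pre-activation)
    d_hid = (d_out @ W2.T) * h * (1 - h)       # dL / d(hidden pre-activation)
    grads = [
        X.T @ d_hid,                           # dL/dW1
        d_hid.sum(axis=0, keepdims=True),      # dL/db1
        h.T @ d_out,                           # dL/dW2
        d_out.sum(axis=0, keepdims=True),      # dL/db2
    ]
    return loss, grads


def numerical_grad(params, X, y, k, eps=1e-6):
    """Central-difference estimate of dL/d(params[k]), one entry at a time."""
    P = params[k]
    g = np.zeros_like(P)
    for idx in np.ndindex(P.shape):
        original = P[idx]
        P[idx] = original + eps
        loss_plus, _ = loss_and_grads(params, X, y)
        P[idx] = original - eps
        loss_minus, _ = loss_and_grads(params, X, y)
        P[idx] = original                      # restore the parameter
        g[idx] = (loss_plus - loss_minus) / (2 * eps)
    return g


rng = np.random.default_rng(0)
X = rng.random((4, 2))
y = np.array([[0.0], [1.0], [0.0], [1.0]])
params = [rng.normal(size=(2, 4)), np.zeros((1, 4)),
          rng.normal(size=(4, 1)), np.zeros((1, 1))]

_, analytic = loss_and_grads(params, X, y)
max_err = max(
    np.max(np.abs(analytic[k] - numerical_grad(params, X, y, k)))
    for k in range(len(params))
)
print(f"max |analytic - numerical| = {max_err:.2e}")
```

A tiny maximum difference indicates that the chain-rule expressions match the loss; a large one usually points to a missing factor such as the 1 / m from averaging.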

Original article, protected by copyright. If you repost it, please credit the source: https://www.johngo689.com/822385/
