Usage of torch.optim.Adam()



Adam adaptively scales the step size for each parameter using the first and second moments of the gradient.
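
Concretely, at every step Adam keeps an exponential moving average of the gradient (first moment) and of the squared gradient (second moment), corrects both for their zero initialization, and divides the update by the square root of the second moment. Below is a minimal sketch of one update for a single tensor, not the library implementation; the function name adam_update and the in-place tensor operations are only illustrative (defaults taken from the constructor signature shown later):

    import torch

    def adam_update(param, grad, m, v, step, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        # m, v: running first/second-moment tensors (same shape as param); step: 1-based step count
        beta1, beta2 = betas
        m.mul_(beta1).add_(grad, alpha=1 - beta1)             # first moment: EMA of the gradient
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)   # second moment: EMA of the squared gradient
        m_hat = m / (1 - beta1 ** step)                       # bias correction for zero-initialized m
        v_hat = v / (1 - beta2 ** step)                       # bias correction for zero-initialized v
        # the effective step size shrinks for parameters whose squared-gradient average is large
        param.add_(-lr * m_hat / (v_hat.sqrt() + eps))
        return param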


Initialization of Adam

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8,
                 weight_decay=0, amsgrad=False):
Args:
    params (iterable): iterable of parameters to optimize or dicts defining
        parameter groups
    lr (float, optional): learning rate (default: 1e-3)
    betas (Tuple[float, float], optional): coefficients used for computing
        running averages of gradient and its square (default: (0.9, 0.999))
    eps (float, optional): term added to the denominator to improve
        numerical stability (default: 1e-8)
    weight_decay (float, optional): weight decay (L2 penalty) (default: 0)
    amsgrad (boolean, optional): whether to use the AMSGrad variant of this
        algorithm from the paper `On the Convergence of Adam and Beyond`
        (default: False)
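
A typical way to use this constructor is sketched below with a hypothetical nn.Linear model and random data (the model, loss, and batch sizes are only for illustration). The second construction shows params given as dicts defining parameter groups, each with its own lr:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)
    loss_fn = nn.MSELoss()

    # all parameters share one set of hyperparameters
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999),
                                 eps=1e-8, weight_decay=0)

    # one training step
    x, y = torch.randn(32, 10), torch.randn(32, 1)
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = loss_fn(model(x), y)
    loss.backward()                # compute gradients
    optimizer.step()               # Adam update for every parameter

    # params can also be dicts defining parameter groups with per-group options
    optimizer = torch.optim.Adam(
        [{"params": model.weight, "lr": 1e-3},
         {"params": model.bias,   "lr": 1e-2}],
        weight_decay=0,
    )

Options passed in a group dict override the keyword defaults for that group only, which is how different learning rates can be assigned to different subsets of parameters.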

Original: https://blog.csdn.net/qq_40107571/article/details/126018026
Author: Mick..
Title: Usage of torch.optim.Adam()
