# Analysis: the CosineLRScheduler scheduler

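⚠️Note: in the paper this scheduler is called SGDR, but in practice it is usually referred to as the cosine scheduler. The two are largely the same, with only minor implementation differences.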

## 1、CosineLRScheduler

    from timm.scheduler.cosine_lr import CosineLRScheduler

    CosineLRScheduler(optimizer:Optimizer, t_initial:int, t_mul:float=1.0, lr_min:float,
                      decay_rate:float=1.0, warmup_t, warmup_lr_init, warmup_prefix=False,
                      cycle_limit, t_in_epochs=True, noise_range_t=None, noise_pct,
                      noise_std=1.0, noise_seed=42, initialize=True) :: Scheduler


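CosineLRScheduler takes an optimizer and a few hyperparameters. We will first look at how to train a model with the cosine LR scheduler using the timm training script, and then at how to use the scheduler on its own in a custom training script.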

### t_initial

The initial number of epochs, for example 50 or 100.

### t_mul

Defaults to 1.0. Updates the SGDR schedule annealing: after each restart, the length of the next cycle is the previous length multiplied by t_mul.
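As a quick illustration of t_mul, here is a minimal sketch. It assumes that cycle i lasts t_initial * t_mul**i epochs, which is my reading of timm's implementation. The helper name cycle_lengths is hypothetical and is not part of timm.

```python
from typing import List


def cycle_lengths(t_initial: int, t_mul: float, n_cycles: int) -> List[float]:
    """Length of each of the first n_cycles cosine cycles (hypothetical helper)."""
    return [t_initial * t_mul ** i for i in range(n_cycles)]


print(cycle_lengths(50, 1.0, 3))  # t_mul = 1.0: every cycle has the same length
print(cycle_lengths(50, 2.0, 3))  # t_mul = 2.0: each cycle is twice as long as the previous one
```

With t_mul=1.0 the restarts are evenly spaced. With t_mul > 1 they become progressively rarer as training goes on.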

### decay_rate: decay ratio

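When decay_rate > 0 and < 1, the learning rate is decayed at every restart: the new peak learning rate equals lr * decay_rate.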
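A small sketch of how decay_rate shrinks the peak learning rate from cycle to cycle. It assumes the peak of cycle i is base_lr * decay_rate**i, which is how I understand timm's behaviour. The helper name peak_lrs is hypothetical.

```python
from typing import List


def peak_lrs(base_lr: float, decay_rate: float, n_cycles: int) -> List[float]:
    """Peak learning rate at the start of each cycle (hypothetical helper)."""
    return [base_lr * decay_rate ** i for i in range(n_cycles)]


# With decay_rate = 0.5 every restart begins at half the previous peak.
print(peak_lrs(0.1, 0.5, 3))
```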

### cycle_limit

The maximum number of restarts in SGDR.
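The sketch below shows one way a cycle limit can act on a cosine schedule with restarts. It is a simplified model, not timm's code: t_mul is fixed at 1, there is no decay, warmup, or noise, and the function name cosine_lr is hypothetical. It also assumes that cycle_limit=0 means "no limit" and that once the limit is reached the rate stays at lr_min. Check both conventions against the timm version you use.

```python
import math


def cosine_lr(t, base_lr, t_initial, lr_min=0.0, cycle_limit=0):
    """Cosine schedule restarting every t_initial epochs (simplified sketch)."""
    i = t // t_initial            # index of the current cycle
    t_curr = t - i * t_initial    # position inside the current cycle
    if cycle_limit == 0 or i < cycle_limit:
        return lr_min + 0.5 * (base_lr - lr_min) * (1 + math.cos(math.pi * t_curr / t_initial))
    return lr_min                 # past the last allowed cycle: stay at the minimum


for t in (0, 5, 10, 15, 20, 25):
    print(t, cosine_lr(t, base_lr=0.1, t_initial=10, lr_min=0.001, cycle_limit=2))
```

In this model, with cycle_limit=2 the rate restarts once at epoch 10 and is held at lr_min from epoch 20 onward.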

### t_in_epochs

If set to False, the learning rates returned for epoch t are None.

### initialize

If set to True, an initial_lr attribute is set on each param group. Defaults to True.

## 2、CosineAnnealingLR

⚠️Note: this only implements the cosine annealing part of SGDR, not the restarts.

The full version: CosineAnnealingWarmRestarts

Set the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr and $T_{cur}$ is the number of epochs since the last restart in SGDR:

$$
\begin{aligned}
\eta_t &= \eta_{min} + \frac{1}{2}(\eta_{max}-\eta_{min})\Big(1+\cos\big(\frac{T_{cur}}{T_{max}}\pi\big)\Big), & T_{cur}\neq(2k+1)T_{max}; \\
\eta_{t+1} &= \eta_{t} + \frac{1}{2}(\eta_{max}-\eta_{min})\Big(1-\cos\big(\frac{1}{T_{max}}\pi\big)\Big), & T_{cur}=(2k+1)T_{max}.
\end{aligned}
$$

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators. If the learning rate is set solely by this scheduler, the learning rate at each step becomes:

$$
\eta_t=\eta_{min}+\frac{1}{2}(\eta_{max}-\eta_{min})\Big(1+\cos\big(\frac{T_{cur}}{T_{max}}\pi\big)\Big)
$$
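The closed-form expression is easy to evaluate by hand. The following is a minimal pure-Python sketch of that formula only. It does not call PyTorch, and the function name cosine_annealing and the sample values are chosen purely for illustration.

```python
import math


def cosine_annealing(t_cur, t_max, eta_max, eta_min=0.0):
    """Closed-form cosine annealing: eta_min + 0.5*(eta_max - eta_min)*(1 + cos(pi*t_cur/t_max))."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))


for t in (0, 25, 50, 100):
    print(t, round(cosine_annealing(t, t_max=50, eta_max=0.1, eta_min=0.001), 6))
```

The rate starts at $\eta_{max}$, passes through the midpoint at $T_{max}/2$, and reaches $\eta_{min}$ at $T_{max}$. Because there is no restart, the cosine simply climbs back to $\eta_{max}$ at $2T_{max}$ if stepping continues.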

    from torch.optim import lr_scheduler

    lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)

Parameters:
optimizer (Optimizer) – Wrapped optimizer.

T_max (int)       – Maximum number of iterations.

eta_min (float)   – Minimum learning rate. Default: 0.

last_epoch (int)  – The index of last epoch. Default: -1.

verbose (bool)    – If True, prints a message to stdout for each update. Default: False.


• get_last_lr()
Return the last learning rate computed by the current scheduler.

• load_state_dict(state_dict)
Loads the scheduler's state.

Parameters:
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

• print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.

• state_dict()
Returns the state of the scheduler as a dict.

It contains an entry for every variable in `self.__dict__` which is not the optimizer.

## 3、CosineAnnealingWarmRestarts

    from torch.optim import lr_scheduler

    lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)

Parameters
optimizer (Optimizer) – Wrapped optimizer.

T_0 (int) – Number of iterations until the first restart.

T_mult (int, optional) – A factor by which T_i increases after a restart. Default: 1.

eta_min (float, optional) – Minimum learning rate. Default: 0.

last_epoch (int, optional) – The index of last epoch. Default: -1.

verbose (bool) – If True, prints a message to stdout for each update. Default: False.



Set the learning rate of each parameter group using a cosine annealing schedule, where $\eta_{max}$ is set to the initial lr, $T_{cur}$ is the number of epochs since the last restart and $T_i$ is the number of epochs between two warm restarts in SGDR:

$$
\eta_t=\eta_{min}+\frac{1}{2}(\eta_{max}-\eta_{min})\Big(1+\cos\big(\frac{T_{cur}}{T_{i}}\pi\big)\Big)
$$

When $T_{cur}=T_i$, set $\eta_t=\eta_{min}$. When $T_{cur}=0$ after restart, set $\eta_t=\eta_{max}$.
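To see how $T_{cur}$ and $T_i$ evolve across restarts, here is a pure-Python sketch of the formula. It is an illustrative model rather than PyTorch's implementation: the function name warm_restart_lr is hypothetical, and the cycle bookkeeping is done with a simple loop.

```python
import math


def warm_restart_lr(epoch, t_0, t_mult=1, eta_max=0.1, eta_min=0.0):
    """SGDR learning rate at a given epoch, using the closed-form cosine formula."""
    t_i, t_cur = t_0, epoch
    while t_cur >= t_i:   # walk past every completed cycle
        t_cur -= t_i
        t_i *= t_mult     # the next cycle is t_mult times longer
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


# With t_0 = 10 and t_mult = 2 the cycles cover epochs [0, 10), [10, 30), [30, 70), ...
for epoch in (0, 5, 9, 10, 20, 29, 30):
    print(epoch, round(warm_restart_lr(epoch, t_0=10, t_mult=2), 5))
```

At epochs 0, 10, and 30 the rate is back at $\eta_{max}$ because $T_{cur}$ has been reset to 0. Just before each restart it is close to $\eta_{min}$.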

• get_last_lr()
Return the last learning rate computed by the current scheduler.

• load_state_dict(state_dict)
Loads the scheduler's state.

Parameters:
state_dict (dict) – scheduler state. Should be an object returned from a call to state_dict().

• print_lr(is_verbose, group, lr, epoch=None)
Display the current learning rate.

• state_dict()
Returns the state of the scheduler as a dict.

It contains an entry for every variable in `self.__dict__` which is not the optimizer.

• step(epoch=None)
Step could be called after every batch update

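Step Example

This function can be called in an interleaved way.

    from torch.optim.lr_scheduler import CosineAnnealingWarmRestarts

    # optimizer, T_0 and T_mult are assumed to be defined already
    scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
    for epoch in range(20):
        scheduler.step()
    scheduler.step(26)
    scheduler.step()  # continues from epoch 26, i.e. scheduler.step(27), not scheduler.step(20)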
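Because step can be called after every batch update, it is common to pass a fractional epoch such as epoch + i / iters. The sketch below reproduces only the resulting learning-rate values in pure Python for the T_mult = 1 case. The helper sgdr_lr and the numbers of batches and epochs are assumptions made for illustration, not PyTorch code.

```python
import math


def sgdr_lr(epoch, t_0, eta_max, eta_min=0.0):
    """SGDR learning rate for T_mult = 1 at a possibly fractional epoch (hypothetical helper)."""
    t_cur = epoch % t_0  # time since the last restart
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_0))


iters = 4  # assumed number of batches per epoch
schedule = [
    sgdr_lr(epoch + i / iters, t_0=2, eta_max=0.1)
    for epoch in range(4)
    for i in range(iters)
]
print([round(lr, 4) for lr in schedule])
```

Each batch gets its own learning rate, so the curve decays smoothly within an epoch instead of in per-epoch steps, and it jumps back to the maximum at the start of epoch 2.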



Original: https://blog.csdn.net/ViatorSun/article/details/123529445
Author: ViatorSun

