API - Optimizers¶

TensorLayer provides rich layer implementations tailored for various benchmarks and domain-specific problems. In addition, we also support transparent access to native TensorFlow parameters. For example, we provide not only layers for local response normalization, but also layers that allow users to apply tf.ops.lrn on network.outputs. More functions can be found in the TensorFlow API.

TensorLayerX provides a simple API and tools to ease research and development and to reduce the time to production. To that end, we provide state-of-the-art optimizers that work with TensorFlow, MindSpore, PaddlePaddle and PyTorch. The optimizer functions provided natively by TensorFlow, MindSpore, PaddlePaddle and PyTorch can be used in TensorLayerX. We have also wrapped the optimizer functions for each framework; the wrappers can be found in tensorlayerx.optimizers. In addition, we provide dynamic learning rate schedulers that work with the same four backends.

Optimizers List¶

`Adadelta`([lr, rho, eps, weight_decay, grad_clip])	Optimizer that implements the Adadelta algorithm.
`Adagrad`([lr, initial_accumulator, eps, …])	Optimizer that implements the Adagrad algorithm.
`Adam`([lr, beta_1, beta_2, eps, …])	Optimizer that implements the Adam algorithm.
`Adamax`([lr, beta_1, beta_2, eps, …])	Optimizer that implements the Adamax algorithm.
`Ftrl`([lr, lr_power, …])	Optimizer that implements the FTRL algorithm.
`Nadam`([lr, beta_1, beta_2, eps, …])	Optimizer that implements the NAdam algorithm.
`RMSprop`([lr, rho, momentum, eps, centered, …])	Optimizer that implements the RMSprop algorithm.
`SGD`([lr, momentum, weight_decay, grad_clip])	Gradient descent (with momentum) optimizer.
`Momentum`([lr, momentum, nesterov, …])	Optimizer that implements the Momentum algorithm.
`Lamb`()	Optimizer that implements the Layer-wise Adaptive Moments (LAMB).
`LARS`()	LARS is an optimization algorithm employing a large batch optimization technique.

Optimizers Dynamic Learning Rate List¶

`LRScheduler`([learning_rate, last_epoch, verbose])	LRScheduler Base class.
`StepDecay`(learning_rate, step_size[, gamma, …])	Update the learning rate of `optimizer` by `gamma` every `step_size` epochs.
`CosineAnnealingDecay`(learning_rate, T_max[, …])	Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate.
`NoamDecay`(d_model, warmup_steps[, …])	Applies Noam Decay to the initial learning rate.
`PiecewiseDecay`(boundaries, values[, …])	Piecewise learning rate scheduler.
`NaturalExpDecay`(learning_rate, gamma[, …])	Applies natural exponential decay to the initial learning rate.
`InverseTimeDecay`(learning_rate, gamma[, …])	Applies inverse time decay to the initial learning rate.
`PolynomialDecay`(learning_rate, decay_steps)	Applies polynomial decay to the initial learning rate.
`LinearWarmup`(learning_rate, warmup_steps, …)	Linear learning rate warm up strategy.
`ExponentialDecay`(learning_rate, gamma[, …])	Update learning rate by gamma each epoch.
`MultiStepDecay`(learning_rate, milestones[, …])	Update the learning rate by `gamma` once `epoch` reaches one of the milestones.
`LambdaDecay`(learning_rate, lr_lambda[, …])	Sets the learning rate of `optimizer` by function `lr_lambda` .
`ReduceOnPlateau`(learning_rate[, mode, …])	Reduce learning rate when `metrics` has stopped descending.

Adadelta¶

class tensorlayerx.optimizers.Adadelta(lr=0.001, rho=0.95, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the Adadelta algorithm. Equivalent to tf.optimizers.Adadelta.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adadelta?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
rho (float or constant float tensor) – The decay rate. Defaults to 0.95.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.
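
The three strategies named for grad_clip differ in what they bound: individual elements, the norm of each gradient, or the joint norm of all gradients. The sketch below illustrates that arithmetic in plain Python on lists of floats. The helper names are hypothetical, and the code deliberately does not call the tlx.ops classes, whose constructor arguments are not documented in this section.

```python
import math

def clip_by_value(grad, clip_min, clip_max):
    """Clamp every element of one gradient into [clip_min, clip_max]."""
    return [max(clip_min, min(clip_max, g)) for g in grad]

def clip_by_norm(grad, clip_norm):
    """Rescale one gradient so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = clip_norm / norm if norm > clip_norm else 1.0
    return [g * scale for g in grad]

def clip_by_global_norm(grads, clip_norm):
    """Rescale all gradients together using their joint L2 norm."""
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / global_norm if global_norm > clip_norm else 1.0
    return [[g * scale for g in grad] for grad in grads]

grads = [[3.0, 4.0], [0.0, 12.0]]  # two toy gradients (illustrative values)
print(clip_by_value(grads[0], -1.0, 1.0))
print(clip_by_norm(grads[0], 1.0))
print(clip_by_global_norm(grads, 1.0))
```

Clipping by global norm keeps the relative direction of the whole update, while per-gradient clipping can change the balance between layers.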

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adadelta(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adagrad¶

class tensorlayerx.optimizers.Adagrad(lr=0.001, initial_accumulator=0.1, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the Adagrad algorithm. Equivalent to tf.optimizers.Adagrad.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adagrad?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
initial_accumulator (float) – Starting value for the accumulators (per-parameter momentum values). Must be non-negative. Defaults to 0.1.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adagrad(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adam¶

class tensorlayerx.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the Adam algorithm. Equivalent to tf.optimizers.Adam.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adam?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.
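
To make the roles of beta_1, beta_2 and eps concrete, here is a minimal scalar sketch of the textbook Adam update (Kingma and Ba). It is an illustration only: the function name is hypothetical, weight_decay and grad_clip are omitted, and individual backends may place eps slightly differently.

```python
import math

def adam_step(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One textbook Adam update for a scalar weight; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad          # 1st moment estimate
    v = beta_2 * v + (1 - beta_2) * grad * grad   # 2nd moment estimate
    m_hat = m / (1 - beta_1 ** t)                 # bias correction
    v_hat = v / (1 - beta_2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
for t in range(1, 4):
    grad = 2.0 * w  # gradient of the toy loss w**2
    w, m, v = adam_step(w, grad, m, v, t)
    print(t, w)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the weight moves by roughly lr regardless of the gradient's magnitude.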

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adam(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adamax¶

class tensorlayerx.optimizers.Adamax(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the Adamax algorithm. Equivalent to tf.optimizers.Adamax.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Adamax?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the exponentially weighted infinity norm. Defaults to 0.999.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adamax(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Ftrl¶

class tensorlayerx.optimizers.Ftrl(lr=0.001, lr_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, beta=0.0, l2_shrinkage_regularization_strength=0.0, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the FTRL algorithm. Equivalent to tf.optimizers.Ftrl.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Ftrl?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
lr_power (float) – Controls how the learning rate decreases during training. Use zero for a fixed learning rate.
initial_accumulator_value (float) – The starting value for accumulators. Only zero or positive values are allowed.
l1_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.
l2_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.
l2_shrinkage_regularization_strength (float) – This differs from L2 above in that the L2 above is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. When input is sparse shrinkage will only happen on the active weights.
beta (float) – A float value, representing the beta value from the paper. Defaults to 0.0.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Ftrl(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Nadam¶

class tensorlayerx.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the NAdam algorithm. Equivalent to tf.optimizers.Nadam.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/Nadam?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.
beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Nadam(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

RMSprop¶

class tensorlayerx.optimizers.RMSprop(lr=0.001, rho=0.9, momentum=0.0, eps=1e-07, centered=False, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the RMSprop algorithm. Equivalent to tf.optimizers.RMSprop.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/RMSprop?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.001.
rho (float) – Discounting factor for the history/coming gradient. Defaults to 0.9.
momentum (float) – A scalar or a scalar Tensor. Defaults to 0.0.
eps (float) – A small constant for numerical stability. Defaults to 1e-7.
centered (bool) – If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.RMSprop(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

SGD¶

class tensorlayerx.optimizers.SGD(lr=0.01, momentum=0.0, weight_decay=0.0, grad_clip=None)[source]¶

Gradient descent (with momentum) optimizer. Equivalent to tf.optimizers.SGD.

References

https://tensorflow.google.cn/api_docs/python/tf/keras/optimizers/SGD?hl=en

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.01.
momentum (float) – float hyperparameter >= 0 that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0, i.e., vanilla gradient descent.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.SGD(0.01)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Momentum¶

class tensorlayerx.optimizers.Momentum(lr=0.01, momentum=0.0, nesterov=False, weight_decay=0.0, grad_clip=None)[source]¶

Optimizer that implements the Momentum algorithm. Equivalent to tf.compat.v1.train.MomentumOptimizer.

References

https://tensorflow.google.cn/api_docs/python/tf/compat/v1/train/MomentumOptimizer?hl=en&version=nightly

Parameters

lr (A Tensor, floating point value) – The learning rate. Defaults to 0.01.
momentum (float) – A Tensor or a floating point value. The momentum. Defaults to 0.0.
use_locking (bool) – If True use locks for update operations.
nesterov (bool) – If True use Nesterov Momentum. See (Sutskever et al., 2013). This implementation always computes gradients at the value of the variable(s) passed to the optimizer. Using Nesterov Momentum makes the variable(s) track the values called theta_t + mu*v_t in the paper. This implementation is an approximation of the original formula, valid for high values of momentum. It will compute the “adjusted gradient” in NAG by assuming that the new gradient will be estimated by the current average gradient plus the product of momentum and the change in the average gradient. Defaults to False.
weight_decay (float) – weight decay (L2 penalty) (default: 0.0)
grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Default None, meaning there is no gradient clipping.
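
The difference between plain and Nesterov momentum can be seen in a scalar sketch. It assumes the update rule of tf.compat.v1.train.MomentumOptimizer, which this class is described as equivalent to; other backends may differ in detail. The function name is hypothetical, and weight_decay and grad_clip are left out.

```python
def momentum_step(w, grad, accum, lr=0.01, momentum=0.9, nesterov=False):
    """One momentum update for a scalar weight; accum is the velocity buffer."""
    accum = momentum * accum + grad
    if nesterov:
        # Look-ahead form: current gradient plus the momentum-scaled accumulator.
        w = w - lr * (grad + momentum * accum)
    else:
        w = w - lr * accum
    return w, accum

print(momentum_step(1.0, 2.0, 0.0, lr=0.1))
print(momentum_step(1.0, 2.0, 0.0, lr=0.1, nesterov=True))
```

With the same gradient and an empty accumulator, the Nesterov variant already takes a larger first step because it adds the momentum-scaled accumulator on top of the current gradient.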

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Momentum(0.01, momentum=0.9)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Lamb¶

class tensorlayerx.optimizers.Lamb[source]¶

Optimizer that implements the Layer-wise Adaptive Moments (LAMB).

References

https://tensorflow.google.cn/addons/api_docs/python/tfa/optimizers/LAMB?hl=en

LARS¶

class tensorlayerx.optimizers.LARS[source]¶

LARS is an optimization algorithm employing a large batch optimization technique. Refer to the paper “Large Batch Training of Convolutional Networks”.

References

https://www.mindspore.cn/docs/api/zh-CN/r1.5/api_python/nn/mindspore.nn.LARS.html?highlight=lars#mindspore.nn.LARS

LRScheduler¶

class tensorlayerx.optimizers.lr.LRScheduler(learning_rate=0.1, last_epoch=-1, verbose=False)[source]¶

LRScheduler Base class. Define the common interface of a learning rate scheduler.

Users can import it with from tensorlayerx.optimizers.lr import LRScheduler, then subclass it and provide a custom implementation of get_lr().

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/LRScheduler_cn.html

Parameters

learning_rate (A floating point value) – The learning rate. Defaults to 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> # Here is an example of a simple ``StepDecay`` implementation.
>>> import tensorlayerx as tlx
>>> from tensorlayerx.optimizers.lr import LRScheduler
>>> class StepDecay(LRScheduler):
...     def __init__(self, learning_rate, step_size, gamma=0.1, last_epoch=-1, verbose=False):
...         if not isinstance(step_size, int):
...             raise TypeError("The type of 'step_size' must be 'int', but received %s." % type(step_size))
...         if gamma >= 1.0:
...             raise ValueError('gamma should be < 1.0.')
...         self.step_size = step_size
...         self.gamma = gamma
...         super(StepDecay, self).__init__(learning_rate, last_epoch, verbose)
...     def get_lr(self):
...         # Number of completed decay intervals so far.
...         i = self.last_epoch // self.step_size
...         return self.base_lr * (self.gamma ** i)

StepDecay¶

class tensorlayerx.optimizers.lr.StepDecay(learning_rate, step_size, gamma=0.1, last_epoch=-1, verbose=False)[source]¶

Update the learning rate of optimizer by gamma every step_size epochs.

\[new\_learning\_rate = learning\_rate * gamma^{epoch // step\_size}\]
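
As a quick check of the formula, the following plain-Python sketch evaluates it for a few epochs. The function name is hypothetical and the code does not depend on TensorLayerX.

```python
def step_decay_lr(learning_rate, epoch, step_size, gamma=0.1):
    """Closed-form StepDecay value: learning_rate * gamma ** (epoch // step_size)."""
    return learning_rate * gamma ** (epoch // step_size)

for epoch in (0, 9, 10, 25):
    print(epoch, step_decay_lr(0.1, epoch, step_size=10))
```

The rate stays constant inside each window of step_size epochs and drops by a factor of gamma at every window boundary.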

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/StepDecay_cn.html

Parameters

learning_rate (float) – The learning rate.
step_size (int) – The number of epochs between two updates of the learning rate.
gamma (float) – The ratio by which the learning rate is reduced: new_lr = origin_lr * gamma . It should be less than 1.0. Default: 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.StepDecay(learning_rate=0.1, step_size=10, gamma=0.1, last_epoch=-1, verbose=False)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for batch in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each batch
...     # scheduler.step()    # if you update the learning rate each epoch

CosineAnnealingDecay¶

class tensorlayerx.optimizers.lr.CosineAnnealingDecay(learning_rate, T_max, eta_min=0, last_epoch=-1, verbose=False)[source]¶

Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate. \(T_{cur}\) is the number of epochs since the last restart in SGDR.

\[\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}\]
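
The first case of the schedule can be evaluated directly. The sketch below is a plain-Python illustration with a hypothetical function name; it covers only the closed-form branch and not the recursive update used at the restart points.

```python
import math

def cosine_annealing_lr(eta_max, t_cur, t_max, eta_min=0.0):
    """Closed-form cosine annealing value after t_cur of t_max epochs."""
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_max))

for t_cur in (0, 5, 10):
    print(t_cur, cosine_annealing_lr(0.1, t_cur, t_max=10))
```

The value starts at \(\eta_{max}\), passes the midpoint between \(\eta_{max}\) and \(\eta_{min}\) at half of T_max, and reaches \(\eta_{min}\) at T_max.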

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/CosineAnnealingDecay_cn.html

Parameters

learning_rate (float or int) – The initial learning rate, that is \(\eta_{max}\) . It can be set to python float or int number.
T_max (int) – Maximum number of iterations. It is half of the decay cycle of learning rate.
eta_min (float or int) – Minimum learning rate, that is \(\eta_{min}\) . Default: 0.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.CosineAnnealingDecay(learning_rate=0.1, T_max=10, eta_min=0, last_epoch=-1, verbose=False)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

NoamDecay¶

class tensorlayerx.optimizers.lr.NoamDecay(d_model, warmup_steps, learning_rate=1.0, last_epoch=-1, verbose=False)[source]¶

Applies Noam Decay to the initial learning rate.

\[new\_learning\_rate = learning\_rate * d_{model}^{-0.5} * min(epoch^{-0.5}, epoch * warmup\_steps^{-1.5})\]
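
The formula combines a linear warm-up with an inverse square root decay; the two terms inside min are equal when epoch equals warmup_steps, which is where the rate peaks. A plain-Python sketch with a hypothetical function name, assuming epoch starts at 1 (epoch 0 would raise a ZeroDivisionError in this direct transcription):

```python
def noam_lr(d_model, warmup_steps, epoch, learning_rate=1.0):
    """Noam schedule value for epoch >= 1, transcribed from the formula above."""
    scale = min(epoch ** -0.5, epoch * warmup_steps ** -1.5)
    return learning_rate * d_model ** -0.5 * scale

# d_model=512 and warmup_steps=4000 are illustrative values only.
for epoch in (1, 2000, 4000, 16000):
    print(epoch, noam_lr(512, 4000, epoch))
```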

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/NoamDecay_cn.html
Attention Is All You Need: https://arxiv.org/pdf/1706.03762.pdf

Parameters

d_model (int) – The dimensionality of input and output feature vector of model. It is a python int number.
warmup_steps (int) – The number of warmup steps, a hyperparameter. It is a python int number.
learning_rate (float) – The initial learning rate. It is a python float number. Default: 1.0.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NoamDecay(d_model=512, warmup_steps=100, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

PiecewiseDecay¶

class tensorlayerx.optimizers.lr.PiecewiseDecay(boundaries, values, last_epoch=-1, verbose=False)[source]¶

Piecewise learning rate scheduler.

boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
if epoch < 100:
    learning_rate = 1.0
elif 100 <= epoch < 200:
    learning_rate = 0.5
else:
    learning_rate = 0.1
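
The same lookup generalizes to any number of boundaries. The sketch below uses the standard library's bisect module; the function name is hypothetical and this is not necessarily how the scheduler is implemented internally.

```python
from bisect import bisect_right

def piecewise_lr(boundaries, values, epoch):
    """Pick the value whose interval contains epoch; len(values) == len(boundaries) + 1."""
    return values[bisect_right(boundaries, epoch)]

boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
for epoch in (0, 99, 100, 199, 200):
    print(epoch, piecewise_lr(boundaries, values, epoch))
```

bisect_right counts how many boundaries are less than or equal to epoch, so an epoch equal to a boundary already uses the next value.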

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/PiecewiseDecay_cn.html

Parameters

boundaries (list) – A list of step numbers at which the learning rate changes.
values (list) – A list of learning rate values that will be picked during different epoch boundaries.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PiecewiseDecay(boundaries=[100, 200], values=[0.1, 0.5, 0.1], verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

NaturalExpDecay¶

class tensorlayerx.optimizers.lr.NaturalExpDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]¶

Applies natural exponential decay to the initial learning rate.

\[new\_learning\_rate = learning\_rate * e^{- gamma * epoch}\]

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/NaturalExpDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
gamma (float) – The ratio used to update the learning rate.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NaturalExpDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

InverseTimeDecay¶

class tensorlayerx.optimizers.lr.InverseTimeDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]¶

Applies inverse time decay to the initial learning rate.

\[new\_learning\_rate = \frac{learning\_rate}{1 + gamma * epoch}\]
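
Inverse time decay falls off much more slowly than the natural exponential decay documented under NaturalExpDecay for the same gamma. The plain-Python sketch below puts the two formulas side by side; both function names are hypothetical.

```python
import math

def natural_exp_decay_lr(learning_rate, gamma, epoch):
    """learning_rate * e ** (-gamma * epoch)"""
    return learning_rate * math.exp(-gamma * epoch)

def inverse_time_decay_lr(learning_rate, gamma, epoch):
    """learning_rate / (1 + gamma * epoch)"""
    return learning_rate / (1 + gamma * epoch)

for epoch in (0, 10, 50):
    print(epoch,
          natural_exp_decay_lr(0.1, 0.1, epoch),
          inverse_time_decay_lr(0.1, 0.1, epoch))
```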

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/InverseTimeDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
gamma (float) – The ratio used to update the learning rate.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.InverseTimeDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

PolynomialDecay¶

class tensorlayerx.optimizers.lr.PolynomialDecay(learning_rate, decay_steps, end_lr=0.0001, power=1.0, cycle=False, last_epoch=-1, verbose=False)[source]¶

Applies polynomial decay to the initial learning rate.

If cycle is set to True, then:

\[ \begin{align}\begin{aligned}decay\_steps & = decay\_steps * math.ceil(\frac{epoch}{decay\_steps})\\new\_learning\_rate & = (learning\_rate-end\_lr)*(1-\frac{epoch}{decay\_steps})^{power}+end\_lr\end{aligned}\end{align} \]

If cycle is set to False, then:

\[ \begin{align}\begin{aligned}epoch & = min(epoch, decay\_steps)\\new\_learning\_rate & = (learning\_rate-end\_lr)*(1-\frac{epoch}{decay\_steps})^{power}+end\_lr\end{aligned}\end{align} \]
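
Both branches can be transcribed into a few lines of plain Python. The function name is hypothetical, and the handling of epoch 0 in the cycle branch (treating the window multiplier as 1 to avoid a division by zero) is an assumption of this sketch, not something the formulas above state.

```python
import math

def polynomial_decay_lr(learning_rate, epoch, decay_steps,
                        end_lr=0.0001, power=1.0, cycle=False):
    """Polynomial decay value for a given epoch, following the two cases above."""
    if cycle:
        # Stretch the decay window to the next multiple of decay_steps.
        multiplier = math.ceil(epoch / decay_steps) if epoch > 0 else 1  # assumption for epoch 0
        decay_steps = decay_steps * multiplier
    else:
        # Hold the rate at end_lr once decay_steps has been reached.
        epoch = min(epoch, decay_steps)
    return (learning_rate - end_lr) * (1 - epoch / decay_steps) ** power + end_lr

for epoch in (0, 25, 50, 75):
    print(epoch,
          polynomial_decay_lr(0.1, epoch, 50),
          polynomial_decay_lr(0.1, epoch, 50, cycle=True))
```

With cycle=True the rate rises again right after it reaches end_lr, because the decay window doubles while epoch has only just passed the old window.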

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/PolynomialDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
decay_steps (int) – The decay step size. It determines the decay cycle.
end_lr (float) – The minimum final learning rate. Default: 0.0001.
power (float) – Power of polynomial. Default: 1.0.
cycle (bool) – Whether the learning rate rises again. If True, the learning rate will rise again once it has decreased to end_lr . If False, the learning rate decreases monotonically. Default: False.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PolynomialDecay(learning_rate=0.1, decay_steps=50, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

LinearWarmup¶

class tensorlayerx.optimizers.lr.LinearWarmup(learning_rate, warmup_steps, start_lr, end_lr, last_epoch=-1, verbose=False)[source]¶

Linear learning rate warm up strategy. Update the learning rate preliminarily before the normal learning rate scheduler.

When epoch < warmup_steps, learning rate is updated as:

\[lr = start\_lr + (end\_lr - start\_lr) * \frac{epoch}{warmup\_steps}\]

where start_lr is the initial learning rate, and end_lr is the final learning rate;

When epoch >= warmup_steps, learning rate is updated as:

\[lr = learning\_rate\]

where learning_rate is float or any subclass of LRScheduler .
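
The two phases can be sketched in plain Python as follows. The function name is hypothetical, and for simplicity the sketch assumes learning_rate is a plain float rather than another scheduler.

```python
def linear_warmup_lr(epoch, warmup_steps, start_lr, end_lr, learning_rate):
    """Linear interpolation during warm-up, then the regular learning rate."""
    if epoch < warmup_steps:
        return start_lr + (end_lr - start_lr) * epoch / warmup_steps
    return learning_rate

for epoch in (0, 10, 19, 20, 50):
    print(epoch, linear_warmup_lr(epoch, warmup_steps=20, start_lr=0.0,
                                  end_lr=0.5, learning_rate=0.1))
```

Note that end_lr and learning_rate are independent, so the rate can jump at epoch == warmup_steps when the two differ, as they do in this illustration.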

Parameters

learning_rate (float or LRScheduler) – The learning rate used once warm-up has finished; a float or any subclass of LRScheduler.
warmup_steps (int) – total steps of warm up.
start_lr (float) – Initial learning rate of warm up.
end_lr (float) – Final learning rate of warm up.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LinearWarmup(learning_rate=0.1, warmup_steps=20, start_lr=0.0, end_lr=0.5, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

ExponentialDecay¶

class tensorlayerx.optimizers.lr.ExponentialDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]¶

Update learning rate by gamma each epoch.

The learning rate is updated as:

\[new\_learning\_rate = last\_learning\_rate * gamma\]
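
Applying this recurrence once per epoch is equivalent to the closed form learning_rate * gamma ** epoch. A plain-Python sketch with a hypothetical function name:

```python
def exponential_decay_schedule(learning_rate, gamma, num_epochs):
    """Return the learning rate used in each of the first num_epochs epochs."""
    lrs = []
    lr = learning_rate
    for _ in range(num_epochs):
        lrs.append(lr)
        lr *= gamma  # new_learning_rate = last_learning_rate * gamma
    return lrs

lrs = exponential_decay_schedule(0.1, 0.9, 5)
for epoch, lr in enumerate(lrs):
    print(epoch, lr, 0.1 * 0.9 ** epoch)
```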

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/ExponentialDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
gamma (float) – The ratio by which the learning rate is reduced each epoch. It should be less than 1.0.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ExponentialDecay(learning_rate=0.1, gamma=0.9, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

MultiStepDecay¶

class tensorlayerx.optimizers.lr.MultiStepDecay(learning_rate, milestones, gamma=0.1, last_epoch=-1, verbose=False)[source]¶

Update the learning rate by gamma once epoch reaches one of the milestones. The algorithm can be described as the code below.

learning_rate = 0.1
milestones = [50, 100]
gamma = 0.1
if epoch < 50:
    learning_rate = 0.1
elif epoch < 100:
    learning_rate = 0.01
else:
    learning_rate = 0.001
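
For an arbitrary list of milestones, the rule amounts to multiplying by gamma once for every milestone already reached. A plain-Python sketch using the standard library's bisect module; the function name is hypothetical.

```python
from bisect import bisect_right

def multi_step_decay_lr(learning_rate, milestones, epoch, gamma=0.1):
    """learning_rate * gamma ** (number of milestones <= epoch)."""
    return learning_rate * gamma ** bisect_right(milestones, epoch)

for epoch in (0, 49, 50, 99, 100):
    print(epoch, multi_step_decay_lr(0.1, [50, 100], epoch))
```

bisect_right requires milestones to be sorted in increasing order, which matches the requirement stated for the milestones parameter.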

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/MultiStepDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
milestones (list or tuple) – The epoch boundaries at which the learning rate is reduced. Must be increasing.
gamma (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.MultiStepDecay(learning_rate=0.1, milestones=[50, 100], gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

LambdaDecay¶

class tensorlayerx.optimizers.lr.LambdaDecay(learning_rate, lr_lambda, last_epoch=-1, verbose=False)[source]¶

Sets the learning rate of optimizer by function lr_lambda . lr_lambda is a function which receives epoch .

The algorithm can be described as the code below.

learning_rate = 0.5        # init learning_rate
lr_lambda = lambda epoch: 0.95 ** epoch

learning_rate = 0.5        # epoch 0, 0.5*0.95**0
learning_rate = 0.475      # epoch 1, 0.5*0.95**1
learning_rate = 0.45125    # epoch 2, 0.5*0.95**2
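
The values listed above can be reproduced with a one-line helper. This is a plain-Python sketch with a hypothetical function name, independent of TensorLayerX.

```python
def lambda_decay_lr(learning_rate, lr_lambda, epoch):
    """Initial learning rate multiplied by the factor lr_lambda(epoch)."""
    return learning_rate * lr_lambda(epoch)

lr_lambda = lambda epoch: 0.95 ** epoch
for epoch in range(3):
    print(epoch, lambda_decay_lr(0.5, lr_lambda, epoch))
```

Because the factor always multiplies the initial learning rate rather than the previous value, lr_lambda fully determines the shape of the schedule.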

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/LambdaDecay_cn.html

Parameters

learning_rate (float) – The initial learning rate.
lr_lambda (function) – A function which computes a factor from epoch ; the initial learning rate is then multiplied by this factor.
last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LambdaDecay(learning_rate=0.1, lr_lambda=lambda x: 0.9 ** x, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model
...         scheduler.step()  # if you update the learning rate each step
...     # scheduler.step()    # if you update the learning rate each epoch

ReduceOnPlateau¶

class tensorlayerx.optimizers.lr.ReduceOnPlateau(learning_rate, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, epsilon=1e-08, verbose=False)[source]¶

Reduce the learning rate when metrics has stopped descending. Models often benefit from reducing the learning rate by a factor of 2 to 10 once performance stops improving.

The metrics value is the one passed into step ; it must be a 1-D Tensor with shape [1]. When metrics stops descending for a patience number of epochs, the learning rate is reduced to learning_rate * factor . (mode can also be set to 'max' ; in that case, the learning rate is reduced when metrics stops ascending for a patience number of epochs.)

In addition, after each reduction, the scheduler waits a cooldown number of epochs before resuming the above operation.
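
The interplay of patience, cooldown and factor is easier to follow in code. The class below is a simplified plain-Python sketch, not the library's implementation: the class name is hypothetical, it handles only mode='min' with a relative threshold, it takes plain floats instead of tensors, and it assumes a reduction happens once the count of non-improving epochs exceeds patience.

```python
class PlateauSketch:
    """Simplified reduce-on-plateau logic for mode='min' and a relative threshold."""

    def __init__(self, learning_rate, factor=0.1, patience=10,
                 threshold=1e-4, cooldown=0, min_lr=0.0):
        self.lr = learning_rate
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.cooldown = cooldown
        self.min_lr = min_lr
        self.best = float("inf")   # best (lowest) metric seen so far
        self.bad_epochs = 0        # consecutive epochs without improvement
        self.cooldown_left = 0     # epochs still to wait after a reduction

    def step(self, metric):
        if self.cooldown_left > 0:
            self.cooldown_left -= 1
            return self.lr
        if metric < self.best * (1 - self.threshold):
            self.best = metric
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs > self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.bad_epochs = 0
                self.cooldown_left = self.cooldown
        return self.lr

sketch = PlateauSketch(learning_rate=1.0, factor=0.5, patience=2)
for loss in (1.0, 1.0, 1.0, 1.0, 0.5):
    print(loss, sketch.step(loss))
```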

References

https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/optimizer/lr/ReduceOnPlateau_cn.html

Parameters

learning_rate (float) – The initial learning rate.
mode (str) – 'min' or 'max' can be selected. Normally, it is 'min' , which means that the learning rate will be reduced when loss stops descending. If it is set to 'max' , the learning rate will be reduced when loss stops ascending. Default: 'min' .
factor (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.
patience (int) – When loss doesn’t improve for this number of epochs, the learning rate will be reduced. Default: 10.
threshold (float) – threshold and threshold_mode determine the minimum change of loss , so that tiny changes of loss are ignored. Default: 1e-4.
threshold_mode (str) – 'rel' or 'abs' can be selected. In 'rel' mode, the minimum change of loss is last_loss * threshold , where last_loss is loss in last epoch. In 'abs' mode, the minimum change of loss is threshold . Default: 'rel' .
cooldown (int) – The number of epochs to wait before resuming normal operation. Default: 0.
min_lr (float) – The lower bound of the learning rate after reduction. Default: 0.
epsilon (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than epsilon, the update is ignored. Default: 1e-8.
verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
...     for step in range(100):
...         # train model and compute ``loss``, the monitored metric
...         scheduler.step(loss)  # if you update the learning rate each step
...     # scheduler.step(loss)    # if you update the learning rate each epoch