API - Optimizers

TensorLayerX provides simple APIs and tools to ease research and development and to reduce the time to production. To that end, it offers state-of-the-art optimizers that work with TensorFlow, MindSpore, PaddlePaddle and PyTorch. The optimizer functions provided by these frameworks can be used directly in TensorLayerX. We have also wrapped the optimizers of each framework; the wrappers can be found in tensorlayerx.optimizers. In addition, we provide dynamic learning rate schedulers that work with the same four frameworks.

Optimizers List

Adadelta([lr, rho, eps, weight_decay, grad_clip])

Optimizer that implements the Adadelta algorithm.

Adagrad([lr, initial_accumulator, eps, …])

Optimizer that implements the Adagrad algorithm.

Adam([lr, beta_1, beta_2, eps, …])

Optimizer that implements the Adam algorithm.

Adamax([lr, beta_1, beta_2, eps, …])

Optimizer that implements the Adamax algorithm.

Ftrl([lr, lr_power, …])

Optimizer that implements the FTRL algorithm.

Nadam([lr, beta_1, beta_2, eps, …])

Optimizer that implements the NAdam algorithm.

RMSprop([lr, rho, momentum, eps, centered, …])

Optimizer that implements the RMSprop algorithm.

SGD([lr, momentum, weight_decay, grad_clip])

Gradient descent (with momentum) optimizer.

Momentum([lr, momentum, nesterov, …])

Optimizer that implements the Momentum algorithm.

Lamb()

Optimizer that implements the Layer-wise Adaptive Moments (LAMB).

LARS()

LARS is an optimization algorithm employing a large batch optimization technique.

Optimizers Dynamic Learning Rate List

LRScheduler([learning_rate, last_epoch, verbose])

LRScheduler Base class.

StepDecay(learning_rate, step_size[, gamma, …])

Updates the learning rate of the optimizer by gamma every step_size epochs.

CosineAnnealingDecay(learning_rate, T_max[, …])

Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate.

NoamDecay(d_model, warmup_steps[, …])

Applies Noam Decay to the initial learning rate.

PiecewiseDecay(boundaries, values[, …])

Piecewise learning rate scheduler.

NaturalExpDecay(learning_rate, gamma[, …])

Applies natural exponential decay to the initial learning rate.

InverseTimeDecay(learning_rate, gamma[, …])

Applies inverse time decay to the initial learning rate.

PolynomialDecay(learning_rate, decay_steps)

Applies polynomial decay to the initial learning rate.

LinearWarmup(learning_rate, warmup_steps, …)

Linear learning rate warm up strategy.

ExponentialDecay(learning_rate, gamma[, …])

Update learning rate by gamma each epoch.

MultiStepDecay(learning_rate, milestones[, …])

Update the learning rate by gamma once epoch reaches one of the milestones.

LambdaDecay(learning_rate, lr_lambda[, …])

Sets the learning rate of the optimizer using the function lr_lambda.

ReduceOnPlateau(learning_rate[, mode, …])

Reduces the learning rate when the metric has stopped descending.

Adadelta

class tensorlayerx.optimizers.Adadelta(lr=0.001, rho=0.95, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the Adadelta algorithm. Equivalent to tf.optimizers.Adadelta.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • rho (float or constant float tensor) – The decay rate. Defaults to 0.95.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adadelta(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adagrad

class tensorlayerx.optimizers.Adagrad(lr=0.001, initial_accumulator=0.1, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the Adagrad algorithm. Equivalent to tf.optimizers.Adagrad.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • initial_accumulator (float) – Starting value for the accumulators (per-parameter momentum values). Must be non-negative. Defaults to 0.1.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adagrad(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adam

class tensorlayerx.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the Adam algorithm. Equivalent to tf.optimizers.Adam.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.

  • beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adam(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Adamax

class tensorlayerx.optimizers.Adamax(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the Adamax algorithm. Equivalent to tf.optimizers.Adamax.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.

  • beta_2 (float or constant float tensor) – The exponential decay rate for the exponentially weighted infinity norm. Defaults to 0.999.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Adamax(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Ftrl

class tensorlayerx.optimizers.Ftrl(lr=0.001, lr_power=-0.5, initial_accumulator_value=0.1, l1_regularization_strength=0.0, l2_regularization_strength=0.0, beta=0.0, l2_shrinkage_regularization_strength=0.0, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the FTRL algorithm. Equivalent to tf.optimizers.Ftrl.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • lr_power (float) – Controls how the learning rate decreases during training. Use zero for a fixed learning rate. Defaults to -0.5.

  • initial_accumulator_value (float) – The starting value for accumulators. Only zero or positive values are allowed. Defaults to 0.1.

  • l1_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.

  • l2_regularization_strength (float) – A float value, must be greater than or equal to zero. Defaults to 0.0.

  • l2_shrinkage_regularization_strength (float) – This differs from the L2 regularization above in that the L2 above is a stabilization penalty, whereas this L2 shrinkage is a magnitude penalty. When the input is sparse, shrinkage only happens on the active weights. Defaults to 0.0.

  • beta (float) – A float value, representing the beta value from the paper. Defaults to 0.0.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Ftrl(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Nadam

class tensorlayerx.optimizers.Nadam(lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-07, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the NAdam algorithm. Equivalent to tf.optimizers.Nadam.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • beta_1 (float or constant float tensor) – The exponential decay rate for the 1st moment estimates. Defaults to 0.9.

  • beta_2 (float or constant float tensor) – The exponential decay rate for the 2nd moment estimates. Defaults to 0.999.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Nadam(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

RMSprop

class tensorlayerx.optimizers.RMSprop(lr=0.001, rho=0.9, momentum=0.0, eps=1e-07, centered=False, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the RMSprop algorithm. Equivalent to tf.optimizers.RMSprop.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.001.

  • rho (float) – Discounting factor for the history/coming gradient. Defaults to 0.9.

  • momentum (float) – The momentum value, a scalar or a scalar Tensor. Defaults to 0.0.

  • eps (float) – A small constant for numerical stability. Defaults to 1e-7.

  • centered (bool) – If True, gradients are normalized by the estimated variance of the gradient; if False, by the uncentered second moment. Setting this to True may help with training, but is slightly more expensive in terms of computation and memory. Defaults to False.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.RMSprop(0.001)
>>> optimizer.apply_gradients(zip(grad, train_weights))

SGD

class tensorlayerx.optimizers.SGD(lr=0.01, momentum=0.0, weight_decay=0.0, grad_clip=None)[source]

Gradient descent (with momentum) optimizer. Equivalent to tf.optimizers.SGD.

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.01.

  • momentum (float) – A float >= 0 that accelerates gradient descent in the relevant direction and dampens oscillations. Defaults to 0.0, i.e., vanilla gradient descent.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.SGD(0.01)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Momentum

class tensorlayerx.optimizers.Momentum(lr=0.01, momentum=0.0, nesterov=False, weight_decay=0.0, grad_clip=None)[source]

Optimizer that implements the Momentum algorithm. Equivalent to tf.compat.v1.train.MomentumOptimizer

References

Parameters
  • lr (float or Tensor) – The learning rate. Defaults to 0.01.

  • momentum (float) – The momentum, a Tensor or a floating point value. Defaults to 0.0.

  • use_locking (bool) – If True, use locks for update operations.

  • nesterov (bool) – If True, use Nesterov momentum. See (Sutskever et al., 2013). This implementation always computes gradients at the value of the variable(s) passed to the optimizer. Using Nesterov momentum makes the variable(s) track the values called theta_t + mu*v_t in the paper. This implementation is an approximation of the original formula, valid for high values of momentum. It computes the “adjusted gradient” in NAG by assuming that the new gradient will be estimated by the current average gradient plus the product of momentum and the change in the average gradient. Defaults to False.

  • weight_decay (float) – Weight decay (L2 penalty). Defaults to 0.0.

  • grad_clip (GradientClip or None) – Gradient clipping strategy. There are three clipping strategies (tlx.ops.ClipGradByValue, tlx.ops.ClipGradByNorm, tlx.ops.ClipByGlobalNorm). Defaults to None, meaning there is no gradient clipping.

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> optimizer = tlx.optimizers.Momentum(0.01, momentum=0.9)
>>> optimizer.apply_gradients(zip(grad, train_weights))

Lamb

class tensorlayerx.optimizers.Lamb[source]

Optimizer that implements the Layer-wise Adaptive Moments (LAMB).

References

LARS

class tensorlayerx.optimizers.LARS[source]

LARS is an optimization algorithm employing a large batch optimization technique. Refer to the paper "Large Batch Training of Convolutional Networks".

References

LRScheduler

class tensorlayerx.optimizers.lr.LRScheduler(learning_rate=0.1, last_epoch=-1, verbose=False)[source]

LRScheduler Base class. Define the common interface of a learning rate scheduler.

It can be imported with from tensorlayerx.optimizers.lr import LRScheduler . To create a custom scheduler, subclass it and override get_lr() with your own implementation.

References

Parameters
  • learning_rate (A floating point value) – The learning rate. Defaults to 0.1.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> #Here is an example of a simple ``StepDecay`` implementation.
>>> import tensorlayerx as tlx
>>> from tensorlayerx.optimizers.lr import LRScheduler
>>> class StepDecay(LRScheduler):
>>>     def __init__(self, learning_rate, step_size, gamma = 0.1, last_epoch = -1, verbose=False):
>>>         if not isinstance(step_size, int):
>>>             raise TypeError("The type of 'step_size' must be 'int', but received %s." %type(step_size))
>>>         if gamma >= 1.0 :
>>>             raise ValueError('gamma should be < 1.0.')
>>>         self.step_size = step_size
>>>         self.gamma = gamma
>>>         super(StepDecay, self).__init__(learning_rate, last_epoch, verbose)
>>>     def get_lr(self):
>>>         i = self.last_epoch // self.step_size
>>>         return self.base_lr * (self.gamma**i)

StepDecay

class tensorlayerx.optimizers.lr.StepDecay(learning_rate, step_size, gamma=0.1, last_epoch=-1, verbose=False)[source]

Updates the learning rate of the optimizer by gamma every step_size epochs.

\[new\_learning\_rate = learning\_rate * gamma^{epoch // step\_size}\]

References

Parameters
  • learning_rate (float) – The learning rate.

  • step_size (int) – The interval, in epochs, between updates.

  • gamma (float) – The ratio by which the learning rate is reduced: new_lr = origin_lr * gamma . It should be less than 1.0. Default: 0.1.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.StepDecay(learning_rate = 0.1, step_size = 10,  gamma = 0.1, last_epoch = -1, verbose = False)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for batch in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each batch
>>>     #scheduler.step()    # If you update learning rate each epoch
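
The step decay rule can also be checked without any backend. The following is a minimal pure-Python sketch of the formula only; step_decay_lr is a hypothetical helper introduced for illustration and is not part of TensorLayerX.

```python
def step_decay_lr(learning_rate, step_size, gamma, epoch):
    """Hypothetical helper: learning_rate * gamma ** (epoch // step_size)."""
    if not isinstance(step_size, int) or step_size <= 0:
        raise ValueError("step_size must be a positive int")
    # Integer division counts how many full intervals have elapsed.
    return learning_rate * gamma ** (epoch // step_size)


# The rate stays flat inside each interval and drops at its boundary.
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay_lr(0.1, step_size=10, gamma=0.1, epoch=epoch))
```

Within an interval the exponent does not change, which is why the schedule looks like a staircase.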

CosineAnnealingDecay

class tensorlayerx.optimizers.lr.CosineAnnealingDecay(learning_rate, T_max, eta_min=0, last_epoch=-1, verbose=False)[source]

Set the learning rate using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial learning_rate. \(T_{cur}\) is the number of epochs since the last restart in SGDR.

\[\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min}) \left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}\]

References

Parameters
  • learning_rate (float or int) – The initial learning rate, that is \(\eta_{max}\) . It can be set to python float or int number.

  • T_max (int) – Maximum number of iterations. It is half of the decay cycle of learning rate.

  • eta_min (float or int) – Minimum learning rate, that is \(\eta_{min}\) . Default: 0.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.CosineAnnealingDecay(learning_rate = 0.1, T_max = 10, eta_min=0, last_epoch=-1, verbose=False)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
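
To get a feel for the shape of the schedule, the closed-form branch of the cosine annealing formula can be evaluated directly. This is a sketch of that one branch only, in pure Python; cosine_annealing_lr is a hypothetical helper and the recursive branch used at restart points is not modelled.

```python
import math


def cosine_annealing_lr(eta_max, eta_min, t_cur, t_max):
    """Hypothetical helper for the closed-form cosine annealing branch."""
    # cos goes from 1 (t_cur = 0) to -1 (t_cur = t_max),
    # so the result moves from eta_max down to eta_min.
    cosine = math.cos(math.pi * t_cur / t_max)
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + cosine)


for t_cur in (0, 5, 10):
    print(t_cur, cosine_annealing_lr(eta_max=0.1, eta_min=0.0, t_cur=t_cur, t_max=10))
```

The midpoint of the half cycle lands halfway between eta_max and eta_min, and the decay is slowest near both ends.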

NoamDecay

class tensorlayerx.optimizers.lr.NoamDecay(d_model, warmup_steps, learning_rate=1.0, last_epoch=-1, verbose=False)[source]

Applies Noam Decay to the initial learning rate.

\[new\_learning\_rate = learning\_rate * d_{model}^{-0.5} * \min(epoch^{-0.5}, epoch * warmup\_steps^{-1.5})\]

References

Parameters
  • d_model (int) – The dimensionality of the input and output feature vectors of the model. It is a Python int.

  • warmup_steps (int) – The number of warmup steps, a hyperparameter. It is a Python int.

  • learning_rate (float) – The initial learning rate. It is a Python float. Default: 1.0.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NoamDecay(d_model=512, warmup_steps=100, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
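
The Noam formula rises linearly during warmup and then decays with the inverse square root of the epoch. The sketch below evaluates the formula in pure Python; noam_lr is a hypothetical helper, and restricting it to epoch >= 1 is an assumption made here to avoid raising zero to a negative power.

```python
def noam_lr(d_model, warmup_steps, epoch, learning_rate=1.0):
    """Hypothetical helper for the Noam schedule, defined for epoch >= 1."""
    if epoch < 1:
        raise ValueError("this sketch only handles epoch >= 1")
    # Linear growth while epoch < warmup_steps, inverse-sqrt decay afterwards.
    scale = min(epoch ** -0.5, epoch * warmup_steps ** -1.5)
    return learning_rate * d_model ** -0.5 * scale


for epoch in (1, 50, 100, 400):
    print(epoch, noam_lr(d_model=512, warmup_steps=100, epoch=epoch))
```

Both arguments of min coincide at epoch == warmup_steps, which is where the schedule peaks.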

PiecewiseDecay

class tensorlayerx.optimizers.lr.PiecewiseDecay(boundaries, values, last_epoch=-1, verbose=False)[source]

Piecewise learning rate scheduler.

boundaries = [100, 200]
values = [1.0, 0.5, 0.1]
if epoch < 100:
    learning_rate = 1.0
elif 100 <= epoch < 200:
    learning_rate = 0.5
else:
    learning_rate = 0.1

References

Parameters
  • boundaries (list) – A list of step numbers.

  • values (list) – A list of learning rate values that will be picked during different epoch boundaries.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PiecewiseDecay(boundaries=[100, 200], values=[0.1, 0.5, 0.1], verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
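
The piecewise lookup amounts to finding which interval the current epoch falls into. A minimal pure-Python sketch using the standard bisect module follows; piecewise_lr is a hypothetical helper, not the library implementation.

```python
from bisect import bisect_right


def piecewise_lr(boundaries, values, epoch):
    """Hypothetical helper: pick the value for the interval containing epoch."""
    if len(values) != len(boundaries) + 1:
        raise ValueError("values must have exactly one more entry than boundaries")
    # bisect_right counts the boundaries that are <= epoch,
    # which is the index of the interval the epoch belongs to.
    return values[bisect_right(boundaries, epoch)]


for epoch in (0, 99, 100, 199, 200):
    print(epoch, piecewise_lr([100, 200], [1.0, 0.5, 0.1], epoch))
```

Each boundary belongs to the interval that starts at it, matching the 100 <= epoch < 200 condition in the description.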

NaturalExpDecay

class tensorlayerx.optimizers.lr.NaturalExpDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]

Applies natural exponential decay to the initial learning rate.

\[new\_learning\_rate = learning\_rate * e^{- gamma * epoch}\]

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • gamma (float) – The ratio that controls how fast the learning rate decays.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.NaturalExpDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
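
The natural exponential formula can be evaluated directly to see how quickly a given gamma shrinks the rate. This is a pure-Python sketch of the formula; natural_exp_decay_lr is a hypothetical helper.

```python
import math


def natural_exp_decay_lr(learning_rate, gamma, epoch):
    """Hypothetical helper: learning_rate * e ** (-gamma * epoch)."""
    return learning_rate * math.exp(-gamma * epoch)


for epoch in (0, 1, 10):
    print(epoch, natural_exp_decay_lr(0.1, gamma=0.1, epoch=epoch))
```

After 1 / gamma epochs the rate has fallen to 1/e of its initial value.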

InverseTimeDecay

class tensorlayerx.optimizers.lr.InverseTimeDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]

Applies inverse time decay to the initial learning rate.

\[new\_learning\_rate = \frac{learning\_rate}{1 + gamma * epoch}\]

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • gamma (float) – The ratio that controls how fast the learning rate decays.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.InverseTimeDecay(learning_rate=0.1, gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
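
Inverse time decay shrinks the rate hyperbolically rather than exponentially. A pure-Python sketch of the formula follows; inverse_time_decay_lr is a hypothetical helper.

```python
def inverse_time_decay_lr(learning_rate, gamma, epoch):
    """Hypothetical helper: learning_rate / (1 + gamma * epoch)."""
    return learning_rate / (1 + gamma * epoch)


for epoch in (0, 10, 90):
    print(epoch, inverse_time_decay_lr(0.1, gamma=0.1, epoch=epoch))
```

The rate is halved once gamma * epoch reaches 1, and later halvings take progressively longer.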

PolynomialDecay

class tensorlayerx.optimizers.lr.PolynomialDecay(learning_rate, decay_steps, end_lr=0.0001, power=1.0, cycle=False, last_epoch=-1, verbose=False)[source]

Applies polynomial decay to the initial learning rate.

If cycle is set to True, then:

\[ \begin{align}\begin{aligned}decay\_steps & = decay\_steps * math.ceil(\frac{epoch}{decay\_steps})\\new\_learning\_rate & = (learning\_rate-end\_lr)*(1-\frac{epoch}{decay\_steps})^{power}+end\_lr\end{aligned}\end{align} \]

If cycle is set to False, then:

\[ \begin{align}\begin{aligned}epoch & = min(epoch, decay\_steps)\\new\_learning\_rate & = (learning\_rate-end\_lr)*(1-\frac{epoch}{decay\_steps})^{power}+end\_lr\end{aligned}\end{align} \]

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • decay_steps (int) – The decay step size. It determines the decay cycle.

  • end_lr (float) – The minimum final learning rate. Default: 0.0001.

  • power (float) – Power of polynomial. Default: 1.0.

  • cycle (bool) – Whether the learning rate rises again. If True, the learning rate rises again once it has decreased to end_lr . If False, the learning rate decreases monotonically. Default: False.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.PolynomialDecay(learning_rate=0.1, decay_steps=50, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
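
The two branches of the polynomial decay formula differ only in how they treat epochs beyond decay_steps. The sketch below implements both in pure Python; polynomial_decay_lr is a hypothetical helper, and treating epoch 0 as one full cycle when cycle is True is an assumption made here to avoid dividing by zero.

```python
import math


def polynomial_decay_lr(learning_rate, decay_steps, epoch,
                        end_lr=0.0001, power=1.0, cycle=False):
    """Hypothetical helper implementing both branches of polynomial decay."""
    if cycle:
        # Stretch the horizon to the next multiple of decay_steps.
        # ceil(0) is 0, so fall back to a single cycle at epoch 0 (assumption).
        factor = math.ceil(epoch / decay_steps) or 1
        decay_steps = decay_steps * factor
    else:
        # Clamp the epoch so the rate stays at end_lr afterwards.
        epoch = min(epoch, decay_steps)
    return (learning_rate - end_lr) * (1 - epoch / decay_steps) ** power + end_lr


for epoch in (0, 25, 50, 75):
    print(epoch,
          polynomial_decay_lr(0.1, 50, epoch),
          polynomial_decay_lr(0.1, 50, epoch, cycle=True))
```

With cycle=False the rate stays at end_lr after decay_steps, while with cycle=True the longer horizon lifts it above end_lr again.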

LinearWarmup

class tensorlayerx.optimizers.lr.LinearWarmup(learning_rate, warmup_steps, start_lr, end_lr, last_epoch=-1, verbose=False)[source]

Linear learning rate warm-up strategy. It adjusts the learning rate during a preliminary phase before the normal learning rate schedule takes over.

When epoch < warmup_steps, learning rate is updated as:

\[lr = start\_lr + (end\_lr - start\_lr) * \frac{epoch}{warmup\_steps}\]

where start_lr is the initial learning rate, and end_lr is the final learning rate;

When epoch >= warmup_steps, learning rate is updated as:

\[lr = learning\_rate\]

where learning_rate is float or any subclass of LRScheduler .

References

Parameters
  • learning_rate (float or LRScheduler) – The learning rate used after warm-up. It is a float or an instance of any subclass of LRScheduler .

  • warmup_steps (int) – total steps of warm up.

  • start_lr (float) – Initial learning rate of warm up.

  • end_lr (float) – Final learning rate of warm up.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LinearWarmup(learning_rate=0.1, warmup_steps=20, start_lr=0.0, end_lr=0.5, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
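
The two phases of linear warm-up can be written as a single function. This pure-Python sketch covers only the case where learning_rate is a plain float; linear_warmup_lr is a hypothetical helper and does not handle a nested scheduler.

```python
def linear_warmup_lr(learning_rate, warmup_steps, start_lr, end_lr, epoch):
    """Hypothetical helper: linear ramp, then a constant learning_rate."""
    if epoch < warmup_steps:
        # Interpolate linearly from start_lr towards end_lr.
        return start_lr + (end_lr - start_lr) * epoch / warmup_steps
    return learning_rate


for epoch in (0, 10, 19, 20, 50):
    print(epoch, linear_warmup_lr(0.1, warmup_steps=20, start_lr=0.0,
                                  end_lr=0.5, epoch=epoch))
```

Note that end_lr and learning_rate are independent values, so the rate can jump at epoch == warmup_steps when they differ, as they do in this example.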

ExponentialDecay

class tensorlayerx.optimizers.lr.ExponentialDecay(learning_rate, gamma, last_epoch=-1, verbose=False)[source]

Updates the learning rate by a factor of gamma each epoch.

The learning rate is updated as:

\[new\_learning\_rate = last\_learning\_rate * gamma\]

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • gamma (float) – The ratio by which the learning rate is reduced. It should be less than 1.0.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ExponentialDecay(learning_rate=0.1, gamma=0.9, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
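
Multiplying by gamma once per epoch is the same as raising gamma to the epoch number. The pure-Python sketch below compares the recurrence with that closed form; exponential_decay_lr is a hypothetical helper.

```python
def exponential_decay_lr(learning_rate, gamma, epoch):
    """Hypothetical helper: closed form of multiplying by gamma each epoch."""
    return learning_rate * gamma ** epoch


# Apply the recurrence step by step and compare with the closed form.
lr = 0.1
for epoch in range(1, 4):
    lr = lr * 0.9
    print(epoch, lr, exponential_decay_lr(0.1, gamma=0.9, epoch=epoch))
```

The two columns agree up to floating point rounding.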

MultiStepDecay

class tensorlayerx.optimizers.lr.MultiStepDecay(learning_rate, milestones, gamma=0.1, last_epoch=-1, verbose=False)[source]

Update the learning rate by gamma once epoch reaches one of the milestones. The algorithm can be described as the code below.

learning_rate = 0.1
milestones = [50, 100]
gamma = 0.1
if epoch < 50:
    learning_rate = 0.1
elif epoch < 100:
    learning_rate = 0.01
else:
    learning_rate = 0.001

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • milestones (list) – List or tuple of epoch boundaries. Must be increasing.

  • gamma (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.MultiStepDecay(learning_rate=0.1, milestones=[50, 100], gamma=0.1, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
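
The milestone rule generalizes to any number of milestones by counting how many have been reached. A minimal pure-Python sketch follows; multi_step_decay_lr is a hypothetical helper built on the standard bisect module.

```python
from bisect import bisect_right


def multi_step_decay_lr(learning_rate, milestones, gamma, epoch):
    """Hypothetical helper: multiply by gamma once per milestone reached."""
    if list(milestones) != sorted(set(milestones)):
        raise ValueError("milestones must be strictly increasing")
    # Number of milestones that are <= epoch.
    reached = bisect_right(list(milestones), epoch)
    return learning_rate * gamma ** reached


for epoch in (0, 49, 50, 99, 100):
    print(epoch, multi_step_decay_lr(0.1, [50, 100], gamma=0.1, epoch=epoch))
```

This reproduces the three branches of the description: the initial rate before epoch 50, one reduction up to epoch 99, and two reductions from epoch 100 on.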

LambdaDecay

class tensorlayerx.optimizers.lr.LambdaDecay(learning_rate, lr_lambda, last_epoch=-1, verbose=False)[source]

Sets the learning rate of the optimizer using the function lr_lambda . lr_lambda is a function that receives epoch .

The algorithm can be described as the code below.

learning_rate = 0.5        # init learning_rate
lr_lambda = lambda epoch: 0.95 ** epoch

learning_rate = 0.5        # epoch 0, 0.5*0.95**0
learning_rate = 0.475      # epoch 1, 0.5*0.95**1
learning_rate = 0.45125    # epoch 2, 0.5*0.95**2

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • lr_lambda (function) – A function that computes a factor from epoch ; the initial learning rate is then multiplied by this factor.

  • last_epoch (int) – The index of last epoch. Can be set to restart training. Default: -1, means initial learning rate.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.LambdaDecay(learning_rate=0.1, lr_lambda=lambda x:0.9**x, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>        # train model
>>>         scheduler.step() # If you update learning rate each step
>>>     #scheduler.step()    # If you update learning rate each epoch
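
Because the factor is always applied to the initial learning rate, the schedule is fully determined by lr_lambda. The pure-Python sketch below reproduces the numbers from the description; lambda_decay_lr is a hypothetical helper.

```python
def lambda_decay_lr(learning_rate, lr_lambda, epoch):
    """Hypothetical helper: scale the initial rate by lr_lambda(epoch)."""
    return learning_rate * lr_lambda(epoch)


def decay_factor(epoch):
    # Same factor as in the description: 0.95 ** epoch.
    return 0.95 ** epoch


for epoch in range(3):
    print(epoch, lambda_decay_lr(0.5, decay_factor, epoch))
```

Any callable that maps an epoch to a factor works here, including one that increases the rate.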

ReduceOnPlateau

class tensorlayerx.optimizers.lr.ReduceOnPlateau(learning_rate, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, epsilon=1e-08, verbose=False)[source]

Reduces the learning rate when the metric has stopped descending. Models often benefit from reducing the learning rate by a factor of 2 to 10 once model performance stops improving.

The metric is the value passed into step ; it must be a 1-D Tensor with shape [1]. When the metric stops descending for patience epochs, the learning rate is reduced to learning_rate * factor . (mode can also be set to 'max' ; in that case, the learning rate is reduced when the metric stops ascending for patience epochs.)

In addition, after each reduction, the scheduler waits cooldown epochs before resuming the above operation.

References

Parameters
  • learning_rate (float) – The initial learning rate.

  • mode (str) –

    'min' or 'max' can be selected. Normally it is 'min' , which means that the learning rate is reduced when the loss stops descending.

    If it is set to 'max' , the learning rate is reduced when the loss stops ascending. Default: 'min' .

  • factor (float) – The ratio by which the learning rate is reduced. It should be less than 1.0. Default: 0.1.

  • patience (int) – When the loss does not improve for this number of epochs, the learning rate is reduced. Default: 10.

  • threshold (float) – threshold and threshold_mode determine the minimum change of loss , so that tiny changes of loss are ignored. Default: 1e-4.

  • threshold_mode (str) – 'rel' or 'abs' can be selected. In 'rel' mode, the minimum change of loss is last_loss * threshold , where last_loss is the loss in the last epoch. In 'abs' mode, the minimum change of loss is threshold . Default: 'rel' .

  • cooldown (int) – The number of epochs to wait before resuming normal operation. Default: 0.

  • min_lr (float) – The lower bound of the learning rate after reduction. Default: 0.

  • epsilon (float) – Minimal decay applied to lr. If the difference between new and old lr is smaller than epsilon, the update is ignored. Default: 1e-8.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False .

Examples

With TensorLayerX

>>> import tensorlayerx as tlx
>>> scheduler = tlx.optimizers.lr.ReduceOnPlateau(learning_rate=1.0, factor=0.5, patience=5, verbose=True)
>>> sgd = tlx.optimizers.SGD(lr=scheduler, momentum=0.2)
>>> for epoch in range(100):
>>>     for step in range(100):
>>>         # train model and compute loss
>>>         scheduler.step(loss) # If you update learning rate each step
>>>     #scheduler.step(loss)    # If you update learning rate each epoch
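
The interplay of patience, cooldown, and threshold is easier to follow in a small state machine. The class below is a simplified pure-Python sketch, not the library implementation: PlateauTracker is a hypothetical name, it covers only 'min' mode with a 'rel' threshold, it compares against the best metric seen so far, and it reduces the rate once the count of non-improving epochs exceeds patience. All of these are assumptions borrowed from common implementations of this technique.

```python
class PlateauTracker:
    """Simplified, hypothetical stand-in: 'min' mode with a 'rel' threshold."""

    def __init__(self, learning_rate, factor=0.1, patience=10,
                 threshold=1e-4, cooldown=0, min_lr=0.0, epsilon=1e-8):
        self.lr = learning_rate
        self.factor = factor
        self.patience = patience
        self.threshold = threshold
        self.cooldown = cooldown
        self.min_lr = min_lr
        self.epsilon = epsilon
        self.best = None           # best metric seen so far
        self.num_bad_epochs = 0    # epochs without a significant improvement
        self.cooldown_counter = 0  # epochs left before monitoring resumes

    def _is_better(self, metric):
        # The metric must beat the best value by a relative margin.
        return metric < self.best - abs(self.best) * self.threshold

    def step(self, metric):
        if self.cooldown_counter > 0:
            self.cooldown_counter -= 1
            return self.lr
        if self.best is None or self._is_better(metric):
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.cooldown_counter = self.cooldown
            self.num_bad_epochs = 0
            new_lr = max(self.lr * self.factor, self.min_lr)
            if self.lr - new_lr > self.epsilon:  # skip negligible reductions
                self.lr = new_lr
        return self.lr


tracker = PlateauTracker(learning_rate=1.0, factor=0.5, patience=2)
for loss in [1.0, 0.9, 0.9, 0.9, 0.9]:
    print(loss, tracker.step(loss))
```

In this run the loss improves once and then stalls, so the rate is halved on the third consecutive epoch without improvement.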