pytorch_lightning_spells.utils module

Classes:

`EMATracker`([alpha])	Keeps the exponential moving average for a single series.

Functions:

`count_parameters`(parameters)	Count the number of parameters
`freeze_layers`(layer_groups, freeze_flags)	Freeze or unfreeze groups of layers
`separate_parameters`(module[, skip_list])	Separate BatchNorm2d, GroupNorm, and LayerNorm parameters from others
`set_trainable`(layer, trainable)	Freeze or unfreeze all parameters in the layer.

class pytorch_lightning_spells.utils.EMATracker(alpha=0.05)[source]

Bases: object

Keeps the exponential moving average for a single series.

Parameters: alpha (float, optional) – the weight of the new value. Defaults to 0.05.

Examples

>>> tracker = EMATracker(0.1)
>>> tracker.update(1.)
>>> tracker.value
1.0
>>> tracker.update(2.)
>>> tracker.value # 1 * 0.9 + 2 * 0.1
1.1
>>> tracker.update(float('nan')) # this won't have any effect
>>> tracker.value
1.1

update(new_value)[source]

Adds a new value to the tracker.

NaN values are ignored, and a warning is emitted whenever one is encountered.

Parameters: new_value (Union[float, torch.Tensor]) – the incoming value.

property value: The smoothed value.

pytorch_lightning_spells.utils.count_parameters(parameters)[source]

Count the number of parameters

Parameters: parameters (Iterable[Union[torch.Tensor, Parameter]]) – parameters you want to count.
Returns: the number of parameters counted.
Return type: int

Example

>>> count_parameters([torch.rand(100), torch.rand(10)])
110
>>> count_parameters([torch.rand(100, 2), torch.rand(10, 3)])
230
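
The counts in the example above equal the total number of tensor elements, so the same idea can report total versus trainable parameters of a model. The sketch below is an assumption-laden illustration: `count_numel` is a hypothetical helper built on `Tensor.numel()`, not the library's function.

```python
from torch import nn


def count_numel(parameters) -> int:
    # Hypothetical helper: sums the element count of every tensor given.
    return sum(p.numel() for p in parameters)


model = nn.Sequential(nn.Linear(10, 100), nn.Linear(100, 1))
model[0].weight.requires_grad = False  # freeze one weight matrix for the demo

total = count_numel(model.parameters())
trainable = count_numel(p for p in model.parameters() if p.requires_grad)
print(total, trainable)  # 1201 201
```

Passing a filtered generator, as in the `trainable` line, should work equally well with any function that accepts an iterable of tensors.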

pytorch_lightning_spells.utils.freeze_layers(layer_groups, freeze_flags)[source]

Freeze or unfreeze groups of layers

Parameters:

layer_groups (Sequence[Layer]) – the target lists of layers
freeze_flags (Sequence[bool]) – the corresponding freeze flags, one per layer group

Warning

The values in freeze_flags have the opposite meaning to the trainable argument of set_trainable.

Set True to freeze; False to unfreeze.

Examples

>>> model = nn.Sequential(nn.Linear(10, 100), nn.Linear(100, 1))
>>> freeze_layers([model[0], model[1]], [True, False])
>>> model[0].weight.requires_grad
False
>>> model[1].weight.requires_grad
True
>>> freeze_layers([model[0], model[1]], [False, True])
>>> model[0].weight.requires_grad
True
>>> model[1].weight.requires_grad
False
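
The inverted meaning of the flags can be made explicit with a short sketch. `freeze_groups` below is a hypothetical re-implementation written for illustration only; it assumes that freezing simply sets `requires_grad` to `False` on every parameter of the group, which is consistent with the example above but not confirmed against the library source.

```python
from typing import Sequence

from torch import nn


def freeze_groups(
    layer_groups: Sequence[nn.Module], freeze_flags: Sequence[bool]
) -> None:
    # Hypothetical re-implementation for illustration only.
    assert len(layer_groups) == len(freeze_flags)
    for layer, freeze in zip(layer_groups, freeze_flags):
        for param in layer.parameters():
            # freeze=True means "not trainable", hence the negation.
            param.requires_grad = not freeze


model = nn.Sequential(nn.Linear(10, 100), nn.Linear(100, 1))
freeze_groups([model[0], model[1]], [True, False])
print([p.requires_grad for p in model.parameters()])
# [False, False, True, True]
```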

pytorch_lightning_spells.utils.separate_parameters(module, skip_list=('bias',))[source]

Separate BatchNorm2d, GroupNorm, and LayerNorm parameters from others

Parameters:

module (Union[Parameter, nn.Module, List[nn.Module]]) – to be separated.
skip_list (Sequence[str])

Returns:

lists of decay and no-decay parameters.

Return type:

Tuple[List[Parameter], List[Parameter]]

Example

>>> model = nn.Sequential(nn.Linear(100, 10, bias=True), nn.BatchNorm1d(10))
>>> _ = nn.init.constant_(model[0].weight, 2.)
>>> _ = nn.init.constant_(model[0].bias, 1.)
>>> _ = nn.init.constant_(model[1].weight, 1.)
>>> _ = nn.init.constant_(model[1].bias, 1.)
>>> model[0].weight.data.sum().item()
2000.0
>>> model[0].bias.data.sum().item()
10.0
>>> model[1].weight.data.sum().item()
10.0
>>> model[1].bias.data.sum().item()
10.0
>>> decay, no_decay = separate_parameters(model) # separate the parameters
>>> np.sum([x.sum().detach().numpy() for x in decay]) # nn.Linear
np.float32(2000.0)
>>> np.sum([x.sum().detach().numpy() for x in no_decay]) # nn.BatchNorm1d
np.float32(30.0)
>>> optimizer = torch.optim.AdamW([{
...    "params": decay, "weight_decay": 0.1
... }, {
...    "params": no_decay, "weight_decay": 0
... }], lr = 1e-3)
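
For readers who want to see how such a split might be computed, here is a sketch of a common heuristic: one-dimensional tensors (biases and normalization scales) are exempt from weight decay. This is a swapped-in technique, not the library's implementation; `split_for_weight_decay` is a hypothetical name, and the rule it applies is an assumption that happens to reproduce the grouping in the example above.

```python
import torch
from torch import nn


def split_for_weight_decay(module: nn.Module):
    # Hypothetical heuristic: 1-D tensors and biases skip weight decay.
    decay, no_decay = [], []
    for name, param in module.named_parameters():
        if not param.requires_grad:
            continue  # frozen parameters need no optimizer entry
        if param.ndim <= 1 or name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return decay, no_decay


model = nn.Sequential(nn.Linear(100, 10), nn.BatchNorm1d(10))
decay, no_decay = split_for_weight_decay(model)
optimizer = torch.optim.AdamW(
    [
        {"params": decay, "weight_decay": 0.1},
        {"params": no_decay, "weight_decay": 0.0},
    ],
    lr=1e-3,
)
print(len(decay), len(no_decay))  # 1 3
```

Only the `nn.Linear` weight lands in the decay group; its bias and both `nn.BatchNorm1d` parameters land in the no-decay group.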

pytorch_lightning_spells.utils.set_trainable(layer, trainable)[source]

Freeze or unfreeze all parameters in the layer.

Parameters:

layer (Union[torch.nn.Module, torch.nn.ModuleList]) – the target layer
trainable (bool) – True to unfreeze; False to freeze

Example

>>> model = nn.Sequential(nn.Linear(10, 100), nn.Linear(100, 1))
>>> model[0].weight.requires_grad
True
>>> set_trainable(model, False)
>>> model[0].weight.requires_grad
False
>>> set_trainable(model, True)
>>> model[0].weight.requires_grad
True
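
The practical effect of freezing shows up during the backward pass: parameters with `requires_grad` set to `False` receive no gradient. The sketch below freezes the first layer by hand, which is assumed to have the same effect as `set_trainable(model[0], False)`, and then inspects the gradients.

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(10, 100), nn.Linear(100, 1))

# Freeze the first layer manually (assumed equivalent in effect to
# set_trainable(model[0], False)).
for param in model[0].parameters():
    param.requires_grad = False

loss = model(torch.rand(4, 10)).sum()
loss.backward()

print(model[0].weight.grad)        # None: the frozen layer gets no gradient
print(model[1].weight.grad.shape)  # torch.Size([1, 100])
```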