Modules
Modules are objects which take one or more Tensors as input, perform some computation on them, and output a Tensor. Modules can create and contain Parameters. Neural network layers are a good example of a Module: a layer stores parameters and uses them to perform a computation (the forward pass of data through the layer).
Module - abstract base class for all modules
Dense - fully-connected neural network layer
DenseNetwork - a multi-layer dense neural network module
Sequential - apply a list of modules sequentially
BatchNormalization - normalize data per batch
Embedding - embed categorical data in a lower-dimensional space
- class probflow.modules.Module(*args)
Bases: probflow.utils.base.BaseModule
Abstract base class for Modules.
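Examples
To define a custom module, subclass Module, create any Parameters and sub-Modules in __init__, and define the forward computation in __call__. A minimal sketch, assuming the TensorFlow backend and arbitrary layer sizes:

    import probflow as pf
    import tensorflow as tf

    class TwoLayer(pf.Module):
        """Two Dense layers with a relu in between."""

        def __init__(self):
            self.layer1 = pf.Dense(d_in=5, d_out=32)
            self.layer2 = pf.Dense(d_in=32, d_out=1)

        def __call__(self, x):
            # Parameters of both sub-modules are discovered automatically
            return self.layer2(tf.nn.relu(self.layer1(x)))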
- property parameters
A list of Parameters in this Module and its sub-Modules.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
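For example, this enables a simple form of incremental (online) learning. A hedged sketch, where model, x_old/y_old, and x_new/y_new are hypothetical:

    model.fit(x_old, y_old)   # fit to the first chunk of data
    model.bayesian_update()   # set priors to the fit variational posteriors
    model.fit(x_new, y_new)   # continue training on new data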
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- class probflow.modules.Dense(d_in: int, d_out: int = 1, probabilistic: bool = True, flipout: bool = True, weight_kwargs: dict = {}, bias_kwargs: dict = {}, name: str = 'Dense')
Bases: probflow.modules.module.Module
Dense neural network layer.
Note that this module uses the flipout estimator by default, but will not use the flipout estimator when taking multiple Monte Carlo samples per batch (when n_mc > 1). See Model.fit() for more info on setting the value of n_mc.
- Parameters
d_in (int) – Number of input dimensions.
d_out (int) – Number of output dimensions (number of “units”).
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
flipout (bool) – Whether to use the flipout estimator for this layer. Default is True. When the global flipout setting is enabled, flipout is used during training but not during inference; if this kwarg is set to False, flipout is not used even during training.
weight_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the weight parameters. Default is an empty dict.
bias_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the bias parameters. Default is an empty dict.
name (str) – Name of this layer. Default = 'Dense'
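Examples
A minimal usage sketch, assuming the TensorFlow backend and arbitrary shapes:

    import probflow as pf
    import tensorflow as tf

    layer = pf.Dense(d_in=7, d_out=3)
    x = tf.random.normal([100, 7])
    y = layer(x)  # Tensor of shape [100, 3]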
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.DenseNetwork(d: List[int], activation: Callable = <function relu>, batch_norm: bool = False, batch_norm_loc: str = 'after', name: str = 'DenseNetwork', batch_norm_kwargs: dict = {}, **kwargs)
Bases: probflow.modules.module.Module
A multi-layer dense neural network.
- Parameters
d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features).
activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max(0, x)\) (ReLU)
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
batch_norm (bool) – Whether or not to use batch normalization in between layers of the network. Default is False.
batch_norm_loc (str {'after' or 'before'}) – Where to apply the batch normalization. If 'after', applies the batch normalization after the activation. If 'before', applies the batch normalization before the activation. Default is 'after'.
batch_norm_kwargs (dict) – Additional parameters to pass to BatchNormalization for each layer.
kwargs – Additional parameters are passed to Dense for each layer.
- activations
Activation function for each layer.
Type: List[callable]
- batch_norms
Batch normalization layers.
Type: Union[None, List[BatchNormalization]]
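Examples
A minimal usage sketch, assuming the TensorFlow backend and arbitrary layer sizes: a network mapping 7 input features through two hidden layers to a single output, with the default relu activations between layers:

    import probflow as pf
    import tensorflow as tf

    net = pf.DenseNetwork([7, 128, 64, 1])
    x = tf.random.normal([100, 7])
    y = net(x)  # Tensor of shape [100, 1]; no activation after the final layer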
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Sequential(steps: List[Callable], name: str = 'Sequential')
Bases: probflow.modules.module.Module
Apply a series of modules or functions sequentially.
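Examples
Because the steps can be any callables, Modules and plain functions can be mixed. A minimal sketch, assuming the TensorFlow backend and arbitrary sizes:

    import probflow as pf
    import tensorflow as tf

    net = pf.Sequential([
        pf.Dense(d_in=5, d_out=32),
        tf.nn.relu,
        pf.Dense(d_in=32, d_out=1),
    ])
    y = net(tf.random.normal([10, 5]))  # Tensor of shape [10, 1]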
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.BatchNormalization(shape: Union[int, List[int]], weight_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, bias_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, weight_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, bias_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, weight_initializer: Dict[str, Callable] = {'loc': <function xavier>}, bias_initializer: Dict[str, Callable] = {'loc': <function xavier>}, name='BatchNormalization')
Bases: probflow.modules.module.Module
A layer which normalizes its inputs.
Batch normalization is a technique which normalizes, re-scales, and offsets the output of one layer before passing it on to another layer [1]. It often leads to faster training of neural networks and better generalization error by stabilizing the change in the layers' input distributions, or perhaps by smoothing the optimization landscape [2].
Given a set of tensors for this batch, where \(x_{ij}\) is the \(i\)-th element of the \(j\)-th sample in this batch, this layer returns an elementwise transformation of the input tensors according to:
\[\text{BatchNorm}(x_{ij}) = \gamma_i \left( \frac{x_{ij} - \mu_i}{\sigma_i} \right) + \beta_i\]
where \(\mu_i\) is the mean of the \(i\)-th element across the batch:
\[\mu_i = \frac{1}{N} \sum_{k=1}^{N} x_{ik}\]
and \(\sigma_i\) is the standard deviation of the \(i\)-th element across the batch:
\[\sigma_i = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \mu_i)^2}\]
and \(\gamma\) and \(\beta\) are two free parameters for each element.
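The transformation above can be computed directly in a few lines; this is only an illustration of the math (with \(\gamma_i = 1\) and \(\beta_i = 0\)), not how this class is implemented:

    import tensorflow as tf

    x = tf.random.normal([32, 100])        # batch of N=32 samples, 100 elements each
    mu = tf.reduce_mean(x, axis=0)         # per-element mean across the batch
    sigma = tf.math.reduce_std(x, axis=0)  # per-element std across the batch
    x_norm = (x - mu) / sigma              # BatchNorm(x) with gamma=1, beta=0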
- Parameters
shape (int or list of int or ndarray) – Shape of the tensor to be batch-normalized.
name (str) – Name for this layer. Default = 'BatchNormalization'
weight_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the weight parameter(s) (\(\gamma\)). Default = Deterministic
bias_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the bias parameter(s) (\(\beta\)). Default = Deterministic
weight_prior (None or a Distribution object) – Prior probability distribution for the weight parameter(s) (\(\gamma\)): None, or a Distribution object which has been instantiated with parameters. Default = Normal(0, 1)
bias_prior (None or a Distribution object) – Prior probability distribution for the bias parameter(s) (\(\beta\)): None, or a Distribution object which has been instantiated with parameters. Default = Normal(0, 1)
weight_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the weights (\(\gamma\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
bias_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the biases (\(\beta\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
Examples
Batch normalize the output of a Dense layer:

    import probflow as pf
    import tensorflow as tf

    network = pf.Sequential([
        pf.Dense(d_in=7, d_out=100),
        pf.BatchNormalization(100),
        tf.nn.relu,
        pf.Dense(d_in=100, d_out=1),
    ])
    ...
References
[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint, 2015. http://arxiv.org/abs/1502.03167
[2] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How Does Batch Normalization Help Optimization? arXiv preprint, 2018. http://arxiv.org/abs/1805.11604
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Embedding(k: Union[int, List[int]], d: Union[int, List[int]], probabilistic: bool = False, name: str = 'Embedding', **kwargs)
Bases: probflow.modules.module.Module
A categorical embedding layer.
Maps an input variable containing non-negative integers to dense vectors. The length of the vectors (the dimensionality of the embedding) can be set with the d keyword argument. The embedding is learned over the course of training: if there are N unique integers in the input and the embedding dimensionality is M, a matrix of N x M free parameters is created and optimized to minimize the loss.
By default, a Deterministic distribution is used for the embedding variables' posterior distributions, with Normal(0, 1) priors. This corresponds to a normal non-probabilistic embedding with L2 regularization.
The embeddings can be non-probabilistic (each integer corresponds to a single point in M-dimensional space, the default) or probabilistic (each integer corresponds to an M-dimensional multivariate distribution). Set the probabilistic kwarg to True to use probabilistic embeddings.
- Parameters
k (int > 0 or List[int]) – Number of categories to embed.
d (int > 0 or List[int]) – Number of embedding dimensions.
posterior (Distribution class) – Probability distribution class to use to approximate the posterior. Default = Deterministic
prior (Distribution object) – Prior probability distribution which has been instantiated with parameters. Default = Normal(0, 1)
initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution. Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
probabilistic (bool) – Whether the variational posteriors for the embedding variables should be probabilistic. If False (the default), will use Deterministic distributions for the variational posteriors. If True, will use Normal distributions.
name (str) – Name for this layer. Default = 'Embedding'
kwargs – Additional keyword arguments are passed to the Parameter constructor which creates the embedding variables.
Examples
Embed 10k word IDs into a 50-dimensional space:
    import tensorflow as tf
    from probflow.modules import Embedding

    emb = Embedding(k=10000, d=50)
    ids = tf.random.uniform([1000000], minval=1, maxval=10000, dtype=tf.dtypes.int64)
    embeddings = emb(ids)
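A somewhat fuller sketch, assuming the TensorFlow backend and hypothetical sizes, feeding the embeddings into a Dense layer via Sequential:

    import probflow as pf
    import tensorflow as tf

    net = pf.Sequential([
        pf.Embedding(k=10000, d=50),
        pf.Dense(d_in=50, d_out=1),
    ])
    ids = tf.random.uniform([64], minval=0, maxval=10000, dtype=tf.dtypes.int64)
    out = net(ids)  # Tensor of shape [64, 1]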
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.