Modules

Modules are objects which take one or more Tensors as input, perform some computation on them, and output a Tensor. Modules can create and contain Parameters. Neural network layers are a good example of a Module: they store parameters, and use those parameters to perform a computation (the forward pass of data through the layer).

  • Module - abstract base class for all modules

  • Dense - fully-connected neural network layer

  • DenseNetwork - a multi-layer dense neural network module

  • Sequential - apply a list of modules sequentially

  • BatchNormalization - normalize data per batch

  • Embedding - embed categorical data in a lower-dimensional space


class probflow.modules.Module(*args)[source]

Bases: probflow.utils.base.BaseModule

Abstract base class for Modules.

TODO
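
Custom modules are created by subclassing Module, instantiating Parameters (and/or other Modules) in the constructor, and defining the forward computation in __call__. A minimal sketch (the layer and parameter names here are illustrative, not part of the probflow API):

import probflow as pf

class LinearLayer(pf.Module):
    """A bare-bones fully-connected layer built from raw Parameters."""

    def __init__(self, d_in: int, d_out: int):
        self.w = pf.Parameter([d_in, d_out], name='weights')
        self.b = pf.Parameter([1, d_out], name='bias')

    def __call__(self, x):
        # Calling a Parameter draws a sample from its variational posterior
        return x @ self.w() + self.b()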

property parameters

A list of Parameters in this Module and its sub-Modules.

property modules

A list of sub-Modules in this Module, including itself.

property trainable_variables

A list of trainable backend variables within this Module

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

bayesian_update()[source]

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
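
For example, a sketch of sequential (online) updating, where model is a hypothetical probflow Model and x1, y1, x2, y2 are hypothetical data arrays:

model.fit(x1, y1)
model.bayesian_update()  # priors <- current variational posteriors
model.fit(x2, y2)        # continue training against the updated priors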

kl_loss()[source]

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()[source]

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

reset_kl_loss()[source]

Reset additional loss due to KL divergences

add_kl_loss(loss, d2=None)[source]

Add additional loss due to KL divergences.

dumps()[source]

Serialize module object to bytes

save(filename: str)[source]

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object
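
For instance, a brief sketch of both serialization paths (the filename is illustrative):

import probflow as pf

layer = pf.Dense(d_in=7, d_out=1)

raw_bytes = layer.dumps()   # serialize the module to a bytes object
layer.save('layer.pfm')     # or write the serialized module to a file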

class probflow.modules.Dense(d_in: int, d_out: int = 1, probabilistic: bool = True, flipout: bool = True, weight_kwargs: dict = {}, bias_kwargs: dict = {}, name: str = 'Dense')[source]

Bases: probflow.modules.module.Module

Dense neural network layer.

TODO

Note that this module uses the flipout estimator by default, but will not use the flipout estimator when multiple Monte Carlo samples are being taken per batch (i.e., when n_mc > 1). See Model.fit() for more information on setting the value of n_mc.

Parameters
  • d_in (int) – Number of input dimensions.

  • d_out (int) – Number of output dimensions (number of “units”).

  • probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.

  • flipout (bool) – Whether to use the flipout estimator for this layer. Default is True. Usually, when the global flipout setting is set to True, will use flipout during training but not during inference. If this kwarg is set to False, will not use flipout even during training.

  • weight_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the weight parameters. Default is an empty dict.

  • bias_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the bias parameters. Default is an empty dict.

  • name (str) – Name of this layer
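
As an example, a sketch of using Dense as the core of a simple regression model (the model class and sizes here are illustrative, not from the original docs):

import probflow as pf

class LinearRegression(pf.ContinuousModel):
    def __init__(self, d_in: int):
        self.layer = pf.Dense(d_in=d_in, d_out=1)
        self.std = pf.ScaleParameter()

    def __call__(self, x):
        # Observation distribution: mean from the Dense layer,
        # scale from a learned standard deviation parameter
        return pf.Normal(self.layer(x), self.std())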

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

dumps()

Serialize module object to bytes

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

property trainable_variables

A list of trainable backend variables within this Module

class probflow.modules.DenseNetwork(d: List[int], activation: Callable = <function relu>, batch_norm: bool = False, batch_norm_loc: str = 'after', name: str = 'DenseNetwork', batch_norm_kwargs: dict = {}, **kwargs)[source]

Bases: probflow.modules.module.Module

A multilayer dense neural network

TODO: explain, math, diagram, examples, etc

Parameters
  • d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features).

  • activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max(0, x)\) (i.e., ReLU)

  • probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.

  • batch_norm (bool) – Whether or not to use batch normalization in between layers of the network. Default is False.

  • batch_norm_loc (str {'after' or 'before'}) – Where to apply the batch normalization. If 'after', applies the batch normalization after the activation. If 'before', applies the batch normalization before the activation. Default is 'after'.

  • batch_norm_kwargs (dict) – Additional parameters to pass to BatchNormalization for each layer.

  • kwargs – Additional parameters are passed to Dense for each layer.

layers

List of Dense neural network layers to be applied

Type

List[Dense]

activations

Activation function for each layer

Type

List[callable]

batch_norms

Batch normalization layers

Type

Union[None, List[BatchNormalization]]
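
For example, a sketch of constructing a network with two hidden layers (the sizes are illustrative):

import probflow as pf

# 7 input features -> hidden layers of 128 and 64 units -> 1 output;
# the activation is applied between layers but not to the final output
net = pf.DenseNetwork([7, 128, 64, 1])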

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

dumps()

Serialize module object to bytes

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

property trainable_variables

A list of trainable backend variables within this Module

class probflow.modules.Sequential(steps: List[Callable], name: str = 'Sequential')[source]

Bases: probflow.modules.module.Module

Apply a series of modules or functions sequentially.

TODO

Parameters
  • steps (list of Modules or callables) – Steps to apply

  • name (str) – Name of this module
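
For example, a sketch equivalent to a small two-layer network, mixing Modules and plain callables as steps (the sizes are illustrative):

import probflow as pf
import tensorflow as tf

model = pf.Sequential([
    pf.Dense(d_in=5, d_out=128),
    tf.nn.relu,                    # plain functions can be steps too
    pf.Dense(d_in=128, d_out=1),
])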

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

dumps()

Serialize module object to bytes

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

property trainable_variables

A list of trainable backend variables within this Module

class probflow.modules.BatchNormalization(shape: Union[int, List[int]], weight_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, bias_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, weight_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, bias_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, weight_initializer: Dict[str, Callable] = {'loc': <function xavier>}, bias_initializer: Dict[str, Callable] = {'loc': <function xavier>}, name='BatchNormalization')[source]

Bases: probflow.modules.module.Module

A layer which normalizes its inputs.

Batch normalization is a technique which normalizes, re-scales, and offsets the output of one layer before passing it on to another layer [1]. It often leads to faster training of neural networks and better generalization error, by stabilizing the change in the layers’ input distributions, or perhaps by smoothing the optimization landscape [2].

Given a set of tensors for this batch, where \(x_{ij}\) is the \(i\)-th element of the \(j\)-th sample in this batch, this layer returns an elementwise transformation of the input tensors according to:

\[\text{BatchNorm}(x_{ij}) = \gamma_i \left( \frac{x_{ij} - \mu_i}{\sigma_i} \right) + \beta_i\]

Where \(\mu_i\) is the mean of the \(i\)-th element across the batch:

\[\mu_i = \frac{1}{N} \sum_{k=1}^{N} x_{ik}\]

and \(\sigma_i\) is the standard deviation of the \(i\)-th element across the batch:

\[\sigma_i = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \mu_i)^2}\]

and \(\gamma\) and \(\beta\) are two free parameters for each element.
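
As a quick numeric illustration of the transformation above, computed by hand with NumPy (taking \(\gamma_i = 1\) and \(\beta_i = 0\)):

import numpy as np

x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])    # N=3 samples, 2 elements each

mu = x.mean(axis=0)            # per-element batch mean
sigma = x.std(axis=0)          # per-element batch standard deviation

bn = (x - mu) / sigma          # gamma=1, beta=0
# each column of bn now has zero mean and unit standard deviation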

Parameters
  • shape (int or list of int or ndarray) – Shape of the tensor to be batch-normalized.

  • name (str) – Name for this layer. Default = ‘BatchNormalization’

  • weight_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the weight parameter(s) (\(\gamma\)). Default = Deterministic

  • bias_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the bias parameter(s) (\(\beta\)). Default = Deterministic

  • weight_prior (None or a Distribution object) – Prior probability distribution for the weight parameter(s) (\(\gamma\)). None or a Distribution function which has been instantiated with parameters. Default = Normal (0,1)

  • bias_prior (None or a Distribution object) – Prior probability distribution for the bias parameter(s) (\(\beta\)). None or a Distribution function which has been instantiated with parameters. Default = Normal (0,1)

  • weight_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the weights (\(\gamma\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.

  • bias_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the biases (\(\beta\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.

Examples

Batch normalize the output of a Dense layer:

import probflow as pf
import tensorflow as tf

network = pf.Sequential([
    pf.Dense(d_in=7, d_out=100),
    pf.BatchNormalization(100),
    tf.nn.relu,
    pf.Dense(d_in=100, d_out=1),
])
...

References

[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint, 2015. http://arxiv.org/abs/1502.03167

[2] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How Does Batch Normalization Help Optimization? arXiv preprint, 2018. http://arxiv.org/abs/1805.11604

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

dumps()

Serialize module object to bytes

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

property trainable_variables

A list of trainable backend variables within this Module

class probflow.modules.Embedding(k: Union[int, List[int]], d: Union[int, List[int]], probabilistic: bool = False, name: str = 'Embedding', **kwargs)[source]

Bases: probflow.modules.module.Module

A categorical embedding layer.

Maps an input variable containing non-negative integers to dense vectors. The length of the vectors (the dimensionality of the embedding) is set with the d keyword argument. The embedding is learned over the course of training: if there are N unique integers in the input and the embedding dimensionality is M, an N×M matrix of free parameters is created and optimized to minimize the loss.

By default, a Deterministic distribution is used for the embedding variables’ posterior distributions, with Normal (0, 1) priors. This corresponds to a standard non-probabilistic embedding with L2 regularization.

The embeddings can be non-probabilistic (each integer corresponds to a single point in M-dimensional space, the default) or probabilistic (each integer corresponds to an M-dimensional multivariate distribution). Set the probabilistic kwarg to True to use probabilistic embeddings.

Parameters
  • k (int > 0 or List[int]) – Number of categories to embed.

  • d (int > 0 or List[int]) – Number of embedding dimensions.

  • posterior (Distribution class) – Probability distribution class to use to approximate the posterior. Default = Deterministic

  • prior (Distribution object) – Prior probability distribution which has been instantiated with parameters. Default = Normal (0,1)

  • initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution. Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.

  • probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If False (the default), will use Deterministic distributions for the variational posteriors. If True, will use Normal distributions.

  • name (str) – Name for this layer. Default = ‘Embedding’

  • kwargs – Additional keyword arguments are passed to the Parameter constructor which creates the embedding variables.

Examples

Embed 10k word IDs into a 50-dimensional space:

import probflow as pf
import tensorflow as tf

emb = pf.Embedding(k=10000, d=50)

ids = tf.random.uniform([1000000], minval=1, maxval=10000,
                        dtype=tf.dtypes.int64)

embeddings = emb(ids)

TODO: fuller example
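
A sketch of using an embedding inside a larger model (the model structure and sizes are illustrative, and it assumes that calling the embedding on a (batch, seq) integer tensor yields a (batch, seq, d) output):

import probflow as pf
import tensorflow as tf

class BagOfWordsRegression(pf.ContinuousModel):
    def __init__(self, vocab_size: int, d_emb: int):
        self.emb = pf.Embedding(k=vocab_size, d=d_emb)
        self.head = pf.Dense(d_in=d_emb, d_out=1)
        self.std = pf.ScaleParameter()

    def __call__(self, ids):
        vectors = self.emb(ids)                   # (batch, seq, d_emb)
        pooled = tf.reduce_mean(vectors, axis=1)  # average over sequence
        return pf.Normal(self.head(pooled), self.std())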

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

dumps()

Serialize module object to bytes

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

property trainable_variables

A list of trainable backend variables within this Module