Modules
Modules are objects which take one or more Tensors as input, perform some computation on them, and output a Tensor. Modules can create and contain Parameters. Neural network layers are a good example of a Module: a layer stores parameters and uses them to perform a computation (the forward pass of data through the layer).
Module - abstract base class for all modules
Dense - fully-connected neural network layer
DenseNetwork - a multi-layer dense neural network module
Sequential - apply a list of modules sequentially
BatchNormalization - normalize data per batch
Embedding - embed categorical data in a lower-dimensional space
- class probflow.modules.Module(*args)
Bases: probflow.utils.base.BaseModule
Abstract base class for Modules.
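Examples
To define a custom module, subclass Module, create any Parameters and sub-Modules in __init__, and define the forward computation in __call__. A minimal sketch, assuming the TensorFlow backend and arbitrary layer sizes:

    import probflow as pf
    import tensorflow as tf

    class TwoLayer(pf.Module):
        """Two Dense layers with a relu in between."""

        def __init__(self):
            self.layer1 = pf.Dense(d_in=5, d_out=32)
            self.layer2 = pf.Dense(d_in=32, d_out=1)

        def __call__(self, x):
            # Parameters of both sub-modules are discovered automatically
            return self.layer2(tf.nn.relu(self.layer1(x)))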
- property parameters
A list of Parameters in this Module and its sub-Modules.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
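For example, this enables a simple form of incremental (online) learning. A hedged sketch, where model, x_old/y_old, and x_new/y_new are hypothetical:

    model.fit(x_old, y_old)   # fit to the first chunk of data
    model.bayesian_update()   # set priors to the fit variational posteriors
    model.fit(x_new, y_new)   # continue training on new data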
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- class probflow.modules.Dense(d_in: int, d_out: int = 1, probabilistic: bool = True, flipout: bool = True, weight_kwargs: dict = {}, bias_kwargs: dict = {}, name: str = 'Dense')
Bases: probflow.modules.module.Module
Dense neural network layer.
Note that this module uses the flipout estimator by default, but will not use the flipout estimator when taking multiple Monte Carlo samples per batch (when n_mc > 1). See Model.fit() for more info on setting the value of n_mc.
- Parameters
d_in (int) – Number of input dimensions.
d_out (int) – Number of output dimensions (number of “units”).
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
flipout (bool) – Whether to use the flipout estimator for this layer. Default is True. When the global flipout setting is enabled, flipout is used during training but not during inference; if this kwarg is set to False, flipout is not used even during training.
weight_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the weight parameters. Default is an empty dict.
bias_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the bias parameters. Default is an empty dict.
name (str) – Name of this layer. Default = 'Dense'
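Examples
A minimal usage sketch, assuming the TensorFlow backend and arbitrary shapes:

    import probflow as pf
    import tensorflow as tf

    layer = pf.Dense(d_in=7, d_out=3)
    x = tf.random.normal([100, 7])
    y = layer(x)  # Tensor of shape [100, 3]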
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.DenseNetwork(d: List[int], activation: Callable = <function relu>, batch_norm: bool = False, batch_norm_loc: str = 'after', name: str = 'DenseNetwork', batch_norm_kwargs: dict = {}, **kwargs)
Bases: probflow.modules.module.Module
A multi-layer dense neural network.
- Parameters
d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features).
activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max(0, x)\) (ReLU)
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
batch_norm (bool) – Whether or not to use batch normalization in between layers of the network. Default is False.
batch_norm_loc (str {'after' or 'before'}) – Where to apply the batch normalization. If 'after', applies the batch normalization after the activation. If 'before', applies the batch normalization before the activation. Default is 'after'.
batch_norm_kwargs (dict) – Additional parameters to pass to BatchNormalization for each layer.
kwargs – Additional parameters are passed to Dense for each layer.
- activations
Activation function for each layer.
Type: List[callable]
- batch_norms
Batch normalization layers.
Type: Union[None, List[BatchNormalization]]
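Examples
A minimal usage sketch, assuming the TensorFlow backend and arbitrary layer sizes: a network mapping 7 input features through two hidden layers to a single output, with the default relu activations between layers:

    import probflow as pf
    import tensorflow as tf

    net = pf.DenseNetwork([7, 128, 64, 1])
    x = tf.random.normal([100, 7])
    y = net(x)  # Tensor of shape [100, 1]; no activation after the final layer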
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Sequential(steps: List[Callable], name: str = 'Sequential')
Bases: probflow.modules.module.Module
Apply a series of modules or functions sequentially.
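Examples
Because the steps can be any callables, Modules and plain functions can be mixed. A minimal sketch, assuming the TensorFlow backend and arbitrary sizes:

    import probflow as pf
    import tensorflow as tf

    net = pf.Sequential([
        pf.Dense(d_in=5, d_out=32),
        tf.nn.relu,
        pf.Dense(d_in=32, d_out=1),
    ])
    y = net(tf.random.normal([10, 5]))  # Tensor of shape [10, 1]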
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.BatchNormalization(shape: Union[int, List[int]], weight_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, bias_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, weight_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, bias_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, weight_initializer: Dict[str, Callable] = {'loc': <function xavier>}, bias_initializer: Dict[str, Callable] = {'loc': <function xavier>}, name='BatchNormalization')
Bases: probflow.modules.module.Module
A layer which normalizes its inputs.
Batch normalization is a technique which normalizes, re-scales, and offsets the output of one layer before passing it on to another layer [1]. It often leads to faster training of neural networks and better generalization error by stabilizing the change in the layers' input distributions, or perhaps by smoothing the optimization landscape [2].
Given a set of tensors for this batch, where \(x_{ij}\) is the \(i\)-th element of the \(j\)-th sample in this batch, this layer returns an elementwise transformation of the input tensors according to:
\[\text{BatchNorm}(x_{ij}) = \gamma_i \left( \frac{x_{ij} - \mu_i}{\sigma_i} \right) + \beta_i\]
where \(\mu_i\) is the mean of the \(i\)-th element across the batch:
\[\mu_i = \frac{1}{N} \sum_{k=1}^{N} x_{ik}\]
and \(\sigma_i\) is the standard deviation of the \(i\)-th element across the batch:
\[\sigma_i = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \mu_i)^2}\]
and \(\gamma\) and \(\beta\) are two free parameters for each element.
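The transformation above can be computed directly in a few lines; this is only an illustration of the math (with \(\gamma_i = 1\) and \(\beta_i = 0\)), not how this class is implemented:

    import tensorflow as tf

    x = tf.random.normal([32, 100])        # batch of N=32 samples, 100 elements each
    mu = tf.reduce_mean(x, axis=0)         # per-element mean across the batch
    sigma = tf.math.reduce_std(x, axis=0)  # per-element std across the batch
    x_norm = (x - mu) / sigma              # BatchNorm(x) with gamma=1, beta=0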
- Parameters
shape (int or list of int or ndarray) – Shape of the tensor to be batch-normalized.
name (str) – Name for this layer. Default = 'BatchNormalization'
weight_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the weight parameter(s) (\(\gamma\)). Default = Deterministic
bias_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the bias parameter(s) (\(\beta\)). Default = Deterministic
weight_prior (None or a Distribution object) – Prior probability distribution for the weight parameter(s) (\(\gamma\)): None, or a Distribution object which has been instantiated with parameters. Default = Normal(0, 1)
bias_prior (None or a Distribution object) – Prior probability distribution for the bias parameter(s) (\(\beta\)): None, or a Distribution object which has been instantiated with parameters. Default = Normal(0, 1)
weight_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the weights (\(\gamma\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
bias_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the biases (\(\beta\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
Examples
Batch normalize the output of a Dense layer:

    import probflow as pf
    import tensorflow as tf

    network = pf.Sequential([
        pf.Dense(d_in=7, d_out=100),
        pf.BatchNormalization(100),
        tf.nn.relu,
        pf.Dense(d_in=100, d_out=1),
    ])
    ...
References
[1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint, 2015. http://arxiv.org/abs/1502.03167
[2] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How Does Batch Normalization Help Optimization? arXiv preprint, 2018. http://arxiv.org/abs/1805.11604
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Embedding(k: Union[int, List[int]], d: Union[int, List[int]], probabilistic: bool = False, name: str = 'Embedding', **kwargs)
Bases: probflow.modules.module.Module
A categorical embedding layer.
Maps an input variable containing non-negative integers to dense vectors. The length of the vectors (the dimensionality of the embedding) can be set with the d keyword argument. The embedding is learned over the course of training: if there are N unique integers in the input and the embedding dimensionality is M, a matrix of N x M free parameters is created and optimized to minimize the loss.
By default, a Deterministic distribution is used for the embedding variables' posterior distributions, with Normal(0, 1) priors. This corresponds to a normal non-probabilistic embedding with L2 regularization.
The embeddings can be non-probabilistic (each integer corresponds to a single point in M-dimensional space, the default) or probabilistic (each integer corresponds to an M-dimensional multivariate distribution). Set the probabilistic kwarg to True to use probabilistic embeddings.
- Parameters
k (int > 0 or List[int]) – Number of categories to embed.
d (int > 0 or List[int]) – Number of embedding dimensions.
posterior (Distribution class) – Probability distribution class to use to approximate the posterior. Default = Deterministic
prior (Distribution object) – Prior probability distribution which has been instantiated with parameters. Default = Normal(0, 1)
initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution. Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
probabilistic (bool) – Whether the variational posteriors for the embedding variables should be probabilistic. If False (the default), will use Deterministic distributions for the variational posteriors. If True, will use Normal distributions.
name (str) – Name for this layer. Default = 'Embedding'
kwargs – Additional keyword arguments are passed to the Parameter constructor which creates the embedding variables.
Examples
Embed 10k word IDs into a 50-dimensional space:
    import tensorflow as tf
    from probflow.modules import Embedding

    emb = Embedding(k=10000, d=50)
    ids = tf.random.uniform([1000000], minval=1, maxval=10000, dtype=tf.dtypes.int64)
    embeddings = emb(ids)
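A somewhat fuller sketch, assuming the TensorFlow backend and hypothetical sizes, feeding the embeddings into a Dense layer via Sequential:

    import probflow as pf
    import tensorflow as tf

    net = pf.Sequential([
        pf.Embedding(k=10000, d=50),
        pf.Dense(d_in=50, d_out=1),
    ])
    ids = tf.random.uniform([64], minval=0, maxval=10000, dtype=tf.dtypes.int64)
    out = net(ids)  # Tensor of shape [64, 1]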
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize this module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.