Modules
Modules are objects which take one or more Tensors as input, perform some computation on them, and output a Tensor. Modules can create and contain Parameters. Neural network layers are good examples of Modules: they store parameters and use those parameters to perform a computation (the forward pass of the data through the layer).
- Module - abstract base class for all modules
- Dense - fully-connected neural network layer
- DenseNetwork - a multi-layer dense neural network module
- Sequential - apply a list of modules sequentially
- BatchNormalization - normalize data per batch
- Embedding - embed categorical data in a lower-dimensional space
- class probflow.modules.Module(*args)
Bases: probflow.utils.base.BaseModule
Abstract base class for Modules.
TODO
- property parameters
A list of Parameters in this Module and its sub-Modules.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
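The behaviour of kl_loss() and bayesian_update() can be illustrated with a minimal, library-free sketch. Everything here is hypothetical and is not ProbFlow code: the Normal, ToyParameter, and ToyModule classes are stand-ins, priors and posteriors are assumed to be univariate normals, and the divergence is assumed to be taken from the posterior to the prior.

```python
import math
from dataclasses import dataclass


@dataclass
class Normal:
    """A univariate normal distribution (hypothetical stand-in)."""
    loc: float
    scale: float


def kl_normal(q: Normal, p: Normal) -> float:
    """Closed-form KL(q || p) for two univariate normals."""
    return (
        math.log(p.scale / q.scale)
        + (q.scale ** 2 + (q.loc - p.loc) ** 2) / (2.0 * p.scale ** 2)
        - 0.5
    )


@dataclass
class ToyParameter:
    """Hypothetical parameter holding a prior and a variational posterior."""
    prior: Normal
    posterior: Normal


class ToyModule:
    """Hypothetical module that owns parameters and sub-modules."""

    def __init__(self, params, children=()):
        self._params = list(params)
        self._children = list(children)

    @property
    def parameters(self):
        # Parameters of this module followed by those of its sub-modules
        out = list(self._params)
        for child in self._children:
            out.extend(child.parameters)
        return out

    def kl_loss(self) -> float:
        # Sum of KL(posterior || prior) over every parameter
        return sum(kl_normal(p.posterior, p.prior) for p in self.parameters)

    def bayesian_update(self) -> None:
        # The current posterior becomes the new prior
        for p in self.parameters:
            p.prior = Normal(p.posterior.loc, p.posterior.scale)


inner = ToyModule([ToyParameter(Normal(0.0, 1.0), Normal(1.0, 1.0))])
outer = ToyModule([ToyParameter(Normal(0.0, 1.0), Normal(0.0, 1.0))], [inner])
kl_before = outer.kl_loss()   # only the inner parameter contributes
outer.bayesian_update()
kl_after = outer.kl_loss()    # priors now equal posteriors
```

In this sketch the loss drops to zero after the update, because every prior has been replaced by a copy of its posterior.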
- class probflow.modules.Dense(d_in: int, d_out: int = 1, probabilistic: bool = True, flipout: bool = True, weight_kwargs: dict = {}, bias_kwargs: dict = {}, name: str = 'Dense')
Bases: probflow.modules.module.Module
Dense neural network layer.
TODO
Will not use flipout when n_mc > 1
Note that this module uses the flipout estimator by default, but will not use it when taking multiple Monte Carlo samples per batch (when n_mc > 1). See Model.fit() for more info on setting the value of n_mc.
- Parameters
d_in (int) – Number of input dimensions.
d_out (int) – Number of output dimensions (number of “units”).
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
flipout (bool) – Whether to use the flipout estimator for this layer. Default is True. Usually, when the global flipout setting is set to True, will use flipout during training but not during inference. If this kwarg is set to False, will not use flipout even during training.
weight_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the weight parameters. Default is an empty dict.
bias_kwargs (dict) – Additional kwargs to pass to the Parameter constructor for the bias parameters. Default is an empty dict.
name (str) – Name of this layer
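As a rough illustration of what the probabilistic flag changes, the sketch below computes y = xW + b for a single input vector in plain Python. It is an assumption-laden toy, not ProbFlow's implementation: the function name dense_forward and its arguments are hypothetical, weights are sampled independently for every call, and the flipout estimator is not implemented.

```python
import random


def dense_forward(x, w_loc, w_scale, b_loc, b_scale, probabilistic=True, rng=random):
    """Compute one forward pass y = x W + b for a single input vector.

    x: list of d_in floats; w_loc, w_scale: d_in x d_out nested lists;
    b_loc, b_scale: lists of d_out floats.
    """
    d_in, d_out = len(w_loc), len(b_loc)

    def draw(loc, scale):
        # Normal posterior: draw a sample; Deterministic posterior: use loc
        return rng.gauss(loc, scale) if probabilistic else loc

    y = []
    for j in range(d_out):
        total = draw(b_loc[j], b_scale[j])
        for i in range(d_in):
            total += x[i] * draw(w_loc[i][j], w_scale[i][j])
        y.append(total)
    return y


w_loc = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]    # d_in=3, d_out=2
w_scale = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
b_loc, b_scale = [0.5, -0.5], [0.1, 0.1]
x = [1.0, 2.0, 3.0]

y_det = dense_forward(x, w_loc, w_scale, b_loc, b_scale, probabilistic=False)
y_a = dense_forward(x, w_loc, w_scale, b_loc, b_scale, rng=random.Random(0))
y_b = dense_forward(x, w_loc, w_scale, b_loc, b_scale, rng=random.Random(1))
```

With probabilistic=False the output is the same on every call, while the two probabilistic calls differ because each draws its own weights and biases.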
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
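One way to picture how add_kl_loss(), kl_loss_batch(), and reset_kl_loss() fit together is as a small accumulator of per-batch terms. The sketch below is a guess at that pattern and not the library's code: the KLAccumulator class is hypothetical, and the d2 argument is left out because its meaning is not described here.

```python
class KLAccumulator:
    """Hypothetical holder for extra, per-batch KL terms."""

    def __init__(self):
        self._extra = []

    def add_kl_loss(self, loss):
        # Record an additional loss term computed during the forward pass
        self._extra.append(float(loss))

    def kl_loss_batch(self):
        # Sum of the additional terms recorded for the current batch
        return sum(self._extra)

    def reset_kl_loss(self):
        # Clear the recorded terms, e.g. before the next batch
        self._extra = []


acc = KLAccumulator()
acc.add_kl_loss(0.25)
acc.add_kl_loss(1.5)
batch_total = acc.kl_loss_batch()
acc.reset_kl_loss()
after_reset = acc.kl_loss_batch()
```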
- class probflow.modules.DenseNetwork(d: List[int], activation: Callable = <function relu>, batch_norm: bool = False, batch_norm_loc: str = 'after', name: str = 'DenseNetwork', batch_norm_kwargs: dict = {}, **kwargs)
Bases: probflow.modules.module.Module
A multilayer dense neural network
TODO: explain, math, diagram, examples, etc
- Parameters
d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features).
activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max ( 0, x )\)
probabilistic (bool) – Whether variational posteriors for the weights and biases should be probabilistic. If True (the default), will use Normal distributions for the variational posteriors. If False, will use Deterministic distributions.
batch_norm (bool) – Whether or not to use batch normalization in between layers of the network. Default is False.
batch_norm_loc (str {'after' or 'before'}) – Where to apply the batch normalization. If 'after', applies the batch normalization after the activation. If 'before', applies the batch normalization before the activation. Default is 'after'.
batch_norm_kwargs (dict) – Additional parameters to pass to BatchNormalization for each layer.
kwargs – Additional parameters are passed to Dense for each layer.
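The way d, activation, batch_norm, and batch_norm_loc interact can be sketched by listing the operations in order. This is an illustrative reading of the parameter descriptions above, not ProbFlow's code: layer_plan is a hypothetical helper, and it assumes batch normalization is only applied between layers, never after the final one.

```python
def layer_plan(d, batch_norm=False, batch_norm_loc="after"):
    """List the operations implied by the layer sizes d, in order."""
    if batch_norm_loc not in ("after", "before"):
        raise ValueError("batch_norm_loc must be 'after' or 'before'")
    steps = []
    n_layers = len(d) - 1  # d[0] is the number of input features
    for i in range(n_layers):
        steps.append(f"dense({d[i]}->{d[i + 1]})")
        if i == n_layers - 1:
            break  # no activation (or batch norm) after the final layer
        if batch_norm and batch_norm_loc == "before":
            steps.append("batch_norm")
        steps.append("activation")
        if batch_norm and batch_norm_loc == "after":
            steps.append("batch_norm")
    return steps


plain = layer_plan([7, 100, 1])
normed = layer_plan([7, 100, 50, 1], batch_norm=True, batch_norm_loc="before")
```

Here d=[7, 100, 1] describes two dense layers (7 to 100 units, then 100 to 1), with the activation applied only after the first.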
- activations
Activation function for each layer
- Type
List[callable]
- batch_norms
Batch normalization layers
- Type
Union[None, List[BatchNormalization]]
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Sequential(steps: List[Callable], name: str = 'Sequential')
Bases: probflow.modules.module.Module
Apply a series of modules or functions sequentially.
TODO
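Applying steps sequentially amounts to feeding the output of each callable into the next. The following minimal sketch shows that idea with plain Python callables; ToySequential is a hypothetical stand-in and does not track parameters or KL losses the way a real Module would.

```python
class ToySequential:
    """Hypothetical stand-in: call each step on the previous step's output."""

    def __init__(self, steps):
        self.steps = list(steps)

    def __call__(self, x):
        for step in self.steps:
            x = step(x)
        return x


# Steps may be any callables: functions, lambdas, or callable objects
pipeline = ToySequential([lambda v: v + 1, lambda v: v * 2, abs])
result = pipeline(-4)  # (-4 + 1) * 2 = -6, then abs
```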
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.BatchNormalization(shape: Union[int, List[int]], weight_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, bias_posterior: Type[probflow.utils.base.BaseDistribution] = <class 'probflow.distributions.deterministic.Deterministic'>, weight_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, bias_prior: probflow.utils.base.BaseDistribution = <probflow.distributions.normal.Normal object>, weight_initializer: Dict[str, Callable] = {'loc': <function xavier>}, bias_initializer: Dict[str, Callable] = {'loc': <function xavier>}, name='BatchNormalization')
Bases: probflow.modules.module.Module
A layer which normalizes its inputs.
Batch normalization is a technique which normalizes, re-scales, and offsets the output of one layer before passing it on to another layer [1]. It often leads to faster training of neural networks and to lower generalization error, either by stabilizing the change in the layers’ input distributions or perhaps by smoothing the optimization landscape [2].
Given a set of tensors for this batch, where \(x_{ij}\) is the \(i\)-th element of the \(j\)-th sample in this batch, this layer returns an elementwise transformation of the input tensors according to:
\[\text{BatchNorm}(x_{ij}) = \gamma_i \left( \frac{x_{ij} - \mu_i}{\sigma_i} \right) + \beta_i\]
where \(\mu_i\) is the mean of the \(i\)-th element across the \(N\) samples in the batch:
\[\mu_i = \frac{1}{N} \sum_{k=1}^{N} x_{ik}\]
and \(\sigma_i\) is the standard deviation of the \(i\)-th element across the batch:
\[\sigma_i = \sqrt{\frac{1}{N} \sum_{k=1}^{N} (x_{ik} - \mu_i)^2}\]
and \(\gamma\) and \(\beta\) are two free parameters for each element.
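The transformation can be checked by hand with a small plain-Python sketch. It assumes a 2-D batch stored as a list of samples (so batch[j][i] plays the role of x_ij), takes sigma_i to be the population standard deviation (the square root of the mean squared deviation), and adds an eps argument for numerical safety that is not part of the formula. The function name batch_norm is hypothetical.

```python
import math


def batch_norm(batch, gamma, beta, eps=0.0):
    """Normalize each element across the batch, then scale and shift.

    batch: list of N samples, each a list of D floats.
    gamma, beta: lists of D floats (per-element scale and offset).
    eps: optional constant added to the variance (an assumption, to
         avoid division by zero when an element is constant).
    """
    n = len(batch)
    d = len(batch[0])
    # Per-element mean across the batch
    mu = [sum(row[i] for row in batch) / n for i in range(d)]
    # Per-element population standard deviation across the batch
    sigma = [
        math.sqrt(sum((row[i] - mu[i]) ** 2 for row in batch) / n + eps)
        for i in range(d)
    ]
    return [
        [gamma[i] * (row[i] - mu[i]) / sigma[i] + beta[i] for i in range(d)]
        for row in batch
    ]


batch = [[1.0, 10.0], [3.0, 30.0]]  # N=2 samples, D=2 elements
out = batch_norm(batch, gamma=[1.0, 2.0], beta=[0.0, 5.0])
```

For this batch the means are 2 and 20 and the standard deviations are 1 and 10, so the first element is mapped to -1 and 1, and the second (scaled by 2 and shifted by 5) to 3 and 7.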
- Parameters
shape (int or list of int or ndarray) – Shape of the tensor to be batch-normalized.
name (str) – Name for this layer. Default = ‘BatchNormalization’
weight_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the weight parameter(s) (\(\gamma\)). Default = Deterministic
bias_posterior (Distribution) – Probability distribution class to use to approximate the posterior for the bias parameter(s) (\(\beta\)). Default = Deterministic
weight_prior (None or a Distribution object) – Prior probability distribution for the weight parameter(s) (\(\gamma\)). None or a Distribution function which has been instantiated with parameters. Default = Normal(0,1)
bias_prior (None or a Distribution object) – Prior probability distribution for the bias parameter(s) (\(\beta\)). None or a Distribution function which has been instantiated with parameters. Default = Normal(0,1)
weight_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the weights (\(\gamma\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
bias_initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution for the biases (\(\beta\)). Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
Examples
Batch normalize the output of a Dense layer:

    import probflow as pf
    import tensorflow as tf  # tf.nn.relu assumes the TensorFlow backend

    network = pf.Sequential([
        pf.Dense(d_in=7, d_out=100),
        pf.BatchNormalization(100),  # normalize the 100 hidden units
        tf.nn.relu,
        pf.Dense(d_in=100, d_out=1)
    ])
    ...
References
- [1] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint, 2015. http://arxiv.org/abs/1502.03167
- [2] Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, and Aleksander Madry. How Does Batch Normalization Help Optimization? arXiv preprint, 2018. http://arxiv.org/abs/1805.11604
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.
- class probflow.modules.Embedding(k: Union[int, List[int]], d: Union[int, List[int]], probabilistic: bool = False, name: str = 'Embedding', **kwargs)
Bases: probflow.modules.module.Module
A categorical embedding layer.
Maps an input variable containing non-negative integers to dense vectors. The length of the vectors (the dimensionality of the embedding) can be set with the d keyword argument. The embedding is learned over the course of training: if there are N unique integers in the input, and the embedding dimensionality is M, a matrix of NxM free parameters is created and optimized to minimize the loss.
By default, a Deterministic distribution is used for the embedding variables’ posterior distributions, with Normal(0, 1) priors. This corresponds to a standard non-probabilistic embedding with L2 regularization.
The embeddings can be non-probabilistic (each integer corresponds to a single point in M-dimensional space, the default), or probabilistic (each integer corresponds to an M-dimensional multivariate distribution). Set the probabilistic kwarg to True to use probabilistic embeddings.
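The non-probabilistic case described above is essentially a table lookup, which the following plain-Python sketch illustrates. The helper names make_embedding and embed are hypothetical, the Gaussian initial values are an assumption, and no training or prior is involved.

```python
import random


def make_embedding(k, d, rng):
    """Create a k x d table of free parameters with random initial values."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def embed(table, ids):
    """Look up one d-dimensional row per non-negative integer ID."""
    for i in ids:
        if not 0 <= i < len(table):
            raise IndexError(f"ID {i} is outside the range [0, {len(table)})")
    return [table[i] for i in ids]


rng = random.Random(0)
table = make_embedding(k=5, d=3, rng=rng)   # N=5 categories, M=3 dimensions
vectors = embed(table, [0, 4, 4, 2])        # repeated IDs share a row
```

Each ID selects one row of the N x M table, so identical IDs always map to identical vectors.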
- Parameters
k (int > 0 or List[int]) – Number of categories to embed.
d (int > 0 or List[int]) – Number of embedding dimensions.
posterior (Distribution class) – Probability distribution class to use to approximate the posterior. Default = Deterministic
prior (Distribution object) – Prior probability distribution which has been instantiated with parameters. Default = Normal(0,1)
initializer (dict of callables) – Initializer functions to use for each variable of the variational posterior distribution. Keys correspond to variable names (arguments to the distribution), and values contain functions to initialize those variables given shape as the single argument.
probabilistic (bool) – Whether variational posteriors for the embedding variables should be probabilistic. If False (the default), will use Deterministic distributions for the variational posteriors. If True, will use Normal distributions.
name (str) – Name for this layer. Default = ‘Embedding’
kwargs – Additional keyword arguments are passed to the Parameter constructor which creates the embedding variables.
Examples
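Embed 10k word IDs into a 50-dimensional space:

    import tensorflow as tf
    from probflow.modules import Embedding

    emb = Embedding(k=10000, d=50)
    # One million random integer IDs drawn from [1, 10000)
    ids = tf.random.uniform([1000000], minval=1, maxval=10000, dtype=tf.dtypes.int64)
    embeddings = emb(ids)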
TODO: fuller example
- add_kl_loss(loss, d2=None)
Add additional loss due to KL divergences.
- bayesian_update()
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()
Serialize module object to bytes.
- kl_loss()
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()
Compute the sum of additional Kullback-Leibler divergences due to data in this batch.
- property n_parameters
Get the number of independent parameters of this module.
- property n_variables
Get the number of underlying variables in this module.
- property parameters
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()
Reset additional loss due to KL divergences.