Distributions

Probability distributions describe the probability of either a Parameter or a data point taking a given value. In ProbFlow, they’re used mainly in three places: as parameters’ priors, as parameters’ variational posteriors, and as observation distributions (the distribution of the data that the model predicts). However, you can also use them as stand-alone objects or for other purposes within your models.

Creating a distribution object

To create a distribution, create an instance of a Distribution class:

import probflow as pf

dist = pf.distributions.Normal(1, 2)

All ProbFlow distributions are also included in the main namespace, so you can just do:

dist = pf.Normal(1, 2)

Using a distribution as a prior

See Setting the Prior.

Using a distribution as a variational posterior

See Specifying the Variational Posterior.

Using a distribution as an observation distribution

See Specifying the observation distribution.

Getting the log probability of a value along a distribution

ProbFlow distribution objects can also be used on their own. Their methods return tensors of the backend type (e.g. if your backend is TensorFlow, they return tf.Tensor objects, not NumPy arrays).

To get the log probability of some value along a probability distribution, use the log_prob method:

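import matplotlib.pyplot as plt
import numpy as np
import probflow as pf

dist = pf.Normal(3, 2)

# float32 avoids a dtype mismatch with the float32 TensorFlow distribution
x = np.linspace(-10, 10, 100).astype("float32")
log_p = dist.log_prob(x)
plt.plot(x, np.exp(log_p))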
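Exponentiating the log probability recovers the probability density, so the plotted curve should peak at the mean (3) and enclose an area of about 1. As a backend-free sketch, the snippet below checks both properties with NumPy. It uses a hypothetical normal_log_prob helper that implements the closed-form Normal log density in place of ProbFlow, so it is an illustration of what log_prob computes rather than ProbFlow's own code.

```python
import numpy as np


def normal_log_prob(x, loc, scale):
    """Hypothetical helper: log density of Normal(loc, scale), elementwise."""
    z = (x - loc) / scale
    return -0.5 * z**2 - np.log(scale * np.sqrt(2 * np.pi))


# Same distribution as the example above: mean 3, standard deviation 2
x = np.linspace(-10, 10, 1001)
log_p = normal_log_prob(x, 3.0, 2.0)

# Riemann-sum approximation of the integral of exp(log_p) over the grid
dx = x[1] - x[0]
area = np.sum(np.exp(log_p)) * dx

print(area)                    # close to 1 (a little mass lies beyond x = 10)
print(x[np.argmax(log_p)])     # the density peaks at the mean
```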

Getting the mean and mode of a distribution

To get the mean of a distribution, use the mean method:

>>> dist = pf.Gamma(4, 5)
>>> dist.mean()
<tf.Tensor: shape=(), dtype=float32, numpy=0.8>

And to get the mode, use the mode method:

>>> dist.mode()
<tf.Tensor: shape=(), dtype=float32, numpy=0.6>
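The printed values are consistent with a Gamma distribution parameterized by concentration alpha = 4 and rate beta = 5, whose mean is alpha / beta and whose mode is (alpha - 1) / beta for alpha >= 1. That parameterization is an assumption inferred from the outputs above, not something this page states. A minimal NumPy sketch that checks the closed forms, plus a Monte Carlo cross-check of the mean:

```python
import numpy as np

alpha, beta = 4.0, 5.0  # assumed: concentration and rate, matching pf.Gamma(4, 5) above

mean = alpha / beta          # closed-form mean of Gamma(alpha, rate=beta)
mode = (alpha - 1) / beta    # closed-form mode, valid when alpha >= 1

# NumPy's gamma sampler takes a *scale* parameter, i.e. 1 / rate
rng = np.random.default_rng(0)
samples = rng.gamma(shape=alpha, scale=1 / beta, size=200_000)

print(mean, mode)        # closed-form values
print(samples.mean())    # should be close to the closed-form mean
```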

Getting samples from a distribution

To draw random samples from a distribution, use the sample method:

>>> dist = pf.Normal(4, 5)
>>> dist.sample()
<tf.Tensor: shape=(), dtype=float32, numpy=5.124513>

You can take multiple samples from the same distribution using the n keyword argument:

>>> dist.sample(n=5)
<tf.Tensor: shape=(5,), dtype=float32, numpy=array([3.9323747, 2.1640768, 5.909429 , 7.7332597, 3.4620957], dtype=float32)>
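Each call to sample draws fresh random values, so individual draws vary, but their average approaches the distribution's mean as the number of samples grows. The sketch below illustrates this for a Normal with mean 4 and standard deviation 5, using NumPy's random generator as a stand-in for the backend sampler (an assumption made so it runs without ProbFlow):

```python
import numpy as np

rng = np.random.default_rng(42)

# n independent draws from Normal(loc=4, scale=5), analogous to dist.sample(n=...)
for n in (5, 100, 100_000):
    draws = rng.normal(loc=4.0, scale=5.0, size=n)
    print(n, draws.shape, draws.mean())
```

With only 5 draws the average can land far from 4; with 100,000 it is typically within a few hundredths.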

If the distribution’s arguments have more than one dimension, the samples array will have shape (Nsamples, DistributionShape1, ..., DistributionShapeN):

>>> mean = np.random.randn(3, 4)
>>> std = np.exp(np.random.randn(3, 4))
>>> dist = pf.Normal(mean, std)
>>> dist.sample(n=5).shape
TensorShape([5, 3, 4])
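In other words, the number of samples is prepended to the shape of the distribution's parameters. The NumPy sketch below reproduces the (5, 3, 4) result above as a stand-in for the backend: passing size=(n,) + mean.shape makes NumPy broadcast mean and std against the trailing dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

mean = rng.standard_normal((3, 4))
std = np.exp(rng.standard_normal((3, 4)))  # exp keeps every scale positive

n = 5
# Prepend the sample dimension to the parameter shape: (n, 3, 4)
samples = rng.normal(loc=mean, scale=std, size=(n,) + mean.shape)
print(samples.shape)     # (5, 3, 4)

# Each slice along axis 0 is one draw of the full 3x4 array of parameters
print(samples[0].shape)  # (3, 4)
```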

Rolling your own distribution

ProbFlow includes most common distributions, but you can also create a custom distribution backed by a tensorflow_probability.distributions.Distribution or a torch.distributions.Distribution. To do so, create a class which inherits from BaseDistribution and implements the following methods:

__init__: should store references to the tensor(s) to be used as the distribution’s parameters
__call__: should return a backend distribution object (a tensorflow_probability.distributions.Distribution or a torch.distributions.Distribution)

For example, with the TensorFlow backend:

import probflow as pf
import tensorflow_probability as tfp

tfd = tfp.distributions


class NormalDistribution(pf.BaseDistribution):

    def __init__(self, mean, std):
        # Use attribute names that don't shadow the mean() method
        self.loc = mean
        self.scale = std

    def __call__(self):
        # Build the backend (TensorFlow Probability) distribution object
        return tfd.Normal(self.loc, self.scale)
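One way a base class can make __init__ and __call__ sufficient is to forward every other method to the backend object that __call__ returns. The plain-Python sketch below illustrates that delegation idea. ToyBaseDistribution and ToyNormal are hypothetical names, Python's statistics.NormalDist stands in for a backend distribution, and this is not ProbFlow's actual BaseDistribution implementation.

```python
import math
import statistics


class ToyBaseDistribution:
    """Hypothetical base class: forwards calls to the backend object from __call__."""

    def __call__(self):
        raise NotImplementedError("subclasses return a backend distribution here")

    def log_prob(self, x):
        return math.log(self().pdf(x))

    def mean(self):
        return self().mean

    def mode(self):
        return self().mode

    def sample(self, n=1):
        return self().samples(n)


class ToyNormal(ToyBaseDistribution):
    """Only __init__ and __call__ are defined, as in the TensorFlow example above."""

    def __init__(self, loc, scale):
        self.loc = loc
        self.scale = scale

    def __call__(self):
        return statistics.NormalDist(self.loc, self.scale)


dist = ToyNormal(3.0, 2.0)
print(dist.mean(), dist.mode())   # 3.0 3.0
print(dist.log_prob(3.0))         # log density at the mean
print(len(dist.sample(n=5)))      # 5
```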

Or, to implement a probability distribution completely from scratch, create a class which inherits from BaseDistribution and implements the following methods:

__init__(*args): should store references to the tensor(s) to be used as the distribution’s parameters
log_prob(x): should return the log probability of some data (x) along this distribution
mean(): should return the mean of the distribution
mode(): should return the mode of the distribution
sample(n=1): should return n sample(s) from the distribution
(and you do not need to implement __call__ in this case)

For example, to manually implement a Normal distribution:

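import probflow as pf
import tensorflow as tf


class NormalDistribution(pf.BaseDistribution):

    def __init__(self, mean, std):
        self.mean_tensor = tf.convert_to_tensor(mean, dtype=tf.float32)
        self.std_tensor = tf.convert_to_tensor(std, dtype=tf.float32)

    def log_prob(self, x):
        # log N(x | mean, std); 2.506628274631 is sqrt(2*pi)
        x = tf.convert_to_tensor(x, dtype=tf.float32)
        return (
            -0.5*tf.math.square((x-self.mean_tensor)/self.std_tensor)
            - tf.math.log(self.std_tensor*2.506628274631)
        )

    def mean(self):
        return self.mean_tensor

    def mode(self):
        # The mode of a Normal distribution equals its mean
        return self.mean_tensor

    def sample(self, n=1):
        # Samples have shape (n, *parameter_shape): shift and scale standard normal noise
        shape = tf.broadcast_dynamic_shape(
            tf.shape(self.mean_tensor), tf.shape(self.std_tensor)
        )
        noise = tf.random.normal(tf.concat([[n], shape], axis=0))
        return self.mean_tensor + self.std_tensor * noise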
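The constant 2.506628274631 in log_prob is the square root of 2π, so the expression is the Normal log density log p(x) = -0.5((x - μ)/σ)² - log(σ√(2π)). A quick way to sanity-check a hand-written density like this is to compare it against a reference implementation. The sketch below does so with NumPy and Python's statistics.NormalDist, which serves only as a stand-in reference and is not part of ProbFlow.

```python
import math
import statistics

import numpy as np

SQRT_2PI = 2.506628274631  # constant used in the log_prob method above


def normal_log_prob(x, mean, std):
    """Same formula as NormalDistribution.log_prob above, written with NumPy."""
    return -0.5 * np.square((x - mean) / std) - np.log(std * SQRT_2PI)


mean, std = 1.0, 2.0
xs = np.linspace(-5.0, 7.0, 25)

ours = normal_log_prob(xs, mean, std)
reference = np.log([statistics.NormalDist(mean, std).pdf(float(x)) for x in xs])

print(math.isclose(SQRT_2PI, math.sqrt(2 * math.pi), rel_tol=1e-12))  # True
print(np.max(np.abs(ours - reference)))  # tiny, limited by floating-point rounding
```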