Normalizing Flows
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions
tfb = tfp.bijectors
import probflow as pf
A normalizing flow models a complex distribution by passing samples from a simple base distribution (here, a multivariate normal) through a chain of invertible transformations f = f_K ∘ … ∘ f_1. Because every step is invertible, the density of the transformed variable x = f(z) can be computed exactly via the change-of-variables formula:

p_x(x) = p_z(f⁻¹(x)) |det J_{f⁻¹}(x)|

where J_{f⁻¹} is the Jacobian of the inverse transformation. Fitting the flow then amounts to maximizing this log likelihood of the data with respect to the parameters of the transformations.
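To make the change-of-variables formula concrete before building the full model, here is a minimal numpy sketch (independent of the code below) that pushes a standard normal through a 1-D affine map x = a·z + b and checks that the resulting density matches the closed-form N(b, a²) density:

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Gaussian density, used as the base distribution p_z
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def flow_density(x, a=2.0, b=1.0):
    # Change-of-variables: p_x(x) = p_z(f_inv(x)) * |det J_{f_inv}(x)|
    z = (x - b) / a            # invert the transformation f(z) = a*z + b
    log_det = -np.log(abs(a))  # log |dz/dx| for the affine map
    return normal_pdf(z) * np.exp(log_det)

# For a single affine transformation the flow density should equal
# the N(b, a^2) density exactly
xs = np.linspace(-5, 7, 9)
assert np.allclose(flow_density(xs), normal_pdf(xs, mu=1.0, sigma=2.0))
```

The deep flows below chain many such steps (interleaved with nonlinearities), but each step contributes to the density in exactly this way: invert it, then add its log-determinant.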
Note
This example only works with the TensorFlow backend (using bijectors), but one could implement a similar model using PyTorch transforms.
Let’s create some data which has a wonky shape, and which would be difficult to model with a standard probability distribution.
N = 512
x = 3*np.random.randn(N, 2)
x[:, 0] = 0.25*np.square(x[:, 1])
x[:, 0] += np.random.randn(N)
plt.plot(x[:, 0], x[:, 1], '.')
To create the normalizing flow, we’ll first create a bijector to represent an invertible leaky rectified linear transformation.
class LeakyRelu(tfb.Bijector):

    def __init__(self, alpha=0.5):
        super().__init__(forward_min_event_ndims=0)
        self.alpha = alpha

    def _forward(self, x):
        return tf.where(x >= 0, x, x * self.alpha)

    def _inverse(self, y):
        return tf.where(y >= 0, y, y / self.alpha)

    def _inverse_log_det_jacobian(self, y):
        return tf.math.log(tf.where(y >= 0, 1., 1. / self.alpha))
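As a sanity check, the same transformation can be sketched in plain numpy (outside of TensorFlow) to confirm that it is invertible and that its inverse log-determinant matches the elementwise slope:

```python
import numpy as np

def leaky_relu_forward(x, alpha=0.5):
    # Identity for x >= 0, slope alpha for x < 0
    return np.where(x >= 0, x, x * alpha)

def leaky_relu_inverse(y, alpha=0.5):
    # Exact inverse: divide by alpha on the negative side
    return np.where(y >= 0, y, y / alpha)

def inverse_log_det_jacobian(y, alpha=0.5):
    # log |dx/dy| is 0 for y >= 0 and -log(alpha) for y < 0
    return np.log(np.where(y >= 0, 1.0, 1.0 / alpha))

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
y = leaky_relu_forward(x)
assert np.allclose(leaky_relu_inverse(y), x)                  # round-trip recovers x
assert np.allclose(inverse_log_det_jacobian(y)[:2], -np.log(0.5))
```

Invertibility (and a tractable log-determinant) is exactly what lets the flow evaluate exact densities later on, which an ordinary ReLU, which discards all negative values, could not provide.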
The source distribution will be a standard multivariate normal distribution,
and the affine transformations and “leakiness” of the rectified linear
transformations will be parameterized by DeterministicParameter
parameters, which are non-probabilistic (but still have priors).
class MlpNormalizingFlow(pf.Model):

    def __init__(self, Nl, d):
        self.base_dist = tfd.MultivariateNormalDiag(tf.zeros(d))
        self.V = [pf.DeterministicParameter([d, d]) for _ in range(Nl)]
        self.s = [pf.DeterministicParameter([d]) for _ in range(Nl)]
        self.L = [pf.DeterministicParameter([int(d*(d+1)/2)]) for _ in range(Nl)]
        self.a = [pf.DeterministicParameter([1]) for _ in range(Nl-1)]

    def __call__(self, n_steps=None):
        n_steps = 2*len(self.V) - 1 if n_steps is None else n_steps
        bijectors = []
        for i in range(len(self.V)):
            bijectors += [tfb.Affine(
                scale_tril=tfb.ScaleTriL().forward(self.L[i]()),
                scale_perturb_factor=self.V[i](),
                shift=self.s[i]())]
            if i < len(self.V) - 1:
                bijectors += [LeakyRelu(alpha=tf.abs(self.a[i]()))]
        return tfd.TransformedDistribution(
            distribution=self.base_dist,
            bijector=tfb.Chain(bijectors[:n_steps]))
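The `scale_tril` argument above packs each layer's d(d+1)/2 free parameters into a lower-triangular scale matrix. Here is a rough numpy sketch of that construction (assuming, as `tfb.ScaleTriL` does by default, that the lower triangle is filled from the flat parameter vector and the diagonal is passed through a softplus to keep it positive; TFP's actual fill order and small diagonal shift may differ):

```python
import numpy as np

def softplus(x):
    # Smooth, always-positive transform: log(1 + exp(x))
    return np.log1p(np.exp(x))

def fill_scale_tril(params, d):
    # Place the d*(d+1)/2 free parameters into the lower triangle,
    # then make the diagonal positive so it is a valid scale matrix
    assert params.size == d * (d + 1) // 2
    L = np.zeros((d, d))
    L[np.tril_indices(d)] = params
    L[np.diag_indices(d)] = softplus(L[np.diag_indices(d)])
    return L

L = fill_scale_tril(np.array([0.0, 1.5, -0.3]), d=2)
assert L.shape == (2, 2) and L[0, 1] == 0.0   # lower-triangular
assert np.all(np.diag(L) > 0)                 # positive diagonal
```

A triangular scale with a positive diagonal guarantees the affine step is invertible, and its log-determinant is just the sum of the log-diagonal entries, which keeps density evaluation cheap.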
Then we can create and fit the model to the data:
Nl = 8  # number of layers
model = MlpNormalizingFlow(Nl, 2)
model.fit(x, epochs=10000, lr=0.02)
Comparing the data points to samples from the base distribution transformed by the normalizing flow, we can see that the transformations have warped the base distribution to match the distribution of the data.
# Plot original points and samples from the model
plt.plot(x[:, 0], x[:, 1], '.')
S = model().sample((512,))
plt.plot(S[:, 0], S[:, 1], '.')
And because normalizing flows are made up of invertible transformations, we can evaluate the probability of the distribution at any arbitrary point on a grid and compare that transformed probability distribution to the original data:
# Compute probabilities
res = 200
xx, yy = np.meshgrid(np.linspace(-5, 20, res), np.linspace(-10, 10, res))
xy = np.hstack([e.reshape(-1, 1) for e in [xx, yy]])
probs = model().prob(xy).numpy()
# Plot them
plt.imshow(probs.reshape((res, res)),
origin='lower', extent=(-5, 20, -10, 10))
plt.plot(x[:, 0], x[:, 1], 'r,')
And we can use the n_steps
kwarg to view samples after each successive
transformation in the flow. We start with the base distribution (a standard
multivariate Gaussian), and the flow stepwise distorts and transforms that
distribution until it approximates the data.
for i in range(2*Nl):
plt.subplot(4, 4, i+1)
S = model(n_steps=i).sample((512,))
plt.plot(S[:, 0], S[:, 1], '.')
Also see:
Eric Jang’s great tutorial on normalizing flows, which this example is based on.
Danilo Jimenez Rezende & Shakir Mohamed. Variational Inference with Normalizing Flows. PMLR, 2015.