Bayesian Correlation

import numpy as np
import matplotlib.pyplot as plt
randn = lambda *x: np.random.randn(*x).astype('float32')

import probflow as pf

The Pearson correlation coefficient, \(\rho\), between two variables is their covariance divided by the product of their individual standard deviations:

\[\rho_{x,y} = \frac{\text{cov}(x, y)}{\sigma_x ~ \sigma_y}\]
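
As a quick numerical sanity check of this definition (a minimal NumPy sketch, separate from the model we'll build below):

x = np.random.randn(1000).astype('float32')
y = 0.5*x + np.random.randn(1000).astype('float32')

# Correlation = covariance / (product of standard deviations)
rho = np.cov(x, y)[0, 1] / (np.std(x, ddof=1)*np.std(y, ddof=1))

# Should match NumPy's built-in Pearson correlation
print(rho, np.corrcoef(x, y)[0, 1])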

To model a correlation probabilistically, we can treat our datapoints as draws from a multivariate normal distribution with covariance matrix \(\Sigma\). The correlation coefficient shows up in the off-diagonal elements of that matrix (we'll assume the data has been standardized, so the diagonal elements are all 1).

[Figure: covariance matrix, with the correlation reflected in the off-diagonal elements]

This is because, by the definition of the correlation coefficient above, the covariance matrix can be expressed using only the per-variable standard deviations and the correlation coefficient:

\[\begin{split}\Sigma = \begin{bmatrix} \text{var}(x) & \text{cov}(x, y) \\ \text{cov}(x,y) & \text{var}(y)\\ \end{bmatrix} = \begin{bmatrix} \sigma_x^2 & \rho ~ \sigma_x \sigma_y \\ \rho ~ \sigma_x \sigma_y & \sigma_y^2\\ \end{bmatrix}\end{split}\]

Or, if we assume the input has been standardized:

\[\begin{split}\Sigma = \begin{bmatrix} 1 & \rho \\ \rho & 1 \\ \end{bmatrix}\end{split}\]
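
To see this concretely before involving any inference, here's a short sketch (with an arbitrary true value of \(\rho = 0.8\)) that samples standardized data from a bivariate normal with this covariance matrix and recovers the correlation empirically:

rho = 0.8  # arbitrary true correlation
Sigma = np.array([[1.0, rho],
                  [rho, 1.0]])

# Draw standardized samples with this covariance matrix
X = np.random.multivariate_normal(np.zeros(2), Sigma, size=10000)

# The empirical correlation should be close to rho
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])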

So, essentially, we're optimizing the correlation coefficient such that it yields the Gaussian distribution which best fits the data points.

This is our first example of a generative model: we're not trying to predict \(y\) given \(x\). Instead, we're interested in the data-generating distribution itself, from which the correlation coefficient can be derived. With this type of generative model we treat all the data as the dependent variable, so our __call__ method takes no input; the distribution of the data is determined entirely by the model and its parameters.

import tensorflow as tf

class BayesianCorrelation(pf.Model):

    def __init__(self):
        # The correlation coefficient, constrained to lie in [-1, 1]
        self.rho = pf.BoundedParameter(min=-1, max=1)

    def __call__(self):
        # Build the covariance matrix [[1, rho], [rho, 1]]:
        # ones on the diagonal, rho everywhere else
        cov = tf.eye(2) + self.rho()*tf.abs(tf.eye(2)-1)
        # Model the (standardized) data as bivariate normal with that covariance
        return pf.MultivariateNormal(tf.zeros([2]), cov)
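
The covariance-matrix expression above is just a compact way to place ones on the diagonal and \(\rho\) everywhere else; here's the same construction in plain NumPy, with a hypothetical example value of \(\rho\):

rho = 0.5  # example value, not a fitted one
eye = np.eye(2, dtype='float32')
print(eye + rho*np.abs(eye - 1))
# [[1.  0.5]
#  [0.5 1. ]]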

Then we can instantiate the model.

model = BayesianCorrelation()

Let’s generate some uncorrelated data.

X = np.random.randn(100, 2).astype('float32')
plt.plot(X[:, 0], X[:, 1], '.')
[Figure: scatter plot of the uncorrelated data]

If we fit the model on this uncorrelated data, we can see that the posterior distribution for the correlation coefficient \(\rho\) is centered around 0:

model.fit(X, lr=0.1)
model.posterior_plot(ci=0.95, style='hist')
[Figure: posterior distribution of \(\rho\), centered around 0]
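
The plot isn't the only way to inspect the fit: assuming ProbFlow's posterior_mean and posterior_ci model methods behave as in its Parameter API, something like this should give a numeric point estimate and credible interval for \(\rho\):

print(model.posterior_mean('rho'))           # point estimate, near 0 here
lb, ub = model.posterior_ci('rho', ci=0.95)  # 95% credible interval bounds
print(lb, ub)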

On the other hand, if we fit the model to some data which is highly correlated,

X[:, 1] = X[:, 0] + 0.2*np.random.randn(100).astype('float32')
plt.plot(X[:, 0], X[:, 1], '.')
[Figure: scatter plot of the positively correlated data]

Then the posterior distribution for the correlation coefficient \(\rho\) is considerably closer to 1:

model = BayesianCorrelation()
model.fit(X, lr=0.1)
model.posterior_plot(ci=0.95, style='hist')
[Figure: posterior distribution of \(\rho\), close to 1]

Conversely, if we generate negatively correlated data,

X = np.random.randn(100, 2).astype('float32')
X[:, 1] = -X[:, 0] + 0.2*np.random.randn(100).astype('float32')
plt.plot(X[:, 0], X[:, 1], '.')
[Figure: scatter plot of the negatively correlated data]

The model recovers the negative correlation coefficient:

model = BayesianCorrelation()
model.fit(X, lr=0.1)
model.posterior_plot(ci=0.95, style='hist')
[Figure: posterior distribution of \(\rho\), close to -1]
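
As a final sanity check (again assuming the posterior_mean method used above), the posterior mean of \(\rho\) should roughly agree with the empirical correlation of the data:

print(model.posterior_mean('rho'))          # should be near -1
print(np.corrcoef(X[:, 0], X[:, 1])[0, 1])  # empirical (frequentist) correlation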