Robust Heteroscedastic Regression

TLDR

import probflow as pf
import tensorflow as tf

class RobustHeteroscedasticRegression(pf.ContinuousModel):

    def __init__(self, dims):
        self.m = pf.Dense(dims)
        self.s = pf.Dense(dims)

    def __call__(self, x):
        # Predict the mean with one linear layer, and the scale
        # (exponentiated to keep it positive) with another
        return pf.Cauchy(self.m(x), tf.exp(self.s(x)))

model = RobustHeteroscedasticRegression(x.shape[1])
model.fit(x, y)

With a linear regression, the amount of noise is usually assumed to be constant. But a lot of real data is heteroscedastic - that is, the amount of noise changes as a function of the independent variable(s). Also, most real data isn't perfectly normally distributed, and contains some outliers.

Let’s generate some data which is heteroscedastic and also contains outliers.

# Imports
import probflow as pf
import numpy as np
import matplotlib.pyplot as plt
rand = lambda *x: np.random.rand(*x).astype('float32')
randn = lambda *x: np.random.randn(*x).astype('float32')

# Settings
N = 512  # number of datapoints
D = 1    # number of features

# Heteroscedastic data
x = 2*rand(N, D)-1
y = x*2. - 1. + np.exp(x)*randn(N, 1)

# Add some outliers
y[x<-0.95] += 10

# Plot it
plt.plot(x, y, '.')
[Figure: scatter plot of the generated data, with noise that grows with x and a cluster of outliers near x = -1]

The Problem with a Regular Linear Regression

Let’s try fitting a regular linear regression to this data.

model = pf.LinearRegression(D)
model.fit(x, y, lr=0.02, epochs=500)

The first problem is that the regular linear regression systematically overestimates the target value for data points which have feature values similar to the outliers (around \(x \approx -1\)).

# Make predictions
x_test = np.linspace(-1, 1, 100).astype('float32').reshape((100, 1))
preds = model.predict(x_test)

# Plot em
plt.plot(x, y, '.')
plt.plot(x_test, preds, 'r')
[Figure: data with the linear regression's predictions overlaid]

Another problem is that the regular linear regression assumes the variance is constant over different feature values, and so the model can’t capture the changing variance which is obviously present in the data. It systematically overestimates the variance for datapoints with low \(x\) values, and underestimates the variance for datapoints with high \(x\) values:

# Compute 90% predictive intervals
lb, ub = model.predictive_interval(x_test, ci=0.9)

# Plot em
plt.fill_between(x_test[:, 0], lb[:, 0], ub[:, 0], alpha=0.2)
plt.plot(x, y, '.')
[Figure: data with the linear regression's 90% predictive interval]

Robust Heteroscedastic Regression

To allow the model to capture this change in variance, we'll add weights to the model which use the features to predict the variance. Just like a regular linear regression, we'll use the features to predict the mean:

\[\mu = \mathbf{x}^\top \mathbf{w}_\mu + b_\mu\]

But unlike a regular linear regression, we’ll use additional weights to predict the scale of the distribution (after passing it through an exponential function, because the scale must be positive):

\[\gamma = \exp ( \mathbf{x}^\top \mathbf{w}_\gamma + b_\gamma )\]
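
For example, with \(\mathbf{w}_\gamma = 1\) and \(b_\gamma = 0\) (the values used to generate the data above, since the noise scale there was \(\exp(x)\)), the noise scale is \(\exp(-1) \approx 0.37\) at \(x = -1\) but \(\exp(1) \approx 2.72\) at \(x = 1\), matching the fan shape in the scatter plot.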

Also, to perform a regression which is robust to outliers, we’ll use a Cauchy distribution as the observation distribution, which has heavier tails than a Normal distribution (and so is less affected by outlier values).

\[y \sim \text{Cauchy}(\mu, ~ \gamma)\]
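
To get a sense of just how much heavier the Cauchy's tails are, here's a quick comparison using scipy.stats (an extra dependency, used here only for this sanity check and not part of the ProbFlow model):

# Tail mass beyond 5 scale units from the center
from scipy.stats import norm, cauchy
print(2 * norm.sf(5))    # ~5.7e-07 for a standard Normal
print(2 * cauchy.sf(5))  # ~0.13 for a standard Cauchy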

Let’s create this model with ProbFlow:

import tensorflow as tf

class RobustHeteroscedasticRegression(pf.ContinuousModel):

    def __init__(self, dims):
        self.wm = pf.Parameter([dims, 1], name='Mean weights')
        self.bm = pf.Parameter([1, 1], name='Mean bias')
        self.ws = pf.Parameter([dims, 1], name='Scale weights')
        self.bs = pf.Parameter([1, 1], name='Scale bias')

    def __call__(self, x):
        means = x @ self.wm() + self.bm()           # mu = x^T w_mu + b_mu
        scales = tf.exp(x @ self.ws() + self.bs())  # gamma = exp(x^T w_gamma + b_gamma)
        return pf.Cauchy(means, scales)

And fit it to the data:

model = RobustHeteroscedasticRegression(D)
model.fit(x, y, lr=0.02, epochs=500)

Taking a look at the parameter posteriors, we can see that the model has recovered the parameter values we used to generate the data (a mean weight of 2, a mean bias of -1, a scale weight of 1, and a scale bias of 0):

model.posterior_plot(ci=0.9, cols=2)
[Figure: posterior distributions for each model parameter]
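
As a quick numeric check, the posterior means should land near those generating values. Something like the following (using ProbFlow's posterior_mean method, which returns a dict mapping parameter names to posterior means) can verify this:

# Posterior means for each parameter; expect roughly
# Mean weights ~ 2, Mean bias ~ -1, Scale weights ~ 1, Scale bias ~ 0
print(model.posterior_mean())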

More importantly, the robust heteroscedastic model neither under- nor overestimates the target values:

# Make predictions
preds = model.predict(x_test)

# Plot em
plt.plot(x, y, '.')
plt.plot(x_test, preds, 'r')
[Figure: data with the robust model's predictions overlaid]

And it is able to capture the changing variance as a function of the feature(s):

# Compute 90% predictive intervals
lb, ub = model.predictive_interval(x_test, ci=0.9)

# Plot em
plt.fill_between(x_test[:, 0], lb[:, 0], ub[:, 0], alpha=0.2)
plt.plot(x, y, '.')
[Figure: data with the robust model's 90% predictive interval]

Using a Student’s t-distribution

Note that with a robust regression, you could also use a Student’s t-distribution and estimate the degrees of freedom as well (to model how heavy the tails of the distribution are):

class RobustHeteroscedasticRegression(pf.ContinuousModel):

    def __init__(self, dims):
        self.wm = pf.Parameter([dims, 1], name='Mean weights')
        self.bm = pf.Parameter([1, 1], name='Mean bias')
        self.ws = pf.Parameter([dims, 1], name='Scale weights')
        self.bs = pf.Parameter([1, 1], name='Scale bias')
        self.df = pf.ScaleParameter(name='Degrees of freedom')

    def __call__(self, x):
        means = x @ self.wm() + self.bm()
        scales = tf.exp(x @ self.ws() + self.bs())
        return pf.StudentT(self.df(), means, scales)
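
Fitting and inspecting this version works just like before, for example:

model = RobustHeteroscedasticRegression(D)
model.fit(x, y, lr=0.02, epochs=500)
model.posterior_plot(ci=0.9, cols=2)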