Applications

The applications module contains pre-built Models.


class probflow.applications.LinearRegression(d: int, d_o: int = 1, heteroscedastic: bool = False)[source]

Bases: probflow.models.continuous_model.ContinuousModel

A multiple linear regression

TODO: explain, math, diagram, examples, etc

Parameters
  • d (int) – Dimensionality of the independent variable (number of features)

  • d_o (int) – Dimensionality of the dependent variable (number of target dimensions)

  • heteroscedastic (bool) – Whether to model a change in noise as a function of \(\mathbf{x}\) (if heteroscedastic=True), or not (if heteroscedastic=False, the default).

weights

Regression weights

Type

Parameter

bias

Regression intercept

Type

Parameter

std

Standard deviation of the Normal observation distribution

Type

ScaleParameter
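
Example

A minimal usage sketch, assuming probflow is imported as pf with a TensorFlow or PyTorch backend installed, and using hypothetical float32 numpy data with 3 features and 1 target dimension:

import numpy as np
import probflow as pf

# Hypothetical training data: 100 datapoints, 3 features, 1 target dimension
x_train = np.random.randn(100, 3).astype('float32')
y_train = np.random.randn(100, 1).astype('float32')

model = pf.LinearRegression(d=3)
model.fit(x_train, y_train, epochs=10)

# Point predictions and central 95% predictive intervals
preds = model.predict(x_train)
lb, ub = model.predictive_interval(x_train, ci=0.95)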

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

aleatoric_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only aleatoric uncertainty (uncertainty due to noise).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the aleatoric predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

aleatoric_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

calibration_curve(x, y, n=1000, resolution=100, batch_size=None)

Compute the regression calibration curve (Kuleshov et al., 2018).

The regression calibration curve compares the empirical cumulative probability to the cumulative probability predicted by a regression model (Kuleshov et al., 2018). First, a vector \(p\) of \(m\) confidence levels is chosen, corresponding to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is the count of elements of \(a\) which are less than or equal to the corresponding elements of \(b\).

The calibration curve then plots \(p\) against \(\hat{p}\).

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • p (ndarray) – The predicted cumulative frequencies, \(p\).

  • p_hat (ndarray) – The empirical cumulative frequencies, \(\hat{p}\).

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = ...  # some ProbFlow model
model.fit(x_train, y_train)

Then we can compute the calibration curve with calibration_curve():

p_pred, p_empirical = model.calibration_curve(x_val, y_val)

The returned values can be used directly or plotted against one another to get the calibration curve (as in Figure 3 in Kuleshov et al., 2018)

import matplotlib.pyplot as plt
plt.plot(p_pred, p_empirical)

Or, even more simply, just use calibration_curve_plot().


calibration_curve_plot(x, y, n=1000, resolution=100, batch_size=None, **kwargs)

Plot the regression calibration curve.

See calibration_curve() for more info about the regression calibration curve.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()


calibration_metric(metric, x, y=None, n=1000, resolution=100, batch_size=None)

Compute one or more of several calibration metrics

Regression calibration metrics measure the error between a model’s regression calibration curve and the ideal calibration curve, i.e., the curve that would result if the model were perfectly calibrated (see Kuleshov et al., 2018 and Chung et al., 2020). First, a vector \(p\) of \(m\) confidence levels is chosen, corresponding to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is the count of elements of \(a\) which are less than or equal to the corresponding elements of \(b\).

Various metrics can be computed from these curves to measure how accurately the regression model captures uncertainty:

The mean squared calibration error (MSCE) is the mean squared error between the empirical and predicted frequencies,

\[MSCE = \frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2\]

The root mean squared calibration error (RMSCE) is just the square root of the MSCE:

\[RMSCE = \sqrt{\frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2}\]

The mean absolute calibration error (MACE) is the mean of the absolute differences between the empirical and predicted frequencies:

\[MACE = \frac{1}{m} \sum_{j=1}^m | p_j - \hat{p}_j |\]

And the miscalibration area (MA) is the area between the calibration curve and the ideal calibration curve (the identity line from (0, 0) to (1, 1)):

\[MA = \int_0^1 | p_x - \hat{p}_x | \, dx\]

Note that MA is equal to MACE as the number of bins (set by the resolution keyword argument) goes to infinity.

To choose which metric to compute, pass the name of the metric (msce, rmsce, mace, or ma) as the first argument to this function (or a list of them to compute multiple).

See Kuleshov et al., 2018, Chung et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using calibration metrics, among other metrics. Note that calibration is generally less important than accuracy, but more important than other metrics like sharpness() and any dispersion_metric().
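
As a rough illustration, these metrics can be computed directly from a calibration curve with numpy. This is a sketch only; p and p_hat below are hypothetical stand-ins for the arrays returned by calibration_curve():

import numpy as np

# Hypothetical calibration curve values
# (in practice, use p, p_hat = model.calibration_curve(x_val, y_val))
p = np.linspace(0, 1, 100)
p_hat = np.clip(p + 0.05 * np.sin(2 * np.pi * p), 0, 1)

msce = np.mean((p - p_hat) ** 2)       # mean squared calibration error
rmsce = np.sqrt(msce)                  # root mean squared calibration error
mace = np.mean(np.abs(p - p_hat))      # mean absolute calibration error
ma = np.trapz(np.abs(p - p_hat), p)    # miscalibration area (approximated numerically)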

Parameters
  • metric (str {'msce', 'rmsce', 'mace', or 'ma'} or List[str]) –

    Which metric(s) to compute (see above for the definition of each metric). To compute multiple metrics, pass a list of the metric names you’d like to compute. Available metrics are:

    • msce: mean squared calibration error

    • rmsce: root mean squared calibration error

    • mace: mean absolute calibration error

    • ma: miscalibration area

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested calibration metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = ...  # some ProbFlow model
model.fit(x_train, y_train)

Then we can compute different calibration metrics using calibration_metric(). For example, to compute the mean squared calibration error (MSCE):

>>> model.calibration_metric("msce", x_val, y_val)
0.123

Or, to compute the mean absolute calibration error (MACE):

>>> model.calibration_metric("mace", x_val, y_val)
0.211

To compute multiple metrics at the same time, pass a list of metric names:

>>> model.calibration_metric(["msce", "mace"], x_val, y_val)
{"msce": 0.123, "mace": 0.211}


coverage_by(x_by, x, y=None, n: int = 1000, ci: float = 0.95, bins: int = 30, plot: bool = True, ideal_line_kwargs: dict = {}, batch_size=None, **kwargs)

Compute and plot the coverage of a given confidence interval of the posterior predictive distribution as a function of specified independent variables.

TODO: Docs…

Parameters
  • x_by (int or str or list of int or list of str) – Which independent variable(s) to plot the coverage as a function of. That is, which columns in x to plot by.

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the posterior predictive distribution. Default = 1000

  • ci (float between 0 and 1) – Inner percentile to find the coverage of. For example, if ci=0.95, will compute the coverage of the inner 95% of the posterior predictive distribution.

  • bins (int) – Number of bins to use for x_by

  • plot (bool) – Whether to plot the coverage. Default = True

  • ideal_line_kwargs (dict) – Dict of args to pass to matplotlib.pyplot.plot for ideal coverage line.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_by

Returns

  • xo (ndarray) – Values of x_by corresponding to bin centers.

  • co (ndarray) – Coverage of the ci confidence interval of the predictive distribution in each bin.

dispersion_metric(metric, x, n=1000, batch_size=None)

Compute one or more of several dispersion metrics

Dispersion metrics measure how much a model’s uncertainty estimates vary. There are several different dispersion metrics:

The coefficient of variation (\(C_v\)) is the ratio of the standard deviation to the mean (of the model’s uncertainty standard deviations):

\[C_v = \frac{\sigma_\sigma}{\mu_\sigma}\]

where \(\mu_\sigma\) is the mean and \(\sigma_\sigma\) is the standard deviation of the model’s predicted standard deviations.

The quartile coefficient of dispersion (\(QCD\)) is less sensitive to outliers, as it measures the ratio of the difference between the first and third quartiles (of the model’s uncertainty standard deviations) to their sum:

\[QCD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using dispersion metrics, among other metrics. Note that dispersion metrics should generally be one of the last things you consider - accuracy, calibration, and sharpness usually being more important.

Parameters
  • metric (str {'cv' or 'qcd'} or List[str]) – Which dispersion metric(s) to compute: 'cv' for the coefficient of variation, or 'qcd' for the quartile coefficient of dispersion. To compute multiple metrics, pass a list of metric names.

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested dispersion metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = ...  # some ProbFlow model
model.fit(x_train, y_train)

Then we can compute the coefficient of variation of our model’s predictions with:

>>> model.dispersion_metric('cv', x_val)
0.732

Or the quartile coefficient of dispersion with:

>>> model.dispersion_metric('qcd', x_val)
0.625


dumps()

Serialize module object to bytes

elbo_loss(x_data, y_data, n: int, n_mc: int)

Compute the negative ELBO, scaled to a single sample.

Parameters
  • x_data – The independent variable values (or None if this is a generative model)

  • y_data – The dependent variable values

  • n (int) – Total number of datapoints in the dataset

  • n_mc (int) – Number of MC samples we’re taking from the posteriors

epistemic_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the epistemic predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

epistemic_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)

Fit the model to data

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Default = 128

  • epochs (int) – Number of epochs to train the model. Default = 200

  • shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False

  • optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO

  • optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.

  • lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).

  • flipout (bool) – Whether to use flipout during training where possible. Default = True

  • num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None

  • callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.

  • eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False

  • n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using a smaller number of MC samples is faster, but using a greater number of MC samples will decrease the variance of the gradients, leading to more stable parameter optimization.

Example

See the user guide section on Fitting a Model.
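
For example, a sketch assuming probflow is imported as pf and that x_train and y_train are hypothetical float32 numpy arrays of compatible shape:

model = pf.LinearRegression(d=x_train.shape[1])
model.fit(x_train, y_train, batch_size=64, epochs=50, lr=0.01)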

get_elbo()

Get the current ELBO on training data

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

log_likelihood(x_data, y_data)

Compute the sum log likelihood of the model given a batch of data

log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)

Compute the log probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

log_probs – Log probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray
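
Example

A usage sketch, assuming a fitted model and hypothetical validation arrays x_val and y_val:

# Per-datapoint log probabilities using the MAP parameter estimates
lps = model.log_prob(x_val, y_val)

# Sum of the log probabilities over the whole validation set
total_lp = model.log_prob(x_val, y_val, individually=False)

# Distribution of log probabilities over 100 posterior samples
lp_dist = model.log_prob(x_val, y_val, distribution=True, n=100)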

metric(metric, x, y=None, batch_size=None)

Compute a metric of model performance

TODO: docs

TODO: note that this doesn’t work w/ generative models

Parameters
  • metric (str or callable) –

    Metric to evaluate. Available metrics:

    • 'lp': log likelihood sum

    • 'log_prob': log likelihood sum

    • 'accuracy': accuracy

    • 'acc': accuracy

    • 'mean_squared_error': mean squared error

    • 'mse': mean squared error

    • 'sum_squared_error': sum squared error

    • 'sse': sum squared error

    • 'mean_absolute_error': mean absolute error

    • 'mae': mean absolute error

    • 'r_squared': coefficient of determination

    • 'r2': coefficient of determination

    • 'recall': true positive rate

    • 'sensitivity': true positive rate

    • 'true_positive_rate': true positive rate

    • 'tpr': true positive rate

    • 'specificity': true negative rate

    • 'selectivity': true negative rate

    • 'true_negative_rate': true negative rate

    • 'tnr': true negative rate

    • 'precision': precision

    • 'f1_score': F-measure

    • 'f1': F-measure

    • callable: a function which takes (y_true, y_pred)

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO
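
Example

A usage sketch, assuming a fitted regression model, hypothetical validation arrays x_val and y_val, and numpy imported as np:

# Mean squared error of the model's predictions
mse = model.metric('mse', x_val, y_val)

# Or pass any callable which takes (y_true, y_pred)
mae = model.metric(lambda y_true, y_pred: np.mean(np.abs(y_true - y_pred)), x_val, y_val)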

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

posterior_ci(params=None, ci=0.95, n=10000)

Posterior confidence intervals

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.

  • ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95

  • n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals Default = 10,000

Returns

Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str

Return type

dict
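
Example

A usage sketch, assuming a fitted model:

# 95% credible intervals for all parameters in the model
cis = model.posterior_ci(ci=0.95)
for name, (lb, ub) in cis.items():
    print(name, lb, ub)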

posterior_mean(params=None)

Get the mean of the posterior distribution(s)

TODO: Docs… params is a list of strings of params to plot

Parameters

params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.

Returns

Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.

Return type

dict

posterior_plot(params=None, cols=1, **kwargs)

Plot posterior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()

posterior_sample(params=None, n=10000)

Draw samples from parameter posteriors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get a sample for all parameters in the model.

  • n (int) – Number of samples to take from each posterior distribution. Default = 10000

Returns

Samples from the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior samples. The ndarrays are of size (n, param.shape). Or just the ndarray if params was a str.

Return type

dict

pred_dist_coverage(x, y=None, n=1000, ci=0.95, batch_size=None)

Compute what percent of samples are covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prc_covered – Proportion of the samples which were covered by the predictive distribution’s confidence interval.

Return type

float between 0 and 1

pred_dist_covered(x, y=None, n: int = 1000, ci: float = 0.95, batch_size=None)

Compute whether each observation was covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO

pred_dist_plot(x, n=10000, cols=1, individually=False, batch_size=None, **kwargs)

Plot posterior predictive distribution from the model given x.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model given x. Default = 10000

  • cols (int) – Divide the subplots into a grid with this many columns (if individually=True).

  • individually (bool) – If True, plot one subplot per datapoint in x, otherwise plot all the predictive distributions on the same plot.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO
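
A usage sketch, assuming a fitted model, hypothetical validation features x_val, and matplotlib installed:

import matplotlib.pyplot as plt

# Plot the posterior predictive distributions for the first four datapoints
model.pred_dist_plot(x_val[:4], individually=True, cols=2)
plt.show()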

predict(x=None, method='mean', batch_size=None)

Predict dependent variable using the model

TODO… using maximum a posteriori param estimates etc

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])

Return type

ndarray

Examples

TODO: Docs…
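
A usage sketch, assuming a fitted model and hypothetical validation features x_val:

# Mean of the predictive distribution for each datapoint
y_pred = model.predict(x_val)

# Or use the mode of the predictive distribution instead
y_mode = model.predict(x_val, method='mode')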

predictive_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including all sources of uncertainty.

TODO: docs

TODO: using side= both, upper, vs lower

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the posterior predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.
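
Example

A usage sketch, assuming a fitted model and hypothetical validation features x_val:

# Central 95% predictive interval for each datapoint
lb, ub = model.predictive_interval(x_val, ci=0.95)

# Upper bound of a one-sided 90% interval
ub90 = model.predictive_interval(x_val, ci=0.9, side='upper')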

predictive_prc(x, y=None, n=1000, batch_size=None)

Compute the percentile of each observation along the posterior predictive distribution.

TODO: Docs… Returns a percentile between 0 and 1

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prcs

Return type

ndarray of float between 0 and 1

predictive_sample(x=None, n=1000, batch_size=None)

Draw samples from the posterior predictive distribution given x

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predictive distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray
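
Example

A usage sketch, assuming a fitted model and hypothetical validation features x_val:

# Draw 500 samples from the posterior predictive distribution per datapoint
samples = model.predictive_sample(x_val, n=500)
print(samples.shape)  # (500, x_val.shape[0], ...)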

prior_plot(params=None, cols=1, **kwargs)

Plot prior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.prior_plot()

prior_sample(params=None, n=10000)

Draw samples from parameter priors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (list) – List of parameter names to sample. Each element should be a str. Default is to sample priors of all parameters in the model.

  • n (int) – Number of samples to take from each prior distribution. Default = 10000

Returns

Samples from the parameter prior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the prior samples. The ndarrays are of size (n, param.shape).

Return type

dict

prob(x, y=None, **kwargs)

Compute the probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

probs – Probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

r_squared(x, y=None, n=1000, batch_size=None)

Compute the Bayesian R-squared distribution (Gelman et al., 2018).

TODO: more info

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the r-squared distribution. Size: (num_samples,).

Return type

ndarray

Examples

TODO: Docs…
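
A usage sketch, assuming a fitted model and hypothetical validation arrays x_val and y_val:

# Samples from the Bayesian R-squared distribution
r2 = model.r_squared(x_val, y_val, n=1000)
print(r2.mean())  # posterior mean of R-squared

# Or plot the distribution directly
model.r_squared_plot(x_val, y_val)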


r_squared_plot(x, y=None, n=1000, style='hist', batch_size=None, **kwargs)

Plot the Bayesian R-squared distribution.

See r_squared() for more info on the Bayesian R-squared metric.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

reset_kl_loss()

Reset additional loss due to KL divergences

residuals(x, y=None, batch_size=None)

Compute the residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The residuals.

Return type

ndarray

Example

TODO
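
A usage sketch, assuming a fitted model and hypothetical validation arrays x_val and y_val:

# Residuals of the model's predictions on the validation set
res = model.residuals(x_val, y_val)

# Or plot their distribution directly
model.residuals_plot(x_val, y_val)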

residuals_plot(x, y=None, batch_size=None, **kwargs)

Plot the distribution of residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

set_kl_weight(w)

Set the weight of the KL term’s contribution to the ELBO loss

set_learning_rate(lr)

Set the learning rate used by this model’s optimizer

sharpness(x, n=1000, batch_size=None)

Compute the sharpness of the model’s uncertainty estimates

The “sharpness” of a model’s uncertainty estimates is the square root of the mean of the estimated variances:

\[SHA = \sqrt{\frac{1}{N} \sum_{i=1}^N \text{Var}(\hat{Y}_i)}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using sharpness, among other metrics. Note that the sharpness should generally be one of the later things you consider - accuracy and calibration usually being more important.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The sharpness of the model’s uncertainty estimates

Return type

float

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = ...  # some ProbFlow model
model.fit(x_train, y_train)

Then we can compute the sharpness of our model’s predictions with:

>>> model.sharpness(x_val)
0.173


stop_training()

Stop the training of the model

summary()

Show a summary of the model and its parameters.

TODO

TODO: though maybe this should be a method of module… model would have to add to it the observation dist

train_step(x_data, y_data)

Perform one training step

property trainable_variables

A list of trainable backend variables within this Module

class probflow.applications.LogisticRegression(d: int, k: int = 2)[source]

Bases: probflow.models.categorical_model.CategoricalModel

A logistic regression

TODO: explain, math, diagram, examples, etc

TODO: set k>2 for a Multinomial logistic regression

Parameters
  • d (int) – Dimensionality of the independent variable (number of features)

  • k (int) – Number of classes of the dependent variable

weights

Regression weights

Type

Parameter

bias

Regression intercept

Type

Parameter
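
Example

A minimal usage sketch, assuming probflow is imported as pf with a backend installed; the data shapes and dtypes here are hypothetical (5 features, 3 classes):

import numpy as np
import probflow as pf

# Hypothetical classification data: 100 datapoints, 5 features, 3 classes
x_train = np.random.randn(100, 5).astype('float32')
y_train = np.random.randint(0, 3, (100, 1)).astype('float32')

model = pf.LogisticRegression(d=5, k=3)
model.fit(x_train, y_train, epochs=10)

# Predicted class for each datapoint
y_pred = model.predict(x_train)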

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

aleatoric_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

calibration_curve(x, y=None, split_by=None, bins=10, plot=True, batch_size=None)

Plot and return the categorical calibration curve.

Plots and returns the calibration curve (estimated probability of outcome vs the true probability of that outcome).

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • split_by (int) – Draw the calibration curve independently for datapoints with each unique value in x[:,split_by] (a categorical column).

  • bins (int, list of float, or ndarray) – Bins used to compute the curve. If an integer, will use that many evenly-spaced bins from 0 to 1. If a vector, bins is the vector of bin edges.

  • plot (bool) – Whether to plot the curve

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

TODO: Docs… (also: split by continuous cols as well? Then will need to define bins or edges too)
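
Example

A usage sketch, assuming a fitted classifier and hypothetical validation arrays x_val and y_val:

# Plot the calibration curve for the validation set, using 10 bins
model.calibration_curve(x_val, y_val, bins=10)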

dumps()

Serialize module object to bytes

elbo_loss(x_data, y_data, n: int, n_mc: int)

Compute the negative ELBO, scaled to a single sample.

Parameters
  • x_data – The independent variable values (or None if this is a generative model)

  • y_data – The dependent variable values

  • n (int) – Total number of datapoints in the dataset

  • n_mc (int) – Number of MC samples we’re taking from the posteriors

epistemic_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)

Fit the model to data

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Default = 128

  • epochs (int) – Number of epochs to train the model. Default = 200

  • shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False

  • optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO

  • optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.

  • lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).

  • flipout (bool) – Whether to use flipout during training where possible. Default = True

  • num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None

  • callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.

  • eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False

  • n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using a smaller number of MC samples is faster, but using a greater number of MC samples will decrease the variance of the gradients, leading to more stable parameter optimization.

Example

See the user guide section on Fitting a Model.

get_elbo()

Get the current ELBO on training data

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

log_likelihood(x_data, y_data)

Compute the sum log likelihood of the model given a batch of data

log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)

Compute the log probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

log_probs – Log probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

metric(metric, x, y=None, batch_size=None)

Compute a metric of model performance

TODO: docs

TODO: note that this doesn’t work w/ generative models

Parameters
  • metric (str or callable) –

    Metric to evaluate. Available metrics:

    • 'lp': log likelihood sum

    • 'log_prob': log likelihood sum

    • 'accuracy': accuracy

    • 'acc': accuracy

    • 'mean_squared_error': mean squared error

    • 'mse': mean squared error

    • 'sum_squared_error': sum squared error

    • 'sse': sum squared error

    • 'mean_absolute_error': mean absolute error

    • 'mae': mean absolute error

    • 'r_squared': coefficient of determination

    • 'r2': coefficient of determination

    • 'recall': true positive rate

    • 'sensitivity': true positive rate

    • 'true_positive_rate': true positive rate

    • 'tpr': true positive rate

    • 'specificity': true negative rate

    • 'selectivity': true negative rate

    • 'true_negative_rate': true negative rate

    • 'tnr': true negative rate

    • 'precision': precision

    • 'f1_score': F-measure

    • 'f1': F-measure

    • callable: a function which takes (y_true, y_pred)

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

posterior_ci(params=None, ci=0.95, n=10000)

Posterior confidence intervals

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.

  • ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95

  • n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals Default = 10,000

Returns

Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str

Return type

dict

posterior_mean(params=None)

Get the mean of the posterior distribution(s)

TODO: Docs… params is a list of strings of params to plot

Parameters

params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.

Returns

Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.

Return type

dict

posterior_plot(params=None, cols=1, **kwargs)

Plot posterior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()

posterior_sample(params=None, n=10000)

Draw samples from parameter posteriors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get a sample for all parameters in the model.

  • n (int) – Number of samples to take from each posterior distribution. Default = 10000

Returns

Samples from the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior samples. The ndarrays are of size (n, param.shape). Or just the ndarray if params was a str.

Return type

dict

pred_dist_plot(x, n=10000, cols=1, batch_size=None, **kwargs)

Plot posterior predictive distribution from the model given x.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model given x. Default = 10000

  • cols (int) – Divide the subplots into a grid with this many columns (if individually=True).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_categorical_dist()

predict(x=None, method='mean', batch_size=None)

Predict dependent variable using the model

TODO… using maximum a posteriori param estimates etc

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])

Return type

ndarray

Examples

TODO: Docs…

predictive_sample(x=None, n=1000, batch_size=None)

Draw samples from the posterior predictive distribution given x

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predictive distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

prior_plot(params=None, cols=1, **kwargs)

Plot prior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.prior_plot()

prior_sample(params=None, n=10000)

Draw samples from parameter priors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (list) – List of parameter names to sample. Each element should be a str. Default is to sample priors of all parameters in the model.

  • n (int) – Number of samples to take from each prior distribution. Default = 10000

Returns

Samples from the parameter prior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the prior samples. The ndarrays are of size (n, param.shape).

Return type

dict

prob(x, y=None, **kwargs)

Compute the probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

probs – Probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

set_kl_weight(w)

Set the weight of the KL term’s contribution to the ELBO loss

set_learning_rate(lr)

Set the learning rate used by this model’s optimizer

stop_training()

Stop the training of the model

summary()

Show a summary of the model and its parameters.

TODO

TODO: though maybe this should be a method of module… model would have to add to it the observation dist

train_step(x_data, y_data)

Perform one training step

property trainable_variables

A list of trainable backend variables within this Module

class probflow.applications.PoissonRegression(d: int)[source]

Bases: probflow.models.discrete_model.DiscreteModel

A Poisson regression (a type of generalized linear model)

TODO: explain, math, diagram, examples, etc

Parameters

d (int) – Dimensionality of the independent variable (number of features)

weights

Regression weights

Type

Parameter

bias

Regression intercept

Type

Parameter
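
For instance, a Poisson regression on five features could be fit to count data roughly as follows (a minimal sketch; the data is randomly generated purely for illustration):

import numpy as np
from probflow.applications import PoissonRegression

x = np.random.randn(100, 5).astype('float32')                      # 100 datapoints, 5 features
y = np.random.poisson(np.exp(x @ np.ones(5))).astype('float32')    # illustrative count targets

model = PoissonRegression(d=5)
model.fit(x, y, epochs=100)
y_pred = model.predict(x)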

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

aleatoric_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only aleatoric uncertainty (uncertainty due to noise).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the aleatoric predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

aleatoric_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

calibration_curve(x, y, n=1000, resolution=100, batch_size=None)

Compute the regression calibration curve (Kuleshov et al., 2018).

The regression calibration curve compares the empirical cumulative probability to the cumulative probability predicted by a regression model (Kuleshov et al., 2018). First, a vector \(p\) of \(m\) confidence levels is chosen; these correspond to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than or equal to the corresponding elements in \(b\).

The calibration curve then plots \(p\) against \(\hat{p}\).

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • p (|ndarray|) – The predicted cumulative frequencies, \(p\).

  • p_hat (|ndarray|) – The empirical cumulative frequencies, \(\hat{p}\).

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute the calibration curve with calibration_curve():

p_pred, p_empirical = model.calibration_curve(x_val, y_val)

The returned values can be used directly or plotted against one another to get the calibration curve (as in Figure 3 in Kuleshov et al., 2018)

import matplotlib.pyplot as plt
plt.plot(p_pred, p_empirical)

Or, even more simply, just use calibration_curve_plot().

See also

References

calibration_curve_plot(x, y, n=1000, resolution=100, batch_size=None, **kwargs)

Plot the regression calibration curve.

See calibration_curve() for more info about the regression calibration curve.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

See also

calibration_metric(metric, x, y=None, n=1000, resolution=100, batch_size=None)

Compute one or more of several calibration metrics

Regression calibration metrics measure the error between a model’s regression calibration curve and the ideal calibration curve - i.e., what the curve would be if the model were perfectly calibrated (see Kuleshov et al., 2018 and Chung et al., 2020). First, a vector \(p\) of \(m\) confidence levels is chosen; these correspond to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than or equal to the corresponding elements in \(b\).

Various metrics can be computed from these curves to measure how accurately the regression model captures uncertainty:

The mean squared calibration error (MSCE) is the mean squared error between the empirical and predicted frequencies,

\[MSCE = \frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2\]

The root mean squared calibration error (RMSCE) is just the square root of the MSCE:

\[RMSCE = \sqrt{\frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2}\]

The mean absolute calibration error (MACE) is the mean of the absolute differences between the empirical and predicted frequencies:

\[MACE = \frac{1}{m} \sum_{j=1}^m | p_j - \hat{p}_j |\]

And the miscalibration area (MA) is the area between the calibration curve and the ideal calibration curve (the identity line from (0, 0) to (1, 1)):

\[MA = \int_0^1 | p_x - \hat{p}_x | \, dx\]

Note that MA is equal to MACE as the number of bins (set by the resolution keyword argument) goes to infinity.

To choose which metric to compute, pass the name of the metric (msce, rmsce, mace, or ma) as the first argument to this function (or a list of them to compute multiple).
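
To make the definitions above concrete, here is a minimal NumPy sketch that computes these metrics directly from the output of calibration_curve() (model, x_val, and y_val are assumed to already exist):

import numpy as np

p, p_hat = model.calibration_curve(x_val, y_val)

msce = np.mean((p - p_hat) ** 2)       # mean squared calibration error
rmsce = np.sqrt(msce)                  # root mean squared calibration error
mace = np.mean(np.abs(p - p_hat))      # mean absolute calibration error
ma = np.trapz(np.abs(p - p_hat), p)    # miscalibration area (trapezoidal approximation)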

See Kuleshov et al., 2018, Chung et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using calibration metrics, among other metrics. Note that calibration is generally less important than accuracy, but more important than other metrics like sharpness() and any dispersion_metric().

Parameters
  • metric (str {'msce', 'rmsce', 'mace', or 'ma'} or List[str]) –

    Which metric(s) to compute (see above for the definition of each metric). To compute multiple metrics, pass a list of the metric names you’d like to compute. Available metrics are:

    • msce: mean squared calibration error

    • rmsce: root mean squared calibration error

    • mace: mean absolute calibration error

    • ma: miscalibration area

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested calibration metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute different calibration metrics using calibration_metric(). For example, to compute the mean squared calibration error (MSCE):

>>> model.calibration_metric("msce", x_val, y_val)
0.123

Or, to compute the mean absolute calibration error (MACE):

>>> model.calibration_metric("mace", x_val, y_val)
0.211

To compute multiple metrics at the same time, pass a list of metric names:

>>> model.calibration_metric(["msce", "mace"], x_val, y_val)
{"msce": 0.123, "mace": 0.211}

References

coverage_by(x_by, x, y=None, n: int = 1000, ci: float = 0.95, bins: int = 30, plot: bool = True, ideal_line_kwargs: dict = {}, batch_size=None, **kwargs)

Compute and plot the coverage of a given confidence interval of the posterior predictive distribution as a function of specified independent variables.

TODO: Docs…

Parameters
  • x_by (int or str or list of int or list of str) – Which independent variable(s) to compute and plot the coverage as a function of. That is, which columns in x to plot by.

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • ci (float between 0 and 1) – Inner percentile to find the coverage of. For example, if ci=0.95, will compute the coverage of the inner 95% of the posterior predictive distribution.

  • bins (int) – Number of bins to use for x_by

  • ideal_line_kwargs (dict) – Dict of args to pass to matplotlib.pyplot.plot for ideal coverage line.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_by

Returns

  • xo (|ndarray|) – Values of x_by corresponding to bin centers.

  • co (|ndarray|) – Coverage of the ci confidence interval of the predictive distribution in each bin.
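
For example, to see how well the 95% interval covers the data as a function of the first feature (a sketch; model, x_val, and y_val are assumed):

xo, co = model.coverage_by(0, x_val, y_val, ci=0.95, bins=20)
# xo contains the bin centers of feature 0, and co the empirical coverage of the 95% interval in each bin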

dispersion_metric(metric, x, n=1000, batch_size=None)

Compute one or more of several dispersion metrics

Dispersion metrics measure how much a model’s uncertainty estimates vary. There are several different dispersion metrics:

The coefficient of variation (\(C_v\)) is the ratio of the standard deviation to the mean (of the model’s uncertainty standard deviations):

\[C_v = \frac{\sigma}{\mu}\]

The quartile coefficient of dispersion (\(QCD\)) is less sensitive to outliers, as it is simply the ratio of the difference between the third and first quartiles (of the model’s uncertainty standard deviations) to their sum:

\[QCD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using dispersion metrics, among other metrics. Note that dispersion metrics should generally be one of the last things you consider - accuracy, calibration, and sharpness usually being more important.

Parameters
  • metric (str {'cv' or 'qcd'} or List[str]) – Dispersion metric to compute, or a list of metric names to compute multiple metrics.

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested dispersion metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute the coefficient of variation of our model’s predictions with:

>>> model.dispersion_metric('cv', x_val)
0.732

Or the quartile coefficient of dispersion with:

>>> model.dispersion_metric('qcd', x_val)
0.625

References

dumps()

Serialize module object to bytes

elbo_loss(x_data, y_data, n: int, n_mc: int)

Compute the negative ELBO, scaled to a single sample.

Parameters
  • x_data – The independent variable values (or None if this is a generative model)

  • y_data – The dependent variable values

  • n (int) – Total number of datapoints in the dataset

  • n_mc (int) – Number of MC samples we’re taking from the posteriors

epistemic_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only epistemic uncertainty (uncertainty stemming from imperfect knowledge of the model’s parameter values).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the epistemic predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

epistemic_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty stemming from imperfect knowledge of the model’s parameter values)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray
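
To contrast the different sources of uncertainty, the three sampling methods can be used side by side (a sketch; model and x_val are assumed):

pred = model.predictive_sample(x_val, n=1000)   # aleatoric + epistemic uncertainty
alea = model.aleatoric_sample(x_val, n=1000)    # noise only
epis = model.epistemic_sample(x_val, n=1000)    # parameter uncertainty only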

fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)

Fit the model to data

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Default = 128

  • epochs (int) – Number of epochs to train the model. Default = 200

  • shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False

  • optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO

  • optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.

  • lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).

  • flipout (bool) – Whether to use flipout during training where possible. Default = True

  • num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None

  • callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.

  • eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False

  • n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to just take one per batch. Using a smaller number of MC samples is faster, but using a greater number of MC samples will decrease the variance of the gradients, leading to more stable parameter optimization.

Example

See the user guide section on Fitting a Model.

get_elbo()

Get the current ELBO on training data

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

log_likelihood(x_data, y_data)

Compute the sum log likelihood of the model given a batch of data

log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)

Compute the log probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

log_probs – Log probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

metric(metric, x, y=None, batch_size=None)

Compute a metric of model performance

TODO: docs

TODO: note that this doesn’t work w/ generative models

Parameters
  • metric (str or callable) –

    Metric to evaluate. Available metrics:

    • ’lp’: log likelihood sum

    • ’log_prob’: log likelihood sum

    • ’accuracy’: accuracy

    • ’acc’: accuracy

    • ’mean_squared_error’: mean squared error

    • ’mse’: mean squared error

    • ’sum_squared_error’: sum squared error

    • ’sse’: sum squared error

    • ’mean_absolute_error’: mean absolute error

    • ’mae’: mean absolute error

    • ’r_squared’: coefficient of determination

    • ’r2’: coefficient of determination

    • ’recall’: true positive rate

    • ’sensitivity’: true positive rate

    • ’true_positive_rate’: true positive rate

    • ’tpr’: true positive rate

    • ’specificity’: true negative rate

    • ’selectivity’: true negative rate

    • ’true_negative_rate’: true negative rate

    • ’tnr’: true negative rate

    • ’precision’: precision

    • ’f1_score’: F-measure

    • ’f1’: F-measure

    • callable: a function which takes (y_true, y_pred)

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO
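
For example, to compute the mean squared error and mean absolute error on validation data (a sketch; model, x_val, and y_val are assumed):

mse = model.metric('mse', x_val, y_val)
mae = model.metric('mae', x_val, y_val)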

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

posterior_ci(params=None, ci=0.95, n=10000)

Posterior confidence intervals

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.

  • ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95

  • n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals Default = 10,000

Returns

Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str

Return type

dict
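
A short sketch of computing 95% credible intervals for the posteriors (the parameter name 'weights' is illustrative):

cis = model.posterior_ci(ci=0.95)
lb, ub = cis['weights']   # lower/upper bounds for a hypothetical 'weights' parameter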

posterior_mean(params=None)

Get the mean of the posterior distribution(s)

TODO: Docs… params is a list of strings of params to plot

Parameters

params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.

Returns

Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.

Return type

dict

posterior_plot(params=None, cols=1, **kwargs)

Plot posterior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()

posterior_sample(params=None, n=10000)

Draw samples from parameter posteriors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get a sample for all parameters in the model.

  • n (int) – Number of samples to take from each posterior distribution. Default = 10000

Returns

Samples from the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior samples. The ndarrays are of size (n, param.shape). Or just the ndarray if params was a str.

Return type

dict

pred_dist_coverage(x, y=None, n=1000, ci=0.95, batch_size=None)

Compute what percent of samples are covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prc_covered – Proportion of the samples which were covered by the predictive distribution’s confidence interval.

Return type

float between 0 and 1
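
For example (a sketch; model, x_val, and y_val are assumed):

coverage = model.pred_dist_coverage(x_val, y_val, ci=0.95)
# for a well-calibrated model, coverage should be close to 0.95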

pred_dist_covered(x, y=None, n: int = 1000, ci: float = 0.95, batch_size=None)

Compute whether each observation was covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO

pred_dist_plot(x, n=10000, cols=1, **kwargs)

Plot posterior predictive distribution from the model given x.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model given x. Default = 10000

  • cols (int) – Divide the subplots into a grid with this many columns (if individually=True).

  • **kwargs – Additional keyword arguments are passed to plot_discrete_dist()

predict(x=None, method='mean', batch_size=None)

Predict dependent variable using the model

TODO… using maximum a posteriori param estimates etc

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])

Return type

ndarray

Examples

TODO: Docs…

predictive_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including all sources of uncertainty.

TODO: docs

TODO: using side= both, upper, vs lower

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the posterior predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.
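
For example, to get two-sided 90% predictive intervals, or just the one-sided upper bound (a sketch; model and x_val are assumed):

lb, ub = model.predictive_interval(x_val, ci=0.9)
ub_only = model.predictive_interval(x_val, ci=0.9, side='upper')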

predictive_prc(x, y=None, n=1000, batch_size=None)

Compute the percentile of each observation along the posterior predictive distribution.

TODO: Docs… Returns a percentile between 0 and 1

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prcs

Return type

ndarray of float between 0 and 1

predictive_sample(x=None, n=1000, batch_size=None)

Draw samples from the posterior predictive distribution given x

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predictive distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

prior_plot(params=None, cols=1, **kwargs)

Plot prior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.prior_plot()

prior_sample(params=None, n=10000)

Draw samples from parameter priors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (list) – List of parameter names to sample. Each element should be a str. Default is to sample priors of all parameters in the model.

  • n (int) – Number of samples to take from each prior distribution. Default = 10000

Returns

Samples from the parameter prior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the prior samples. The ndarrays are of size (n,param.shape).

Return type

dict

prob(x, y=None, **kwargs)

Compute the probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

probs – Probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

r_squared(*args, **kwargs)

Cannot compute R squared for a discrete model

r_squared_plot(*args, **kwargs)

Cannot compute R squared for a discrete model

reset_kl_loss()

Reset additional loss due to KL divergences

residuals(x, y=None, batch_size=None)

Compute the residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The residuals.

Return type

ndarray

Example

TODO
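
A minimal sketch (model, x_val, and y_val are assumed; the residuals are presumably the differences between the observed targets and the model’s predictions):

import matplotlib.pyplot as plt

res = model.residuals(x_val, y_val)
plt.hist(res, bins=30)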

residuals_plot(x, y=None, batch_size=None, **kwargs)

Plot the distribution of residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

set_kl_weight(w)

Set the weight of the KL term’s contribution to the ELBO loss

set_learning_rate(lr)

Set the learning rate used by this model’s optimizer

sharpness(x, n=1000, batch_size=None)

Compute the sharpness of the model’s uncertainty estimates

The “sharpness” of a model’s uncertainty estimates is the square root of the mean of the estimated variances:

\[SHA = \sqrt{\frac{1}{N} \sum_{i=1}^N \text{Var}(\hat{Y}_i)}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using sharpness, among other metrics. Note that the sharpness should generally be one of the later things you consider - accuracy and calibration usually being more important.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The sharpness of the model’s uncertainty estimates

Return type

float

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute the sharpness of our model’s predictions with:

>>> model.sharpness(x_val)
0.173

References

stop_training()

Stop the training of the model

summary()

Show a summary of the model and its parameters.

TODO

TODO: though maybe this should be a method of module… model would have to add to it the observation dist

train_step(x_data, y_data)

Perform one training step

property trainable_variables

A list of trainable backend variables within this Module

class probflow.applications.DenseRegression(d: List[int], heteroscedastic: bool = False, **kwargs)[source]

Bases: probflow.models.continuous_model.ContinuousModel

A regression using a multilayer dense neural network

TODO: explain, math, diagram, examples, etc

Parameters
  • d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features), and the last element should be the dimensionality of the dependent variable (number of dimensions of the target).

  • heteroscedastic (bool) – Whether to model a change in noise as a function of \(\mathbf{x}\) (if heteroscedastic=True), or not (if heteroscedastic=False, the default).

  • activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max ( 0, x )\)

  • kwargs – Additional keyword arguments are passed to DenseNetwork

network

The multilayer dense neural network which generates predictions of the mean

Type

DenseNetwork

std

Standard deviation of the Normal observation distribution

Type

ScaleParameter
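
For instance, a dense-network regression with 7 input features, two hidden layers, and a scalar target might be built and fit roughly as follows (a minimal sketch; the data is randomly generated purely for illustration):

import numpy as np
from probflow.applications import DenseRegression

x = np.random.randn(256, 7).astype('float32')
y = np.random.randn(256, 1).astype('float32')

model = DenseRegression(d=[7, 64, 32, 1])   # input dim, two hidden layers, output dim
model.fit(x, y, epochs=100)
lb, ub = model.predictive_interval(x, ci=0.95)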

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

aleatoric_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only aleatoric uncertainty (uncertainty due to noise).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the aleatoric predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

aleatoric_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

calibration_curve(x, y, n=1000, resolution=100, batch_size=None)

Compute the regression calibration curve (Kuleshov et al., 2018).

The regression calibration curve compares the empirical cumulative probability to the cumulative probability predicted by a regression model (Kuleshov et al., 2018). First, a vector \(p\) of \(m\) confidence levels is chosen; these correspond to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than or equal to the corresponding elements in \(b\).

The calibration curve then plots \(p\) against \(\hat{p}\).

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • p (|ndarray|) – The predicted cumulative frequencies, \(p\).

  • p_hat (|ndarray|) – The empirical cumulative frequencies, \(\hat{p}\).

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute the calibration curve with calibration_curve():

p_pred, p_empirical = model.calibration_curve(x_val, y_val)

The returned values can be used directly or plotted against one another to get the calibration curve (as in Figure 3 in Kuleshov et al., 2018)

import matplotlib.pyplot as plt
plt.plot(p_pred, p_empirical)

Or, even more simply, just use calibration_curve_plot().

See also

References

calibration_curve_plot(x, y, n=1000, resolution=100, batch_size=None, **kwargs)

Plot the regression calibration curve.

See calibration_curve() for more info about the regression calibration curve.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

See also

calibration_metric(metric, x, y=None, n=1000, resolution=100, batch_size=None)

Compute one or more of several calibration metrics

Regression calibration metrics measure the error between a model’s regression calibration curve and the ideal calibration curve - i.e., what the curve would be if the model were perfectly calibrated (see Kuleshov et al., 2018 and Chung et al., 2020). First, a vector \(p\) of \(m\) confidence levels is chosen; these correspond to the predicted cumulative probabilities:

\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]

Then, a vector of empirical frequencies \(\hat{p}\) at each of these confidence levels is computed using validation data:

\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]

where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than or equal to the corresponding elements in \(b\).

Various metrics can be computed from these curves to measure how accurately the regression model captures uncertainty:

The mean squared calibration error (MSCE) is the mean squared error between the empirical and predicted frequencies,

\[MSCE = \frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2\]

The root mean squared calibration error (RMSCE) is just the square root of the MSCE:

\[RMSCE = \sqrt{\frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2}\]

The mean absolute calibration error (MACE) is the mean of the absolute differences between the empirical and predicted frequencies:

\[MACE = \frac{1}{m} \sum_{j=1}^m | p_j - \hat{p}_j |\]

And the miscalibration area (MA) is the area between the calibration curve and the ideal calibration curve (the identity line from (0, 0) to (1, 1)):

\[MA = \int_0^1 | p_x - \hat{p}_x | \, dx\]

Note that MA is equal to MACE as the number of bins (set by the resolution keyword argument) goes to infinity.

To choose which metric to compute, pass the name of the metric (msce, rmsce, mace, or ma) as the first argument to this function (or a list of them to compute multiple).

See Kuleshov et al., 2018, Chung et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using calibration metrics, among other metrics. Note that calibration is generally less important than accuracy, but more important than other metrics like sharpness() and any dispersion_metric().

Parameters
  • metric (str {'msce', 'rmsce', 'mace', or 'ma'} or List[str]) –

    Which metric(s) to compute (see above for the definition of each metric). To compute multiple metrics, pass a list of the metric names you’d like to compute. Available metrics are:

    • msce: mean squared calibration error

    • rmsce: root mean squared calibration error

    • mace: mean absolute calibration error

    • ma: miscalibration area

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000

  • resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested calibration metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute different calibration metrics using calibration_metric(). For example, to compute the mean squared calibration error (MSCE):

>>> model.calibration_metric("msce", x_val, y_val)
0.123

Or, to compute the mean absolute calibration error (MACE):

>>> model.calibration_metric("mace", x_val, y_val)
0.211

To compute multiple metrics at the same time, pass a list of metric names:

>>> model.calibration_metric(["msce", "mace"], x_val, y_val)
{"msce": 0.123, "mace": 0.211}

References

coverage_by(x_by, x, y=None, n: int = 1000, ci: float = 0.95, bins: int = 30, plot: bool = True, ideal_line_kwargs: dict = {}, batch_size=None, **kwargs)

Compute and plot the coverage of a given confidence interval of the posterior predictive distribution as a function of specified independent variables.

TODO: Docs…

Parameters
  • x_by (int or str or list of int or list of str) – Which independent variable(s) to compute and plot the coverage as a function of. That is, which columns in x to plot by.

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • ci (float between 0 and 1) – Inner percentile to find the coverage of. For example, if ci=0.95, will compute the coverage of the inner 95% of the posterior predictive distribution.

  • bins (int) – Number of bins to use for x_by

  • ideal_line_kwargs (dict) – Dict of args to pass to matplotlib.pyplot.plot for ideal coverage line.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_by

Returns

  • xo (|ndarray|) – Values of x_by corresponding to bin centers.

  • co (|ndarray|) – Coverage of the ci confidence interval of the predictive distribution in each bin.

dispersion_metric(metric, x, n=1000, batch_size=None)

Compute one or more of several dispersion metrics

Dispersion metrics measure how much a model’s uncertainty estimates vary. There are several different dispersion metrics:

The coefficient of variation (\(C_v\)) is the ratio of the standard deviation to the mean (of the model’s uncertainty standard deviations):

\[C_v = \frac{\sigma}{\mu}\]

The quartile coefficient of dispersion (\(QCD\)) is less sensitive to outliers, as it is simply the ratio of the difference between the third and first quartiles (of the model’s uncertainty standard deviations) to their sum:

\[QCD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using dispersion metrics, among other metrics. Note that dispersion metrics should generally be one of the last things you consider - accuracy, calibration, and sharpness usually being more important.

Parameters
  • metric (str {'cv' or 'qcd'} or List[str]) – Dispersion metric to compute, or a list of metric names to compute multiple metrics.

  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The requested dispersion metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.

Return type

float or Dict[str, float]

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = # some ProbFlow model...
model.fit(x_train, y_train)

Then we can compute the coefficient of variation of our model’s predictions with:

>>> model.dispersion_metric('cv', x_val)
0.732

Or the quartile coefficient of dispersion with:

>>> model.dispersion_metric('qcd', x_val)
0.625

References

dumps()

Serialize module object to bytes

elbo_loss(x_data, y_data, n: int, n_mc: int)

Compute the negative ELBO, scaled to a single sample.

Parameters
  • x_data – The independent variable values (or None if this is a generative model)

  • y_data – The dependent variable values

  • n (int) – Total number of datapoints in the dataset

  • n_mc (int) – Number of MC samples we’re taking from the posteriors

epistemic_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including only epistemic uncertainty (uncertainty stemming from imperfect knowledge of the model’s parameter values).

TODO: docs

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the epistemic predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.

epistemic_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty stemming from imperfect knowledge of the model’s parameter values)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)

Fit the model to data

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Default = 128

  • epochs (int) – Number of epochs to train the model. Default = 200

  • shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False

  • optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO

  • optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.

  • lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).

  • flipout (bool) – Whether to use flipout during training where possible. Default = True

  • num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None

  • callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.

  • eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False

  • n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using fewer MC samples is faster, but using more MC samples decreases the variance of the gradients, leading to more stable parameter optimization.

Example

See the user guide section on Fitting a Model.
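
In the meantime, a minimal sketch, assuming NumPy arrays x_train and y_train (placeholder names):

model = LinearRegression(x_train.shape[1])
model.fit(x_train, y_train, batch_size=256, epochs=100, lr=0.01)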

get_elbo()

Get the current ELBO on training data

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

log_likelihood(x_data, y_data)

Compute the sum log likelihood of the model given a batch of data

log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)

Compute the log probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

log_probs – Log probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray
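
Example

A minimal sketch, assuming a fitted model and validation arrays x_val and y_val (placeholder names):

lp = model.log_prob(x_val, y_val)  # per-datapoint log probabilities at the MAP parameter estimates
lp_dist = model.log_prob(x_val, y_val, individually=False, distribution=True, n=500)  # posterior draws of the summed log probability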

metric(metric, x, y=None, batch_size=None)

Compute a metric of model performance

TODO: docs

TODO: note that this doesn’t work w/ generative models

Parameters
  • metric (str or callable) –

    Metric to evaluate. Available metrics:

    • 'lp': log likelihood sum
    • 'log_prob': log likelihood sum
    • 'accuracy': accuracy
    • 'acc': accuracy
    • 'mean_squared_error': mean squared error
    • 'mse': mean squared error
    • 'sum_squared_error': sum squared error
    • 'sse': sum squared error
    • 'mean_absolute_error': mean absolute error
    • 'mae': mean absolute error
    • 'r_squared': coefficient of determination
    • 'r2': coefficient of determination
    • 'recall': true positive rate
    • 'sensitivity': true positive rate
    • 'true_positive_rate': true positive rate
    • 'tpr': true positive rate
    • 'specificity': true negative rate
    • 'selectivity': true negative rate
    • 'true_negative_rate': true negative rate
    • 'tnr': true negative rate
    • 'precision': precision
    • 'f1_score': F-measure
    • 'f1': F-measure
    • callable: a function which takes (y_true, y_pred)

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO
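
Example

A minimal sketch, assuming a fitted model and validation arrays x_val and y_val (placeholder names):

import numpy as np

mse = model.metric('mse', x_val, y_val)
mae = model.metric('mae', x_val, y_val)

# any callable taking (y_true, y_pred) also works
max_err = model.metric(lambda y_true, y_pred: np.max(np.abs(y_true - y_pred)), x_val, y_val)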

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

posterior_ci(params=None, ci=0.95, n=10000)

Posterior confidence intervals

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.

  • ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95

  • n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000

Returns

Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str

Return type

dict
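
Example

A minimal sketch for a fitted LinearRegression model (the parameter name 'weights' below is one of this class’s attributes):

cis = model.posterior_ci(ci=0.95)           # dict mapping parameter names to (lower, upper) tuples
w_lb, w_ub = model.posterior_ci('weights')  # a single tuple when params is a str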

posterior_mean(params=None)

Get the mean of the posterior distribution(s)

TODO: Docs… params is a list of strings of params to plot

Parameters

params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.

Returns

Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.

Return type

dict

posterior_plot(params=None, cols=1, **kwargs)

Plot posterior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()

posterior_sample(params=None, n=10000)

Draw samples from parameter posteriors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get a sample for all parameters in the model.

  • n (int) – Number of samples to take from each posterior distribution. Default = 10,000

Returns

Samples from the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior samples. The ndarrays are of size (num_samples, param.shape). Or just the ndarray if params was a str.

Return type

dict
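
Example

A minimal sketch for a fitted LinearRegression model (the parameter name 'weights' below is one of this class’s attributes):

samples = model.posterior_sample(n=1000)               # dict of ndarrays keyed by parameter name
w_samples = model.posterior_sample('weights', n=1000)  # ndarray of samples for a single parameter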

pred_dist_coverage(x, y=None, n=1000, ci=0.95, batch_size=None)

Compute what percent of samples are covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prc_covered – Proportion of the samples which were covered by the predictive distribution’s confidence interval.

Return type

float between 0 and 1
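
Example

A minimal sketch, assuming a fitted model and validation arrays x_val and y_val (placeholder names). For a well-calibrated model the returned proportion should be close to ci:

coverage = model.pred_dist_coverage(x_val, y_val, ci=0.95)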

pred_dist_covered(x, y=None, n: int = 1000, ci: float = 0.95, batch_size=None)

Compute whether each observation was covered by a given confidence interval.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • ci (float between 0 and 1) – Confidence interval to use.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO

pred_dist_plot(x, n=10000, cols=1, individually=False, batch_size=None, **kwargs)

Plot posterior predictive distribution from the model given x.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model given x. Default = 10000

  • cols (int) – Divide the subplots into a grid with this many columns (if individually=True).

  • individually (bool) – If True, plot one subplot per datapoint in x, otherwise plot all the predictive distributions on the same plot.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

predict(x=None, method='mean', batch_size=None)

Predict dependent variable using the model

TODO… using maximum a posteriori param estimates etc

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])

Return type

ndarray

Examples

TODO: Docs…
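
In the meantime, a minimal sketch, assuming a fitted model and a validation feature array x_val (placeholder name):

y_pred = model.predict(x_val)                 # mean of the predicted target distribution
y_mode = model.predict(x_val, method='mode')  # mode of the predicted target distribution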

predictive_interval(x, ci=0.95, side='both', n=1000, batch_size=None)

Compute confidence intervals on the model’s estimate of the target given x, including all sources of uncertainty.

TODO: docs

TODO: using side= both, upper, vs lower

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • ci (float between 0 and 1) – Inner proportion of predictive distribution to use as the confidence interval. Default = 0.95

  • side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.

  • n (int) – Number of samples from the posterior predictive distribution to take to compute the confidence intervals. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

  • lb (|ndarray|) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.

  • ub (|ndarray|) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.
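
Example

A minimal sketch, assuming a fitted model and a validation feature array x_val (placeholder name):

lb, ub = model.predictive_interval(x_val, ci=0.95)            # central 95% predictive interval
lb = model.predictive_interval(x_val, ci=0.95, side='lower')  # one-sided lower bound only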

predictive_prc(x, y=None, n=1000, batch_size=None)

Compute the percentile of each observation along the posterior predictive distribution.

TODO: Docs… Returns a percentile between 0 and 1

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of samples to draw from the model given x. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

prcs

Return type

ndarray of float between 0 and 1

predictive_sample(x=None, n=1000, batch_size=None)

Draw samples from the posterior predictive distribution given x

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predictive distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray
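
Example

A minimal sketch, assuming a fitted model and a validation feature array x_val (placeholder name):

samples = model.predictive_sample(x_val, n=2000)  # shape (2000, x_val.shape[0], ...)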

prior_plot(params=None, cols=1, **kwargs)

Plot prior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.prior_plot()

prior_sample(params=None, n=10000)

Draw samples from parameter priors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (list) – List of parameter names to sample. Each element should be a str. Default is to sample priors of all parameters in the model.

  • n (int) – Number of samples to take from each prior distribution. Default = 10000

Returns

Samples from the parameter prior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the prior samples. The ndarrays are of size (n,param.shape).

Return type

dict

prob(x, y=None, **kwargs)

Compute the probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

probs – Probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

r_squared(x, y=None, n=1000, batch_size=None)

Compute the Bayesian R-squared distribution (Gelman et al., 2018).

TODO: more info

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the r-squared distribution. Size: (num_samples,).

Return type

ndarray

Examples

TODO: Docs…
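
In the meantime, a minimal sketch, assuming a fitted regression model and validation arrays x_val and y_val (placeholder names):

r2_samples = model.r_squared(x_val, y_val, n=1000)  # samples from the Bayesian R-squared distribution
print(r2_samples.mean(), r2_samples.std())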

References

r_squared_plot(x, y=None, n=1000, style='hist', batch_size=None, **kwargs)

Plot the Bayesian R-squared distribution.

See r_squared() for more info on the Bayesian R-squared metric.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

reset_kl_loss()

Reset additional loss due to KL divergences

residuals(x, y=None, batch_size=None)

Compute the residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The residuals.

Return type

ndarray

Example

TODO

residuals_plot(x, y=None, batch_size=None, **kwargs)

Plot the distribution of residuals of the model’s predictions.

TODO: docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_dist()

Example

TODO

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object
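
Example

A minimal sketch; the filename (including its extension) is arbitrary:

model.save('my_model.pfm')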

set_kl_weight(w)

Set the weight of the KL term’s contribution to the ELBO loss

set_learning_rate(lr)

Set the learning rate used by this model’s optimizer

sharpness(x, n=1000, batch_size=None)

Compute the sharpness of the model’s uncertainty estimates

The “sharpness” of a model’s uncertainty estimates is the square root of the mean of the estimated variances:

\[SHA = \sqrt{\frac{1}{N} \sum_{i=1}^N \text{Var}(\hat{Y}_i)}\]

See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using sharpness, among other metrics. Note that sharpness should generally be one of the later things you consider; accuracy and calibration are usually more important.

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • n (int) – Number of samples to draw from the model. Default = 1000

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

The sharpness of the model’s uncertainty estimates

Return type

float

Example

Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,

model = LinearRegression(x_train.shape[1])  # or any other ProbFlow model
model.fit(x_train, y_train)

Then we can compute the sharpness of our model’s predictions with:

>>> model.sharpness(x_val)
0.173

References

stop_training()

Stop the training of the model

summary()

Show a summary of the model and its parameters.

TODO

TODO: though maybe this should be a method of module… model would have to add to it the observation dist

train_step(x_data, y_data)

Perform one training step

property trainable_variables

A list of trainable backend variables within this Module

class probflow.applications.DenseClassifier(d: List[int], **kwargs)[source]

Bases: probflow.models.categorical_model.CategoricalModel

A classifier which uses a multilayer dense neural network

TODO: explain, math, diagram, examples, etc

Parameters
  • d (List[int]) – Dimensionality (number of units) for each layer. The first element should be the dimensionality of the independent variable (number of features), and the last element should be the number of classes of the target.

  • activation (callable) – Activation function to apply to the outputs of each layer. Note that the activation function will not be applied to the outputs of the final layer. Default = \(\max ( 0, x )\)

  • kwargs – Additional keyword arguments are passed to DenseNetwork

network

The multilayer dense neural network which generates predictions of the class probabilities

Type

DenseNetwork
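
Example

A minimal sketch, assuming a feature matrix x_train with 7 columns, integer class labels y_train with 3 classes, and held-out features x_val (all placeholder names):

model = DenseClassifier([7, 128, 64, 3])  # input features, two hidden layers, 3 output classes
model.fit(x_train, y_train, epochs=100)
y_pred = model.predict(x_val)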

add_kl_loss(loss, d2=None)

Add additional loss due to KL divergences.

aleatoric_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples,x.shape[0],…)

Return type

ndarray

bayesian_update()

Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.

calibration_curve(x, y=None, split_by=None, bins=10, plot=True, batch_size=None)

Plot and return the categorical calibration curve.

Plots and returns the calibration curve (estimated probability of outcome vs the true probability of that outcome).

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • split_by (int) – Draw the calibration curve independently for datapoints with each unique value in x[:,split_by] (a categorical column).

  • bins (int, list of float, or ndarray) – Bins used to compute the curve. If an integer, will use that many evenly-spaced bins from 0 to 1. If a vector, bins is the vector of bin edges.

  • plot (bool) – Whether to plot the curve

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

TODO: split by continuous cols as well? Then will need to define bins or edges too

TODO: Docs…
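
Example

A minimal sketch, assuming a fitted classifier and validation arrays x_val and y_val (placeholder names):

model.calibration_curve(x_val, y_val, bins=10)  # plots the calibration curve by default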

dumps()

Serialize module object to bytes

elbo_loss(x_data, y_data, n: int, n_mc: int)

Compute the negative ELBO, scaled to a single sample.

Parameters
  • x_data – The independent variable values (or None if this is a generative model)

  • y_data – The dependent variable values

  • n (int) – Total number of datapoints in the dataset

  • n_mc (int) – Number of MC samples we’re taking from the posteriors

epistemic_sample(x=None, n=1000, batch_size=None)

Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to not knowing the model’s parameter values exactly)

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)

Fit the model to data

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Default = 128

  • epochs (int) – Number of epochs to train the model. Default = 200

  • shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False

  • optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO

  • optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.

  • lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).

  • flipout (bool) – Whether to use flipout during training where possible. Default = True

  • num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None

  • callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.

  • eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False

  • n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using fewer MC samples is faster, but using more MC samples decreases the variance of the gradients, leading to more stable parameter optimization.

Example

See the user guide section on Fitting a Model.

get_elbo()

Get the current ELBO on training data

kl_loss()

Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.

kl_loss_batch()

Compute the sum of additional Kullback-Leibler divergences due to data in this batch

log_likelihood(x_data, y_data)

Compute the sum log likelihood of the model given a batch of data

log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)

Compute the log probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

log_probs – Log probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

metric(metric, x, y=None, batch_size=None)

Compute a metric of model performance

TODO: docs

TODO: note that this doesn’t work w/ generative models

Parameters
  • metric (str or callable) –

    Metric to evaluate. Available metrics:

    • 'lp': log likelihood sum
    • 'log_prob': log likelihood sum
    • 'accuracy': accuracy
    • 'acc': accuracy
    • 'mean_squared_error': mean squared error
    • 'mse': mean squared error
    • 'sum_squared_error': sum squared error
    • 'sse': sum squared error
    • 'mean_absolute_error': mean absolute error
    • 'mae': mean absolute error
    • 'r_squared': coefficient of determination
    • 'r2': coefficient of determination
    • 'recall': true positive rate
    • 'sensitivity': true positive rate
    • 'true_positive_rate': true positive rate
    • 'tpr': true positive rate
    • 'specificity': true negative rate
    • 'selectivity': true negative rate
    • 'true_negative_rate': true negative rate
    • 'tnr': true negative rate
    • 'precision': precision
    • 'f1_score': F-measure
    • 'f1': F-measure
    • callable: a function which takes (y_true, y_pred)

  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Return type

TODO

property modules

A list of sub-Modules in this Module, including itself.

property n_parameters

Get the number of independent parameters of this module

property n_variables

Get the number of underlying variables in this module

property parameters

A list of Parameters in this Module and its sub-Modules.

posterior_ci(params=None, ci=0.95, n=10000)

Posterior confidence intervals

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.

  • ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95

  • n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000

Returns

Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str

Return type

dict

posterior_mean(params=None)

Get the mean of the posterior distribution(s)

TODO: Docs… params is a list of strings of params to plot

Parameters

params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.

Returns

Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.

Return type

dict

posterior_plot(params=None, cols=1, **kwargs)

Plot posterior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()

posterior_sample(params=None, n=10000)

Draw samples from parameter posteriors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or List[str] or None) – Parameter name(s) to sample. Default is to get a sample for all parameters in the model.

  • n (int) – Number of samples to take from each posterior distribution. Default = 10,000

Returns

Samples from the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior samples. The ndarrays are of size (num_samples, param.shape). Or just the ndarray if params was a str.

Return type

dict

pred_dist_plot(x, n=10000, cols=1, batch_size=None, **kwargs)

Plot posterior predictive distribution from the model given x.

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model given x. Default = 10000

  • cols (int) – Divide the subplots into a grid with this many columns (if individually=True).

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

  • **kwargs – Additional keyword arguments are passed to plot_categorical_dist()
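
Example

A minimal sketch, assuming a fitted classifier and a validation feature array x_val (placeholder name):

model.pred_dist_plot(x_val[:5], n=5000)  # predicted class distributions for the first 5 datapoints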

predict(x=None, method='mean', batch_size=None)

Predict dependent variable using the model

TODO… using maximum a posteriori param estimates etc

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])

Return type

ndarray

Examples

TODO: Docs…

predictive_sample(x=None, n=1000, batch_size=None)

Draw samples from the posterior predictive distribution given x

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).

  • n (int) – Number of samples to draw from the model per datapoint.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

Samples from the predictive distribution. Size (num_samples, x.shape[0], …)

Return type

ndarray

prior_plot(params=None, cols=1, **kwargs)

Plot prior distributions of the model’s parameters

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.

  • cols (int) – Divide the subplots into a grid with this many columns.

  • kwargs – Additional keyword arguments are passed to Parameter.prior_plot()

prior_sample(params=None, n=10000)

Draw samples from parameter priors

TODO: Docs… params is a list of strings of params to plot

Parameters
  • params (list) – List of parameter names to sample. Each element should be a str. Default is to sample priors of all parameters in the model.

  • n (int) – Number of samples to take from each prior distribution. Default = 10000

Returns

Samples from the parameter prior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the prior samples. The ndarrays are of size (n,param.shape).

Return type

dict

prob(x, y=None, **kwargs)

Compute the probability of y given the model

TODO: Docs…

Parameters
  • x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.

  • y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).

  • individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).

  • distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).

  • n (int) – Number of samples to draw for each distribution if distribution=True.

  • batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).

Returns

probs – Probabilities. Shape is determined by individually, distribution, and n kwargs.

Return type

ndarray

reset_kl_loss()

Reset additional loss due to KL divergences

save(filename: str)

Save module object to file

Parameters

filename (str) – Filename for file to which to save this object

set_kl_weight(w)

Set the weight of the KL term’s contribution to the ELBO loss

set_learning_rate(lr)

Set the learning rate used by this model’s optimizer

stop_training()

Stop the training of the model

summary()

Show a summary of the model and its parameters.

TODO

TODO: though maybe this should be a method of module… model would have to add to it the observation dist

train_step(x_data, y_data)

Perform one training step

property trainable_variables

A list of trainable backend variables within this Module