Models¶
Models are objects which take Tensor(s) as input, perform some computation on those Tensor(s), and output probability distributions.
TODO: more…
- class probflow.models.Model(*args)[source]¶
Bases:
probflow.modules.module.Module
Abstract base class for probflow models.
TODO
This class inherits several methods and properties from Module:
- parameters
- modules
- trainable_variables
- n_parameters
- n_variables
- bayesian_update()
- kl_loss()
- kl_loss_batch()
- reset_kl_loss()
- add_kl_loss()
- dumps()
- save()
and adds model-specific methods:
- log_likelihood()
- train_step()
- fit()
- stop_training()
- set_learning_rate()
- predictive_sample()
- aleatoric_sample()
- epistemic_sample()
- predict()
- metric()
- posterior_mean()
- posterior_sample()
- posterior_ci()
- prior_sample()
- posterior_plot()
- prior_plot()
- log_prob()
- log_prob_by()
- prob()
- prob_by()
- save()
- summary()
Example
See the user guide section on Models.
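As an illustrative sketch (not from the official docs), a minimal Model subclass for a Bayesian linear regression might look like the following, assuming the TensorFlow backend:
import probflow as pf

class SimpleLinearRegression(pf.Model):
    def __init__(self):
        self.w = pf.Parameter(name="w")       # slope
        self.b = pf.Parameter(name="b")       # intercept
        self.s = pf.ScaleParameter(name="s")  # noise standard deviation

    def __call__(self, x):
        # Return the predicted observation distribution given the input
        return pf.Normal(x * self.w() + self.b(), self.s())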
- log_likelihood(x_data, y_data)[source]¶
Compute the summed log likelihood of the model given a batch of data
- elbo_loss(x_data, y_data, n: int, n_mc: int)[source]¶
Compute the negative ELBO, scaled to a single sample.
- fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)[source]¶
Fit the model to data
TODO
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples, …)
y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples, …). Default = None
batch_size (int) – Number of samples to use per minibatch. Default = 128
epochs (int) – Number of epochs to train the model. Default = 200
shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False
optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow, the default is to use Adam (tf.keras.optimizers.Adam). When the backend is PyTorch, the default is to use TODO
optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.
lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model and \(N_b\) is the number of samples per batch (batch_size).
flipout (bool) – Whether to use flipout during training where possible. Default = True
num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None
callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.
eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False
n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using fewer MC samples is faster, while using more MC samples decreases the variance of the gradients, leading to more stable parameter optimization.
Example
See the user guide section on Fitting a Model.
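For instance, a minimal sketch of fitting the hypothetical SimpleLinearRegression model sketched above, on synthetic placeholder data:
import numpy as np

x_train = np.random.randn(1000).astype("float32")
y_train = 2.0 * x_train + 1.0 + 0.3 * np.random.randn(1000).astype("float32")

model = SimpleLinearRegression()
model.fit(x_train, y_train, batch_size=256, epochs=100, lr=0.02)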
- predictive_sample(x=None, n=1000, batch_size=None)[source]¶
Draw samples from the posterior predictive distribution given x
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predictive distribution. Size (num_samples, x.shape[0], …)
- Return type
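Example
A short usage sketch, where x_val is a hypothetical validation array and model is an already-fit ProbFlow model:
samples = model.predictive_sample(x_val, n=1000)
# samples.shape is (1000, x_val.shape[0], ...)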
- aleatoric_sample(x=None, n=1000, batch_size=None)[source]¶
Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- epistemic_sample(x=None, n=1000, batch_size=None)[source]¶
Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- predict(x=None, method='mean', batch_size=None)[source]¶
Predict dependent variable using the model
TODO… using maximum a posteriori param estimates etc
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])
- Return type
Examples
TODO: Docs…
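A short usage sketch, where x_val is a hypothetical validation array:
preds = model.predict(x_val)                 # mean of the predictive distribution
modes = model.predict(x_val, method="mode")  # mode instead of mean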
- metric(metric, x, y=None, batch_size=None)[source]¶
Compute a metric of model performance
TODO: docs
TODO: note that this doesn’t work w/ generative models
- Parameters
metric (str or callable) –
Metric to evaluate. Available metrics:
'lp': log likelihood sum
'log_prob': log likelihood sum
'accuracy': accuracy
'acc': accuracy
'mean_squared_error': mean squared error
'mse': mean squared error
'sum_squared_error': sum squared error
'sse': sum squared error
'mean_absolute_error': mean absolute error
'mae': mean absolute error
'r_squared': coefficient of determination
'r2': coefficient of determination
'recall': true positive rate
'sensitivity': true positive rate
'true_positive_rate': true positive rate
'tpr': true positive rate
'specificity': true negative rate
'selectivity': true negative rate
'true_negative_rate': true negative rate
'tnr': true negative rate
'precision': precision
'f1_score': F-measure
'f1': F-measure
callable: a function which takes (y_true, y_pred)
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
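Example
A short usage sketch of computing the mean squared error on hypothetical validation data:
mse = model.metric("mse", x_val, y_val)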
- posterior_mean(params=None)[source]¶
Get the mean of the posterior distribution(s)
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.
- Returns
Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.
- Return type
- posterior_sample(params=None, n=10000)[source]¶
Draw samples from parameter posteriors
TODO: Docs… params is a list of strings of params to plot
- posterior_ci(params=None, ci=0.95, n=10000)[source]¶
Posterior confidence intervals
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.
ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95
n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000
- Returns
Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str
- Return type
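Example
A short usage sketch, assuming the model has a parameter named "w":
cis = model.posterior_ci(ci=0.95)  # dict mapping parameter names to (lower, upper)
lb, ub = cis["w"]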
- prior_sample(params=None, n=10000)[source]¶
Draw samples from parameter priors
TODO: Docs… params is a list of strings of params to plot
- posterior_plot(params=None, cols=1, **kwargs)[source]¶
Plot posterior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()
- prior_plot(params=None, cols=1, **kwargs)[source]¶
Plot prior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.prior_plot()
- log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)[source]¶
Compute the log probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns the log probability for each sample individually, so the return shape is (x.shape[0], ?). If individually is False, returns the sum of all log probabilities, so the return shape is (1, ?).
distribution (bool) – If distribution is True, returns the log probability posterior distribution (n samples from the model), so the return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
log_probs – Log probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
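Example
A short usage sketch on hypothetical validation data:
lp = model.log_prob(x_val, y_val)                         # one log probability per datapoint
total = model.log_prob(x_val, y_val, individually=False)  # summed over all datapoints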
- prob(x, y=None, **kwargs)[source]¶
Compute the probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns the probability for each sample individually, so the return shape is (x.shape[0], ?). If individually is False, returns the product of all probabilities, so the return shape is (1, ?).
distribution (bool) – If distribution is True, returns the posterior probability distribution (n samples from the model), so the return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
probs – Probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- summary()[source]¶
Show a summary of the model and its parameters.
TODO
TODO: though maybe this should be a method of module… model would have to add to it the observation dist
- add_kl_loss(loss, d2=None)¶
Add additional loss due to KL divergences.
- bayesian_update()¶
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()¶
Serialize module object to bytes
- kl_loss()¶
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()¶
Compute the sum of additional Kullback-Leibler divergences due to data in this batch
- property n_parameters¶
Get the number of independent parameters of this module
- property n_variables¶
Get the number of underlying variables in this module
- property parameters¶
A list of Parameters in this Module and its sub-Modules.
- reset_kl_loss()¶
Reset additional loss due to KL divergences
- class probflow.models.ContinuousModel(*args)[source]¶
Bases:
probflow.models.model.Model
Abstract base class for probflow models where the dependent variable (the target) is continuous and 1-dimensional.
The only advantage to using this over the more general Model is that ContinuousModel also includes several methods specific to continuous models, for tasks such as getting the predictive intervals, coverage, R-squared value, or calibration metrics (see below for the full list of methods).
Only supports scalar dependent variables
Note that the methods of ContinuousModel only support scalar, continuous dependent variables (not any continuous model, as the name might suggest). For models which have a multidimensional output, just use the more general Model; for models with categorical output (i.e., classifiers), use CategoricalModel; and for models which have a discrete output (e.g. a Poisson regression), use DiscreteModel.
This class inherits several methods from Module, as well as several methods from Model, and adds the following continuous-model-specific methods:
- predictive_interval()
- aleatoric_interval()
- epistemic_interval()
- pred_dist_plot()
- predictive_prc()
- pred_dist_covered()
- pred_dist_coverage()
- coverage_by()
- r_squared()
- r_squared_plot()
- residuals()
- residuals_plot()
- calibration_curve()
- calibration_curve_plot()
- calibration_metric()
- sharpness()
- dispersion_metric()
- coefficient_of_variation()
Example
TODO
- predictive_interval(x, ci=0.95, side='both', n=1000, batch_size=None)[source]¶
Compute confidence intervals on the model’s estimate of the target given x, including all sources of uncertainty.
TODO: docs
TODO: using side = both, upper, vs lower
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound of the one-sided ci interval. If 'upper', gets the upper bound of the one-sided ci interval.
n (int) – Number of samples from the posterior predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='lower'.
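Example
A short usage sketch on a hypothetical validation array:
lb, ub = model.predictive_interval(x_val, ci=0.95)             # central 95% interval
ub90 = model.predictive_interval(x_val, ci=0.9, side="upper")  # one-sided upper bound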
- aleatoric_interval(x, ci=0.95, side='both', n=1000, batch_size=None)[source]¶
Compute confidence intervals on the model’s estimate of the target given x, including only aleatoric uncertainty (uncertainty due to noise).
TODO: docs
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound of the one-sided ci interval. If 'upper', gets the upper bound of the one-sided ci interval.
n (int) – Number of samples from the aleatoric predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='lower'.
- epistemic_interval(x, ci=0.95, side='both', n=1000, batch_size=None)[source]¶
Compute confidence intervals on the model’s estimate of the target given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values).
TODO: docs
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound of the one-sided ci interval. If 'upper', gets the upper bound of the one-sided ci interval.
n (int) – Number of samples from the epistemic predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='lower'.
- pred_dist_plot(x, n=10000, cols=1, individually=False, batch_size=None, **kwargs)[source]¶
Plot the posterior predictive distribution from the model given x.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model given x. Default = 10000
cols (int) – Divide the subplots into a grid with this many columns (if individually=True).
individually (bool) – If True, plot one subplot per datapoint in x; otherwise plot all the predictive distributions on the same plot.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
Example
TODO
- predictive_prc(x, y=None, n=1000, batch_size=None)[source]¶
Compute the percentile of each observation along the posterior predictive distribution.
TODO: Docs… Returns a percentile between 0 and 1
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
prcs
- Return type
ndarray of float between 0 and 1
- pred_dist_covered(x, y=None, n: int = 1000, ci: float = 0.95, batch_size=None)[source]¶
Compute whether each observation was covered by a given confidence interval.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
ci (float between 0 and 1) – Confidence interval to use.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
- pred_dist_coverage(x, y=None, n=1000, ci=0.95, batch_size=None)[source]¶
Compute what percent of samples are covered by a given confidence interval.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
ci (float between 0 and 1) – Confidence interval to use.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
prc_covered – Proportion of the samples which were covered by the predictive distribution’s confidence interval.
- Return type
float between 0 and 1
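Example
A short usage sketch on hypothetical validation data; for a well-calibrated model the returned proportion should be close to ci:
coverage = model.pred_dist_coverage(x_val, y_val, ci=0.95)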
- coverage_by(x_by, x, y=None, n: int = 1000, ci: float = 0.95, bins: int = 30, plot: bool = True, ideal_line_kwargs: dict = {}, batch_size=None, **kwargs)[source]¶
Compute and plot the coverage of a given confidence interval of the posterior predictive distribution as a function of specified independent variables.
TODO: Docs…
- Parameters
x_by (int or str or list of int or list of str) – Which independent variable(s) to plot the coverage as a function of. That is, which columns in x to plot by.
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
ci (float between 0 and 1) – Inner percentile to find the coverage of. For example, if ci=0.95, will compute the coverage of the inner 95% of the posterior predictive distribution.
bins (int) – Number of bins to use for x_by
ideal_line_kwargs (dict) – Dict of args to pass to matplotlib.pyplot.plot for the ideal coverage line.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_by
- Returns
xo (ndarray) – Values of x_by corresponding to bin centers.
co (ndarray) – Coverage of the ci confidence interval of the predictive distribution in each bin.
- r_squared(x, y=None, n=1000, batch_size=None)[source]¶
Compute the Bayesian R-squared distribution (Gelman et al., 2018).
TODO: more info
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the r-squared distribution. Size: (num_samples,).
- Return type
Examples
TODO: Docs…
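A short usage sketch on hypothetical validation data:
r2 = model.r_squared(x_val, y_val, n=1000)  # samples from the R-squared distribution
print(r2.mean())                            # posterior mean R-squared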
References
Andrew Gelman, Ben Goodrich, Jonah Gabry, & Aki Vehtari. R-squared for Bayesian regression models. The American Statistician, 2018.
- r_squared_plot(x, y=None, n=1000, style='hist', batch_size=None, **kwargs)[source]¶
Plot the Bayesian R-squared distribution.
See r_squared() for more info on the Bayesian R-squared metric.
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of posterior draws to use for computing the r-squared distribution. Default = 1000.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
Example
TODO
- residuals(x, y=None, batch_size=None)[source]¶
Compute the residuals of the model’s predictions.
TODO: docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The residuals.
- Return type
Example
TODO
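A short usage sketch on hypothetical validation data:
res = model.residuals(x_val, y_val)  # differences between y_val and the model's predictions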
- residuals_plot(x, y=None, batch_size=None, **kwargs)[source]¶
Plot the distribution of residuals of the model’s predictions.
TODO: docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
Example
TODO
- calibration_curve(x, y, n=1000, resolution=100, batch_size=None)[source]¶
Compute the regression calibration curve (Kuleshov et al., 2018).
The regression calibration curve compares the empirical cumulative probability to the cumulative probability predicted by a regression model (Kuleshov et al., 2018). First, a vector \(p\) of \(m\) confidence levels are chosen, which correspond to the predicted cumulative probabilities:
\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]
Then, a vector of empirical frequencies \(\hat{p}\) at each of the predicted frequencies is computed by using validation data:
\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]
where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than corresponding elements in \(b\).
The calibration curve then plots \(p\) against \(\hat{p}\).
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
p (ndarray) – The predicted cumulative frequencies, \(p\).
p_hat (ndarray) – The empirical cumulative frequencies, \(\hat{p}\).
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data:
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the calibration curve with calibration_curve():
p_pred, p_empirical = model.calibration_curve(x_val, y_val)
The returned values can be used directly or plotted against one another to get the calibration curve (as in Figure 3 in Kuleshov et al., 2018):
import matplotlib.pyplot as plt
plt.plot(p_pred, p_empirical)
Or, even more simply, just use calibration_curve_plot().
See also
expected_calibration_error()
References
Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate Uncertainties for Deep Learning Using Calibrated Regression, 2018.
- calibration_curve_plot(x, y, n=1000, resolution=100, batch_size=None, **kwargs)[source]¶
Plot the regression calibration curve.
See calibration_curve() for more info about the regression calibration curve.
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
See also
expected_calibration_error()
- calibration_metric(metric, x, y=None, n=1000, resolution=100, batch_size=None)[source]¶
Compute one or more of several calibration metrics
Regression calibration metrics measure the error between a model’s regression calibration curve and the ideal calibration curve - i.e., what the curve would be if the model were perfectly calibrated (see Kuleshov et al., 2018 and Chung et al., 2020). First, a vector \(p\) of \(m\) confidence levels are chosen, which correspond to the predicted cumulative probabilities:
\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]
Then, a vector of empirical frequencies \(\hat{p}\) at each of the predicted frequencies is computed by using validation data:
\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]
where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than corresponding elements in \(b\).
Various metrics can be computed from these curves to measure how accurately the regression model captures uncertainty:
The mean squared calibration error (MSCE) is the mean squared error between the empirical and predicted frequencies,
\[MSCE = \frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2\]
The root mean squared calibration error (RMSCE) is just the square root of the MSCE:
\[RMSCE = \sqrt{\frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2}\]
The mean absolute calibration error (MACE) is the mean of the absolute differences between the empirical and predicted frequencies:
\[MACE = \frac{1}{m} \sum_{j=1}^m | p_j - \hat{p}_j |\]
And the miscalibration area (MA) is the area between the calibration curve and the ideal calibration curve (the identity line from (0, 0) to (1, 1)):
\[MA = \int_0^1 | p_x - \hat{p}_x | \, dx\]
Note that MA is equal to MACE as the number of bins (set by the resolution keyword argument) goes to infinity.
To choose which metric to compute, pass the name of the metric (msce, rmsce, mace, or ma) as the first argument to this function (or a list of them to compute multiple).
See Kuleshov et al., 2018, Chung et al., 2020, and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using calibration metrics, among other metrics. Note that calibration is generally less important than accuracy, but more important than other metrics like sharpness() and any dispersion_metric().
- Parameters
metric (str {'msce', 'rmsce', 'mace', or 'ma'} or List[str]) –
Which metric(s) to compute (see above for the definition of each metric). To compute multiple metrics, pass a list of the metric names you’d like to compute. Available metrics are:
msce: mean squared calibration error
rmsce: root mean squared calibration error
mace: mean absolute calibration error
ma: miscalibration area
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The requested calibration metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data:
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute different calibration metrics using calibration_metric(). For example, to compute the mean squared calibration error (MSCE):
>>> model.calibration_metric("msce", x_val, y_val)
0.123
Or, to compute the mean absolute calibration error (MACE):
>>> model.calibration_metric("mace", x_val, y_val)
0.211
To compute multiple metrics at the same time, pass a list of metric names:
>>> model.calibration_metric(["msce", "mace"], x_val, y_val)
{"msce": 0.123, "mace": 0.211}
References
Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate Uncertainties for Deep Learning Using Calibrated Regression, 2018.
Youngseog Chung, Willie Neiswanger, Ian Char, Jeff Schneider. Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification, 2020.
- sharpness(x, n=1000, batch_size=None)[source]¶
Compute the sharpness of the model’s uncertainty estimates
The “sharpness” of a model’s uncertainty estimates is the root mean of the estimated variances:
\[SHA = \sqrt{\frac{1}{N} \sum_{i=1}^N \text{Var}(\hat{Y}_i)}\]
See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using sharpness, among other metrics. Note that sharpness should generally be one of the later things you consider, with accuracy and calibration usually being more important.
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
n (int) – Number of samples to draw from the model. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The sharpness of the model’s uncertainty estimates
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data:
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the sharpness of our model’s predictions with:
>>> model.sharpness(x_val)
0.173
References
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Qingyang Zhang, Eric Xing, Zachary W. Ulissi. Methods for comparing uncertainty quantifications for material property predictions, 2020.
- dispersion_metric(metric, x, n=1000, batch_size=None)[source]¶
Compute one or more of several dispersion metrics
Dispersion metrics measure how much a model’s uncertainty estimates vary. There are several different dispersion metrics:
The coefficient of variation (\(C_v\)) is the ratio of the standard deviation to the mean (of the model’s uncertainty standard deviations):
\[C_v = \frac{\sigma_\sigma}{\mu_\sigma}\]
where \(\mu_\sigma\) and \(\sigma_\sigma\) are the mean and standard deviation of the model’s uncertainty standard deviations. The quartile coefficient of dispersion (\(QCD\)) is less sensitive to outliers, as it simply measures the ratio of the difference between the first and third quartiles (of the model’s uncertainty standard deviations) to their sum:
\[QCD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]
See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using dispersion metrics, among other metrics. Note that dispersion metrics should generally be one of the last things you consider, with accuracy, calibration, and sharpness usually being more important.
- Parameters
metric (str {'cv' or 'qcd'} or List[str]) – Dispersion metric to compute. Or, a list of dispersion metrics to compute.
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
n (int) – Number of samples to draw from the model. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The requested dispersion metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data:
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the coefficient of variation of our model’s predictions with:
>>> model.dispersion_metric('cv', x_val)
0.732
Or the quartile coefficient of dispersion with:
>>> model.dispersion_metric('qcd', x_val)
0.625
References
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Qingyang Zhang, Eric Xing, Zachary W. Ulissi. Methods for comparing uncertainty quantifications for material property predictions, 2020.
- add_kl_loss(loss, d2=None)¶
Add additional loss due to KL divergences.
- aleatoric_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- bayesian_update()¶
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()¶
Serialize module object to bytes
- epistemic_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)¶
Fit the model to data
TODO
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples, …)
y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples, …). Default = None
batch_size (int) – Number of samples to use per minibatch. Default = 128
epochs (int) – Number of epochs to train the model. Default = 200
shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False
optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow, the default is to use Adam (tf.keras.optimizers.Adam). When the backend is PyTorch, the default is to use TODO
optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.
lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model and \(N_b\) is the number of samples per batch (batch_size).
flipout (bool) – Whether to use flipout during training where possible. Default = True
num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None
callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.
eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False
n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to take just one per batch. Using fewer MC samples is faster, while using more MC samples decreases the variance of the gradients, leading to more stable parameter optimization.
Example
See the user guide section on Fitting a Model.
- get_elbo()¶
Get the current ELBO on training data
- kl_loss()¶
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()¶
Compute the sum of additional Kullback-Leibler divergences due to data in this batch
- log_likelihood(x_data, y_data)¶
Compute the summed log likelihood of the model given a batch of data
- log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)¶
Compute the log probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns the log probability for each sample individually, so the return shape is (x.shape[0], ?). If individually is False, returns the sum of all log probabilities, so the return shape is (1, ?).
distribution (bool) – If distribution is True, returns the log probability posterior distribution (n samples from the model), so the return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
log_probs – Log probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- metric(metric, x, y=None, batch_size=None)¶
Compute a metric of model performance
TODO: docs
TODO: note that this doesn’t work w/ generative models
- Parameters
metric (str or callable) –
Metric to evaluate. Available metrics:
'lp': log likelihood sum
'log_prob': log likelihood sum
'accuracy': accuracy
'acc': accuracy
'mean_squared_error': mean squared error
'mse': mean squared error
'sum_squared_error': sum squared error
'sse': sum squared error
'mean_absolute_error': mean absolute error
'mae': mean absolute error
'r_squared': coefficient of determination
'r2': coefficient of determination
'recall': true positive rate
'sensitivity': true positive rate
'true_positive_rate': true positive rate
'tpr': true positive rate
'specificity': true negative rate
'selectivity': true negative rate
'true_negative_rate': true negative rate
'tnr': true negative rate
'precision': precision
'f1_score': F-measure
'f1': F-measure
callable: a function which takes (y_true, y_pred)
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
- property n_parameters¶
Get the number of independent parameters of this module
- property n_variables¶
Get the number of underlying variables in this module
- property parameters¶
A list of Parameters in this Module and its sub-Modules.
- posterior_ci(params=None, ci=0.95, n=10000)¶
Posterior confidence intervals
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.
ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95
n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000
- Returns
Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str
- Return type
- posterior_mean(params=None)¶
Get the mean of the posterior distribution(s)
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.
- Returns
Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.
- Return type
- posterior_plot(params=None, cols=1, **kwargs)¶
Plot posterior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()
- posterior_sample(params=None, n=10000)¶
Draw samples from parameter posteriors
TODO: Docs… params is a list of strings of params to plot
- predict(x=None, method='mean', batch_size=None)¶
Predict dependent variable using the model
TODO… using maximum a posteriori param estimates etc
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])
- Return type
Examples
TODO: Docs…
- predictive_sample(x=None, n=1000, batch_size=None)¶
Draw samples from the posterior predictive distribution given x
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predictive distribution. Size (num_samples, x.shape[0], …)
- Return type
- prior_plot(params=None, cols=1, **kwargs)¶
Plot prior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.prior_plot()
- prior_sample(params=None, n=10000)¶
Draw samples from parameter priors
TODO: Docs… params is a list of strings of params to plot
- prob(x, y=None, **kwargs)¶
Compute the probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns the probability for each sample individually, so the return shape is (x.shape[0], ?). If individually is False, returns the product of all probabilities, so the return shape is (1, ?).
distribution (bool) – If distribution is True, returns the posterior probability distribution (n samples from the model), so the return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
probs – Probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- reset_kl_loss()¶
Reset additional loss due to KL divergences
- save(filename: str)¶
Save module object to file
- Parameters
filename (str) – Filename for file to which to save this object
- set_kl_weight(w)¶
Set the weight of the KL term’s contribution to the ELBO loss
- set_learning_rate(lr)¶
Set the learning rate used by this model’s optimizer
- stop_training()¶
Stop the training of the model
- summary()¶
Show a summary of the model and its parameters.
TODO
TODO: though maybe this should be a method of module… model would have to add to it the observation dist
- train_step(x_data, y_data)¶
Perform one training step
- class probflow.models.DiscreteModel(*args)[source]¶
Bases:
probflow.models.continuous_model.ContinuousModel
Abstract base class for probflow models where the dependent variable (the target) is discrete (e.g. drawn from a Poisson distribution).
TODO : why use this over just Model
This class inherits several methods from Module, as well as several methods from Model and ContinuousModel, but overrides the following discrete-model-specific methods:
- pred_dist_plot()
Note that DiscreteModel does not implement r_squared() or r_squared_plot().
Example
TODO
- pred_dist_plot(x, n=10000, cols=1, **kwargs)[source]¶
Plot the posterior predictive distribution from the model given x.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model given x. Default = 10000
cols (int) – Divide the subplots into a grid with this many columns (if individually=True).
**kwargs – Additional keyword arguments are passed to plot_discrete_dist()
- add_kl_loss(loss, d2=None)¶
Add additional loss due to KL divergences.
- aleatoric_interval(x, ci=0.95, side='both', n=1000, batch_size=None)¶
Compute confidence intervals on the model’s estimate of the target given x, including only aleatoric uncertainty (uncertainty due to noise).
TODO: docs
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound of the one-sided ci interval. If 'upper', gets the upper bound of the one-sided ci interval.
n (int) – Number of samples from the aleatoric predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Not returned if side='lower'.
- aleatoric_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- bayesian_update()¶
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- calibration_curve(x, y, n=1000, resolution=100, batch_size=None)¶
Compute the regression calibration curve (Kuleshov et al., 2018).
The regression calibration curve compares the empirical cumulative probability to the cumulative probability predicted by a regression model (Kuleshov et al., 2018). First, a vector \(p\) of \(m\) confidence levels are chosen, which correspond to the predicted cumulative probabilities:
\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]
Then, a vector of empirical frequencies \(\hat{p}\) at each of the predicted frequencies is computed by using validation data:
\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]
where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than corresponding elements in \(b\).
The calibration curve then plots \(p\) against \(\hat{p}\).
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
p (ndarray) – The predicted cumulative frequencies, \(p\).
p_hat (ndarray) – The empirical cumulative frequencies, \(\hat{p}\).
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the calibration curve with calibration_curve():
p_pred, p_empirical = model.calibration_curve(x_val, y_val)
The returned values can be used directly or plotted against one another to get the calibration curve (as in Figure 3 in Kuleshov et al., 2018):
import matplotlib.pyplot as plt
plt.plot(p_pred, p_empirical)
Or, even more simply, just use calibration_curve_plot().
See also
expected_calibration_error()
References
Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate Uncertainties for Deep Learning Using Calibrated Regression, 2018.
- calibration_curve_plot(x, y, n=1000, resolution=100, batch_size=None, **kwargs)¶
Plot the regression calibration curve.
See calibration_curve() for more info about the regression calibration curve.
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
See also
expected_calibration_error()
- calibration_metric(metric, x, y=None, n=1000, resolution=100, batch_size=None)¶
Compute one or more of several calibration metrics
Regression calibration metrics measure the error between a model’s regression calibration curve and the ideal calibration curve - i.e., what the curve would be if the model were perfectly calibrated (see Kuleshov et al., 2018 and Chung et al., 2020). First, a vector \(p\) of \(m\) confidence levels is chosen, which correspond to the predicted cumulative probabilities:
\[0 \leq p_1 \leq p_2 \leq \ldots \leq p_m \leq 1\]
Then, a vector of empirical frequencies \(\hat{p}\) at each of the predicted frequencies is computed using validation data:
\[\hat{p}_j = \frac{1}{N} \sum_{i=1}^N [ P_M(x_i \leq y_i) \leq p_j ]\]
where \(N\) is the number of validation datapoints, \(P_M(x_i \leq y_i)\) is the model’s predicted cumulative probability of datapoint \(i\) (i.e., the percentile along the model’s predicted probability distribution at which the true value of \(y_i\) falls), and \(\sum_i [ a_i \leq b_i ]\) is just the count of elements of \(a\) which are less than corresponding elements in \(b\).
Various metrics can be computed from these curves to measure how accurately the regression model captures uncertainty:
The mean squared calibration error (MSCE) is the mean squared error between the empirical and predicted frequencies:
\[MSCE = \frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2\]
The root mean squared calibration error (RMSCE) is just the square root of the MSCE:
\[RMSCE = \sqrt{\frac{1}{m} \sum_{j=1}^m (p_j - \hat{p}_j)^2}\]
The mean absolute calibration error (MACE) is the mean of the absolute differences between the empirical and predicted frequencies:
\[MACE = \frac{1}{m} \sum_{j=1}^m | p_j - \hat{p}_j |\]
And the miscalibration area (MA) is the area between the calibration curve and the ideal calibration curve (the identity line from (0, 0) to (1, 1)):
\[MA = \int_0^1 | p_x - \hat{p}_x | \, dx\]
Note that MA is equal to MACE as the number of bins (set by the resolution keyword argument) goes to infinity.
To choose which metric to compute, pass the name of the metric (msce, rmsce, mace, or ma) as the first argument to this function (or a list of them to compute multiple).
See Kuleshov et al., 2018, Chung et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using calibration metrics, among other metrics. Note that calibration is generally less important than accuracy, but more important than other metrics like sharpness() and any dispersion_metric().
- Parameters
metric (str {'msce', 'rmsce', 'mace', or 'ma'} or List[str]) – Which metric(s) to compute (see above for the definition of each metric). To compute multiple metrics, pass a list of the metric names you’d like to compute. Available metrics are:
msce: mean squared calibration error
rmsce: root mean squared calibration error
mace: mean absolute calibration error
ma: miscalibration area
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model for computing the predictive percentile. Default = 1000
resolution (int) – Number of confidence levels to evaluate at. This corresponds to the \(m\) parameter in section 3.5 of (Kuleshov et al., 2018).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The requested calibration metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute different calibration metrics using calibration_metric(). For example, to compute the mean squared calibration error (MSCE):
>>> model.calibration_metric("msce", x_val, y_val)
0.123
Or, to compute the mean absolute calibration error (MACE):
>>> model.calibration_metric("mace", x_val, y_val)
0.211
To compute multiple metrics at the same time, pass a list of metric names:
>>> model.calibration_metric(["msce", "mace"], x_val, y_val)
{"msce": 0.123, "mace": 0.211}
References
Volodymyr Kuleshov, Nathan Fenner, and Stefano Ermon. Accurate Uncertainties for Deep Learning Using Calibrated Regression, 2018.
Youngseog Chung, Willie Neiswanger, Ian Char, Jeff Schneider. Beyond Pinball Loss: Quantile Methods for Calibrated Uncertainty Quantification, 2020.
- coverage_by(x_by, x, y=None, n: int = 1000, ci: float = 0.95, bins: int = 30, plot: bool = True, ideal_line_kwargs: dict = {}, batch_size=None, **kwargs)¶
Compute and plot the coverage of a given confidence interval of the posterior predictive distribution as a function of specified independent variables.
TODO: Docs…
- Parameters
x_by (int or str or list of int or list of str) – Which independent variable(s) to plot the log probability as a function of. That is, which columns in x to plot by.
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
ci (float between 0 and 1) – Inner percentile to find the coverage of. For example, if ci=0.95, will compute the coverage of the inner 95% of the posterior predictive distribution.
bins (int) – Number of bins to use for x_by
ideal_line_kwargs (dict) – Dict of args to pass to matplotlib.pyplot.plot for ideal coverage line.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_by
- Returns
xo (ndarray) – Values of x_by corresponding to bin centers.
co (ndarray) – Coverage of the ci confidence interval of the predictive distribution in each bin.
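For example, a minimal sketch (assuming a fit model model and hypothetical validation arrays x_val and y_val, with the first column of x_val as the variable of interest):
# coverage of the central 95% predictive interval, binned by column 0 of x_val
xo, co = model.coverage_by(0, x_val, y_val, ci=0.95, bins=20)
# for a well-calibrated model, co should be close to 0.95 in every bin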
- dispersion_metric(metric, x, n=1000, batch_size=None)¶
Compute one or more of several dispersion metrics
Dispersion metrics measure how much a model’s uncertainty estimates vary. There are several different dispersion metrics:
The coefficient of variation (\(C_v\)) is the ratio of the standard deviation to the mean (of the model’s uncertainty standard deviations):
\[C_v = \frac{\sigma}{\mu}\]
The quartile coefficient of dispersion (\(QCD\)) is less sensitive to outliers, as it simply measures the ratio of the difference between the first and third quartiles (of the model’s uncertainty standard deviations) to their sum:
\[QCD = \frac{Q_3 - Q_1}{Q_3 + Q_1}\]
See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using dispersion metrics, among other metrics. Note that dispersion metrics should generally be one of the last things you consider - accuracy, calibration, and sharpness usually being more important.
- Parameters
metric (str {'cv' or 'qcd'} or List[str]) – Dispersion metric to compute. Or, to compute multiple metrics, pass a list of the metric names you’d like to compute.
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
n (int) – Number of samples to draw from the model. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The requested dispersion metric. If a list of metric names was passed, will return a dict whose keys are the metrics, and whose values are the corresponding metric values.
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the coefficient of variation of our model’s predictions with:
>>> model.dispersion_metric('cv', x_val)
0.732
Or the quartile coefficient of dispersion with:
>>> model.dispersion_metric('qcd', x_val)
0.625
References
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Qingyang Zhang, Eric Xing, Zachary W. Ulissi. Methods for comparing uncertainty quantifications for material property predictions, 2020.
- dumps()¶
Serialize module object to bytes
- epistemic_interval(x, ci=0.95, side='both', n=1000, batch_size=None)¶
Compute confidence intervals on the model’s estimate of the target given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values).
TODO: docs
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.
n (int) – Number of samples from the epistemic predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.
- epistemic_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)¶
Fit the model to data
TODO
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)
y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None
batch_size (int) – Number of samples to use per minibatch. Default = 128
epochs (int) – Number of epochs to train the model. Default = 200
shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False
optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO
optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.
lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).
flipout (bool) – Whether to use flipout during training where possible. Default = True
num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None
callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.
eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False
n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to just take one per batch. Using a smaller number of MC samples is faster, but using a greater number of MC samples will decrease the variance of the gradients, leading to more stable parameter optimization.
Example
See the user guide section on Fitting a Model.
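For example, a minimal sketch (assuming model is an instance of some ProbFlow Model subclass, and x_train and y_train are hypothetical numpy arrays of shape (Nsamples, …)):
model.fit(x_train, y_train, batch_size=256, epochs=100, lr=0.01)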
- get_elbo()¶
Get the current ELBO on training data
- kl_loss()¶
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()¶
Compute the sum of additional Kullback-Leibler divergences due to data in this batch
- log_likelihood(x_data, y_data)¶
Compute the sum log likelihood of the model given a batch of data
- log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)¶
Compute the log probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).
distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
log_probs – Log probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
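For example, a minimal sketch (assuming a fit model model and hypothetical validation arrays x_val and y_val):
lps = model.log_prob(x_val, y_val)  # one log probability per datapoint
total_lp = model.log_prob(x_val, y_val, individually=False)  # single summed log probability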
- metric(metric, x, y=None, batch_size=None)¶
Compute a metric of model performance
TODO: docs
TODO: note that this doesn’t work w/ generative models
- Parameters
metric (str or callable) – Metric to evaluate. Available metrics:
'lp': log likelihood sum
'log_prob': log likelihood sum
'accuracy': accuracy
'acc': accuracy
'mean_squared_error': mean squared error
'mse': mean squared error
'sum_squared_error': sum squared error
'sse': sum squared error
'mean_absolute_error': mean absolute error
'mae': mean absolute error
'r_squared': coefficient of determination
'r2': coefficient of determination
'recall': true positive rate
'sensitivity': true positive rate
'true_positive_rate': true positive rate
'tpr': true positive rate
'specificity': true negative rate
'selectivity': true negative rate
'true_negative_rate': true negative rate
'tnr': true negative rate
'precision': precision
'f1_score': F-measure
'f1': F-measure
callable: a function which takes (y_true, y_pred)
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
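For example, a minimal sketch (assuming a fit model model and hypothetical validation arrays x_val and y_val):
import numpy as np
mse = model.metric('mse', x_val, y_val)  # a built-in metric, by name
# or pass any callable which takes (y_true, y_pred)
rmse = model.metric(lambda yt, yp: np.sqrt(np.mean((yt - yp) ** 2)), x_val, y_val)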
- property n_parameters¶
Get the number of independent parameters of this module
- property n_variables¶
Get the number of underlying variables in this module
- property parameters¶
A list of Parameters in this Module and its sub-Modules.
- posterior_ci(params=None, ci=0.95, n=10000)¶
Posterior confidence intervals
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.
ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95
n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000
- Returns
Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str
- Return type
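For example, a minimal sketch (assuming a fit model model; the parameter name 'weight' is hypothetical):
lb, ub = model.posterior_ci('weight')  # 95% CI for a single named parameter
cis = model.posterior_ci(ci=0.9)       # dict of 90% CIs for all parameters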
- posterior_mean(params=None)¶
Get the mean of the posterior distribution(s)
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.
- Returns
Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.
- Return type
- posterior_plot(params=None, cols=1, **kwargs)¶
Plot posterior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()
- posterior_sample(params=None, n=10000)¶
Draw samples from parameter posteriors
TODO: Docs… params is a list of strings of params to plot
- pred_dist_coverage(x, y=None, n=1000, ci=0.95, batch_size=None)¶
Compute what percent of samples are covered by a given confidence interval.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
ci (float between 0 and 1) – Confidence interval to use.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
prc_covered – Proportion of the samples which were covered by the predictive distribution’s confidence interval.
- Return type
float between 0 and 1
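For example, a minimal sketch (assuming a fit model model and hypothetical validation arrays x_val and y_val):
# proportion of y_val covered by the central 95% predictive interval;
# for a well-calibrated model this should be close to 0.95
coverage = model.pred_dist_coverage(x_val, y_val, ci=0.95)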
- pred_dist_covered(x, y=None, n: int = 1000, ci: float = 0.95, batch_size=None)¶
Compute whether each observation was covered by a given confidence interval.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
ci (float between 0 and 1) – Confidence interval to use.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
- predict(x=None, method='mean', batch_size=None)¶
Predict dependent variable using the model
TODO… using maximum a posteriori param estimates etc
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])
- Return type
Examples
TODO: Docs…
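A minimal sketch (assuming a fit model model and a hypothetical validation array x_val):
preds = model.predict(x_val)  # mean of the predicted target distribution
modes = model.predict(x_val, method='mode')  # mode of the predicted target distribution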
- predictive_interval(x, ci=0.95, side='both', n=1000, batch_size=None)¶
Compute confidence intervals on the model’s estimate of the target given x, including all sources of uncertainty.
TODO: docs
TODO: using side= both, upper, vs lower
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
ci (float between 0 and 1) – Inner proportion of the predictive distribution to use as the confidence interval. Default = 0.95
side (str {'lower', 'upper', 'both'}) – Whether to get the one- or two-sided interval, and which side to get. If 'both' (default), gets the upper and lower bounds of the central ci interval. If 'lower', gets the lower bound on the one-sided ci interval. If 'upper', gets the upper bound on the one-sided ci interval.
n (int) – Number of samples from the posterior predictive distribution to take to compute the confidence intervals. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
lb (ndarray) – Lower bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='upper'.
ub (ndarray) – Upper bounds of the ci confidence intervals on the predictions for samples in x. Doesn’t return this if side='lower'.
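For example, a minimal sketch (assuming a fit model model and a hypothetical validation array x_val):
lb, ub = model.predictive_interval(x_val, ci=0.95)  # central 95% interval
ub = model.predictive_interval(x_val, ci=0.95, side='upper')  # one-sided upper bound only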
- predictive_prc(x, y=None, n=1000, batch_size=None)¶
Compute the percentile of each observation along the posterior predictive distribution.
TODO: Docs… Returns a percentile between 0 and 1
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
n (int) – Number of samples to draw from the model given x. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
prcs
- Return type
ndarray of float between 0 and 1
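For example, a minimal sketch (assuming a fit model model and hypothetical validation arrays x_val and y_val):
# percentile of each observed y along its posterior predictive distribution
prcs = model.predictive_prc(x_val, y_val)  # values between 0 and 1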
- predictive_sample(x=None, n=1000, batch_size=None)¶
Draw samples from the posterior predictive distribution given x
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predictive distribution. Size (num_samples, x.shape[0], …)
- Return type
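For example, a minimal sketch (assuming a fit model model and a hypothetical validation array x_val):
samples = model.predictive_sample(x_val, n=500)
# samples has shape (500, x_val.shape[0], …)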
- prior_plot(params=None, cols=1, **kwargs)¶
Plot prior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.prior_plot()
- prior_sample(params=None, n=10000)¶
Draw samples from parameter priors
TODO: Docs… params is a list of strings of params to plot
- prob(x, y=None, **kwargs)¶
Compute the probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).
distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
probs – Probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- reset_kl_loss()¶
Reset additional loss due to KL divergences
- residuals(x, y=None, batch_size=None)¶
Compute the residuals of the model’s predictions.
TODO: docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The residuals.
- Return type
Example
TODO
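For instance, a minimal sketch (assuming a fit regression model model and hypothetical validation arrays x_val and y_val):
res = model.residuals(x_val, y_val)  # one residual per datapoint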
- residuals_plot(x, y=None, batch_size=None, **kwargs)¶
Plot the distribution of residuals of the model’s predictions.
TODO: docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_dist()
Example
TODO
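For instance, a minimal sketch (assuming a fit regression model model and hypothetical validation arrays x_val and y_val):
import matplotlib.pyplot as plt
model.residuals_plot(x_val, y_val)  # plot the distribution of the residuals
plt.show()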
- save(filename: str)¶
Save module object to file
- Parameters
filename (str) – Filename for file to which to save this object
- set_kl_weight(w)¶
Set the weight of the KL term’s contribution to the ELBO loss
- set_learning_rate(lr)¶
Set the learning rate used by this model’s optimizer
- sharpness(x, n=1000, batch_size=None)¶
Compute the sharpness of the model’s uncertainty estimates
The “sharpness” of a model’s uncertainty estimates is the root mean of the estimated variances:
\[SHA = \sqrt{\frac{1}{N} \sum_{i=1}^N \text{Var}(\hat{Y}_i)}\]
See Tran et al., 2020 and the user guide page on Evaluating Model Performance for discussions of evaluating uncertainty estimates using sharpness, among other metrics. Note that the sharpness should generally be one of the later things you consider - accuracy and calibration usually being more important.
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
n (int) – Number of samples to draw from the model. Default = 1000
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
The sharpness of the model’s uncertainty estimates
- Return type
Example
Supposing we have some training data (x_train and y_train) and validation data (x_val and y_val), and have already fit a model to the training data,
model = ...  # some ProbFlow model
model.fit(x_train, y_train)
Then we can compute the sharpness of our model’s predictions with:
>>> model.sharpness(x_val)
0.173
References
Kevin Tran, Willie Neiswanger, Junwoong Yoon, Qingyang Zhang, Eric Xing, Zachary W. Ulissi. Methods for comparing uncertainty quantifications for material property predictions, 2020.
- stop_training()¶
Stop the training of the model
- summary()¶
Show a summary of the model and its parameters.
TODO
TODO: though maybe this should be a method of module… model would have to add to it the observation dist
- train_step(x_data, y_data)¶
Perform one training step
- class probflow.models.CategoricalModel(*args)[source]¶
Bases: probflow.models.model.Model
Abstract base class for probflow models where the dependent variable (the target) is categorical (e.g. drawn from a Bernoulli distribution).
TODO: why use this over just Model
This class inherits several methods from Module, as well as several methods from Model, and adds the following categorical-model-specific methods:
Example
TODO
- pred_dist_plot(x, n=10000, cols=1, batch_size=None, **kwargs)[source]¶
Plot posterior predictive distribution from the model given x.
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model given x. Default = 10000
cols (int) – Divide the subplots into a grid with this many columns (if individually=True).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
**kwargs – Additional keyword arguments are passed to plot_categorical_dist()
- calibration_curve(x, y=None, split_by=None, bins=10, plot=True, batch_size=None)[source]¶
Plot and return the categorical calibration curve.
Plots and returns the calibration curve (estimated probability of outcome vs the true probability of that outcome).
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
split_by (int) – Draw the calibration curve independently for datapoints with each unique value in x[:,split_by] (a categorical column).
bins (int, list of float, or ndarray) – Bins used to compute the curve. If an integer, will use that many evenly-spaced bins from 0 to 1. If a vector, bins is the vector of bin edges.
plot (bool) – Whether to plot the curve
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
TODO: split by continuous cols as well? Then will need to define bins or edges too.
TODO: Docs...
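For example, a minimal sketch (assuming a fit binary classification model model and hypothetical validation arrays x_val and y_val):
# plot the calibration curve using 10 evenly-spaced probability bins
model.calibration_curve(x_val, y_val, bins=10)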
- add_kl_loss(loss, d2=None)¶
Add additional loss due to KL divergences.
- aleatoric_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only aleatoric uncertainty (uncertainty due to noise)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- bayesian_update()¶
Perform a Bayesian update of all Parameters in this module. Sets the prior to the current variational posterior for all parameters.
- dumps()¶
Serialize module object to bytes
- epistemic_sample(x=None, n=1000, batch_size=None)¶
Draw samples of the model’s estimate given x, including only epistemic uncertainty (uncertainty due to uncertainty as to the model’s parameter values)
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predicted mean distribution. Size (num_samples, x.shape[0], …)
- Return type
- fit(x, y=None, batch_size: int = 128, epochs: int = 200, shuffle: bool = False, optimizer=None, optimizer_kwargs: dict = {}, lr: Optional[float] = None, flipout: bool = True, num_workers: Optional[int] = None, callbacks: List[probflow.utils.base.BaseCallback] = [], eager: bool = False, n_mc: int = 1)¶
Fit the model to data
TODO
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)
y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None
batch_size (int) – Number of samples to use per minibatch. Default = 128
epochs (int) – Number of epochs to train the model. Default = 200
shuffle (bool) – Whether to shuffle the data each epoch. Note that this is ignored if x is a DataGenerator. Default = False
optimizer (None or a backend-specific optimizer) – What optimizer to use for optimizing the variational posterior distributions’ variables. When the backend is TensorFlow the default is to use adam (tf.keras.optimizers.Adam). When the backend is PyTorch the default is to use TODO
optimizer_kwargs (dict) – Keyword arguments to pass to the optimizer. Default is an empty dict.
lr (float) – Learning rate for the optimizer. Note that the learning rate can be updated during training using the set_learning_rate method. Default is \(\exp (- \log_{10} (N_p N_b))\), where \(N_p\) is the number of parameters in the model, and \(N_b\) is the number of samples per batch (batch_size).
flipout (bool) – Whether to use flipout during training where possible. Default = True
num_workers (None or int > 0) – Number of parallel processes to run for loading the data. If None, will not use parallel processes. If an integer, will use a process pool with that many processes. Note that this parameter is ignored if a DataGenerator is passed as x. Default = None
callbacks (List[BaseCallback]) – List of callbacks to run while training the model. Default is [], i.e. no callbacks.
eager (bool) – Whether to use eager execution. If False, will use tf.function (for TensorFlow) or tracing (for PyTorch) to optimize the model fitting. Note that even if eager=True, you can still use eager execution when using the model after it is fit. Default = False
n_mc (int) – Number of Monte Carlo samples to take from the variational posteriors per minibatch. The default is to just take one per batch. Using a smaller number of MC samples is faster, but using a greater number of MC samples will decrease the variance of the gradients, leading to more stable parameter optimization.
Example
See the user guide section on Fitting a Model.
- get_elbo()¶
Get the current ELBO on training data
- kl_loss()¶
Compute the sum of the Kullback-Leibler divergences between priors and their variational posteriors for all Parameters in this Module and its sub-Modules.
- kl_loss_batch()¶
Compute the sum of additional Kullback-Leibler divergences due to data in this batch
- log_likelihood(x_data, y_data)¶
Compute the sum log likelihood of the model given a batch of data
- log_prob(x, y=None, individually=True, distribution=False, n=1000, batch_size=None)¶
Compute the log probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor) – Independent variable values of the dataset to evaluate (aka the “features”).
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns log probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns sum of all log probabilities, so return shape is (1, ?).
distribution (bool) – If distribution is True, returns log probability posterior distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns log posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
log_probs – Log probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- metric(metric, x, y=None, batch_size=None)¶
Compute a metric of model performance
TODO: docs
TODO: note that this doesn’t work w/ generative models
- Parameters
metric (str or callable) – Metric to evaluate. Available metrics:
'lp': log likelihood sum
'log_prob': log likelihood sum
'accuracy': accuracy
'acc': accuracy
'mean_squared_error': mean squared error
'mse': mean squared error
'sum_squared_error': sum squared error
'sse': sum squared error
'mean_absolute_error': mean absolute error
'mae': mean absolute error
'r_squared': coefficient of determination
'r2': coefficient of determination
'recall': true positive rate
'sensitivity': true positive rate
'true_positive_rate': true positive rate
'tpr': true positive rate
'specificity': true negative rate
'selectivity': true negative rate
'true_negative_rate': true negative rate
'tnr': true negative rate
'precision': precision
'f1_score': F-measure
'f1': F-measure
callable: a function which takes (y_true, y_pred)
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator to generate both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
- Return type
TODO
- property n_parameters¶
Get the number of independent parameters of this module
- property n_variables¶
Get the number of underlying variables in this module
- property parameters¶
A list of Parameters in this Module and its sub-Modules.
- posterior_ci(params=None, ci=0.95, n=10000)¶
Posterior confidence intervals
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) to sample. Default is to get the confidence intervals for all parameters in the model.
ci (float) – Confidence interval for which to compute the upper and lower bounds. Must be between 0 and 1. Default = 0.95
n (int) – Number of samples to draw from the posterior distributions for computing the confidence intervals. Default = 10,000
- Returns
Confidence intervals of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain tuples. The first element of each tuple is the lower bound, and the second element is the upper bound. Or just a single tuple if params was a str
- Return type
- posterior_mean(params=None)¶
Get the mean of the posterior distribution(s)
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or List[str] or None) – Parameter name(s) for which to compute the means. Default is to get the mean for all parameters in the model.
- Returns
Means of the parameter posterior distributions. A dictionary where the keys contain the parameter names and the values contain ndarrays with the posterior means. The ndarrays are the same size as each parameter. Or just the ndarray if params was a str.
- Return type
- posterior_plot(params=None, cols=1, **kwargs)¶
Plot posterior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the posterior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.posterior_plot()
- posterior_sample(params=None, n=10000)¶
Draw samples from parameter posteriors
TODO: Docs… params is a list of strings of params to plot
- predict(x=None, method='mean', batch_size=None)¶
Predict dependent variable using the model
TODO… using maximum a posteriori param estimates etc
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
method (str) – Method to use for prediction. If 'mean', uses the mean of the predicted target distribution as the prediction. If 'mode', uses the mode of the distribution.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Predicted y-value for each sample in x. Of size (x.shape[0], y.shape[0], …, y.shape[-1])
- Return type
Examples
TODO: Docs…
- predictive_sample(x=None, n=1000, batch_size=None)¶
Draw samples from the posterior predictive distribution given x
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”).
n (int) – Number of samples to draw from the model per datapoint.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
Samples from the predictive distribution. Size (num_samples, x.shape[0], …)
- Return type
- prior_plot(params=None, cols=1, **kwargs)¶
Plot prior distributions of the model’s parameters
TODO: Docs… params is a list of strings of params to plot
- Parameters
params (str or list or None) – List of names of parameters to plot. Default is to plot the prior of all parameters in the model.
cols (int) – Divide the subplots into a grid with this many columns.
kwargs – Additional keyword arguments are passed to Parameter.prior_plot()
- prior_sample(params=None, n=10000)¶
Draw samples from parameter priors
TODO: Docs… params is a list of strings of params to plot
- prob(x, y=None, **kwargs)¶
Compute the probability of y given the model
TODO: Docs…
- Parameters
x (ndarray or DataFrame or Series or Tensor or DataGenerator) – Independent variable values of the dataset to evaluate (aka the “features”). Or a DataGenerator for both x and y.
y (ndarray or DataFrame or Series or Tensor) – Dependent variable values of the dataset to evaluate (aka the “target”).
individually (bool) – If individually is True, returns probability for each sample individually, so return shape is (x.shape[0], ?). If individually is False, returns product of all probabilities, so return shape is (1, ?).
distribution (bool) – If distribution is True, returns posterior probability distribution (n samples from the model), so return shape is (?, n). If distribution is False, returns posterior probabilities using the maximum a posteriori estimate for each parameter, so the return shape is (?, 1).
n (int) – Number of samples to draw for each distribution if distribution=True.
batch_size (None or int) – Compute using batches of this many datapoints. Default is None (i.e., do not use batching).
- Returns
probs – Probabilities. Shape is determined by the individually, distribution, and n kwargs.
- Return type
- reset_kl_loss()¶
Reset additional loss due to KL divergences
- save(filename: str)¶
Save module object to file
- Parameters
filename (str) – Filename for file to which to save this object
- set_kl_weight(w)¶
Set the weight of the KL term’s contribution to the ELBO loss
- set_learning_rate(lr)¶
Set the learning rate used by this model’s optimizer
- stop_training()¶
Stop the training of the model
- summary()¶
Show a summary of the model and its parameters.
TODO
TODO: though maybe this should be a method of module… model would have to add to it the observation dist
- train_step(x_data, y_data)¶
Perform one training step