Utils¶
The utils module contains utility classes, functions, and settings which ProbFlow uses internally. The sub-modules of utils are:
- settings - backend, datatype, and sampling settings
- base - abstract base classes for ProbFlow objects
- ops - backend-independent mathematical operations
- casting - backend-independent casting operations
- initializers - backend-independent variable initializer functions
- io - functions for loading and saving models
- metrics - functions for computing various model performance metrics
- plotting - functions for plotting distributions, posteriors, etc.
- torch_distributions - manual implementations of missing torch dists
- validation - functions for data type validation
Settings¶
The utils.settings module contains global settings controlling which backend to use, how to sample from parameter posteriors, the default device, and the default datatype.
Datatype¶
Which datatype to use as the default for parameters. Depending on your model, you might have to set the default datatype to match the datatype of your data.
Samples¶
Whether and how many samples to draw from parameter posterior distributions.
If None, the maximum a posteriori estimate of each parameter will be used. If an integer greater than 0, that many samples from each parameter’s posterior distribution will be used.
Static posterior sampling¶
Whether or not to use static posterior sampling (i.e., take a random sample from the posterior, but take the same random sample on repeated calls), and control the UUID of the current static sampling regime.
Sampling context manager¶
A context manager which controls how Parameters sample from their variational distributions while inside the context manager.
- probflow.utils.settings.get_backend()[source]¶
Get which backend is currently being used.
- Returns
backend – The current backend
- Return type
str {‘tensorflow’ or ‘pytorch’}
- probflow.utils.settings.set_backend(backend)[source]¶
Set which backend is currently being used.
- Parameters
backend (str {'tensorflow' or 'pytorch'}) – The backend to use
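For example, a minimal sketch of switching the backend with these functions (typically done before building any models):
>>> import probflow.utils.settings as settings
>>> settings.set_backend('pytorch')   # or 'tensorflow'
>>> settings.get_backend()
'pytorch'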
- probflow.utils.settings.get_datatype()[source]¶
Get the default datatype used for Tensors
- Returns
dtype – The current default datatype
- Return type
tf.dtype or torch.dtype
- probflow.utils.settings.set_datatype(datatype)[source]¶
Set the datatype to use for Tensors
- Parameters
datatype (tf.dtype or torch.dtype) – The default datatype to use
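For instance, if your data are 64-bit floats and you are using the TensorFlow backend (assumed here), you might match the default parameter datatype like so:
>>> import tensorflow as tf
>>> import probflow.utils.settings as settings
>>> settings.set_datatype(tf.float64)  # match float64 data
>>> settings.get_datatype()
tf.float64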
- probflow.utils.settings.get_samples()[source]¶
Get how many samples (if any) are being drawn from parameter posteriors
- Returns
n – Number of samples (if any) to draw from parameters’ posteriors. Default = None (i.e., use the maximum a posteriori estimate)
- Return type
None or int > 0
- probflow.utils.settings.set_samples(samples)[source]¶
Set how many samples (if any) to draw from parameter posteriors
- Parameters
samples (None or int > 0) – Number of samples (if any) to draw from parameters’ posteriors.
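A short sketch of adjusting the global sample setting (the Sampling context manager described below is usually the more convenient way to do this):
>>> import probflow.utils.settings as settings
>>> settings.get_samples()    # None -> use maximum a posteriori estimates
>>> settings.set_samples(10)  # draw 10 samples from each parameter's posterior
>>> settings.get_samples()
10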
- probflow.utils.settings.get_flipout()[source]¶
Get whether flipout is currently being used where possible.
- Returns
flipout – Whether flipout is currently being used where possible while sampling during training.
- Return type
bool
- probflow.utils.settings.set_flipout(flipout)[source]¶
Set whether to use flipout where possible while sampling during training
- Parameters
flipout (bool) – Whether to use flipout where possible while sampling during training.
- probflow.utils.settings.set_static_sampling_uuid(uuid_value)[source]¶
Set the current static sampling UUID
- class probflow.utils.settings.Sampling(n=None, flipout=None, static=None)[source]¶
Bases:
object
Use sampling while within this context manager.
- Keyword Arguments
n (None or int > 0) – Number of samples to draw from parameter posteriors while within this context manager
flipout (None or bool) – Whether to use flipout where possible while sampling
static (None or bool) – Whether to use static posterior sampling (repeated calls within the same context return the same random sample)
Example
To use maximum a posteriori estimates of the parameter values, don’t use the sampling context manager:
>>> import probflow as pf
>>> param = pf.Parameter()
>>> param()
[0.07226744]
>>> param()  # MAP estimate is always the same
[0.07226744]
To use a single sample, use the sampling context manager with n=1:
>>> with pf.Sampling(n=1):
>>>     param()
[-2.2228503]
>>> with pf.Sampling(n=1):
>>>     param()  # samples are different
[1.3473024]
To use multiple samples, use the sampling context manager and set the number of samples to take with the n keyword argument:
>>> with pf.Sampling(n=3):
>>>     param()
[[ 0.10457394]
 [ 0.14018342]
 [-1.8649881 ]]
>>> with pf.Sampling(n=5):
>>>     param()
[[ 2.1035051]
 [-2.641631 ]
 [-2.9091313]
 [ 3.5294306]
 [ 1.6596333]]
To use static samples - that is, to always return the same samples while in the same context manager - use the sampling context manager with the static keyword argument set to True:
>>> with pf.Sampling(static=True):
>>>     param()
[ 0.10457394]
>>>     param()  # repeated samples yield the same value
[ 0.10457394]
>>> with pf.Sampling(static=True):
>>>     param()  # under a new context manager they yield new samples
[-2.641631]
>>>     param()  # but remain the same while under the same context
[-2.641631]
Base¶
The utils.base module contains abstract base classes (ABCs) for all of ProbFlow’s classes.
- class probflow.utils.base.BaseDistribution(*args)[source]¶
Bases:
abc.ABC
Abstract base class for ProbFlow Distributions
- class probflow.utils.base.BaseParameter(*args)[source]¶
Bases:
abc.ABC
Abstract base class for ProbFlow Parameters
- class probflow.utils.base.BaseModule(*args)[source]¶
Bases:
abc.ABC
Abstract base class for ProbFlow Modules
- class probflow.utils.base.BaseDataGenerator(*args)[source]¶
Bases:
abc.ABC
Abstract base class for ProbFlow DataGenerators
- abstract property n_samples¶
Number of samples in the dataset
- abstract property batch_size¶
Number of samples to generate each minibatch
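As a rough sketch, a hypothetical subclass would at minimum define these two properties (a real data generator, e.g. probflow.data.DataGenerator, also implements the methods that actually produce each batch):
>>> from probflow.utils.base import BaseDataGenerator
>>> class MyDataGenerator(BaseDataGenerator):
...     def __init__(self, x, y, batch_size=128):
...         self.x, self.y = x, y
...         self._batch_size = batch_size
...     @property
...     def n_samples(self):
...         return self.x.shape[0]       # number of samples in the dataset
...     @property
...     def batch_size(self):
...         return self._batch_size      # number of samples per minibatch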
Ops¶
The utils.ops module contains operations which run using the current backend.
- expand_dims()
- squeeze()
- probflow.utils.ops.kl_divergence(P, Q)[source]¶
Compute the Kullback–Leibler divergence between two distributions.
- Parameters
P (tensorflow_probability.distributions.Distribution or torch.distributions.distribution) – The first distribution
Q (tensorflow_probability.distributions.Distribution or torch.distributions.distribution) – The second distribution
- Returns
kld – The Kullback–Leibler divergence between P and Q (KL(P || Q))
- Return type
Tensor
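For example, assuming the TensorFlow backend, a sketch of computing the divergence between two tensorflow_probability distributions:
>>> import tensorflow_probability as tfp
>>> from probflow.utils.ops import kl_divergence
>>> P = tfp.distributions.Normal(0.0, 1.0)
>>> Q = tfp.distributions.Normal(1.0, 2.0)
>>> kl_divergence(P, Q)   # KL(P || Q), returned as a scalar Tensor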
- probflow.utils.ops.rand_rademacher(shape)[source]¶
Tensor full of random -1s or 1s (i.e. drawn from a Rademacher dist).
- probflow.utils.ops.std(val, axis=-1, keepdims=False)[source]¶
The uncorrected sample standard deviation.
- probflow.utils.ops.insert_col_of(vals, val)[source]¶
Add a column of a value to the left side of a tensor
- probflow.utils.ops.new_variable(initial_values)[source]¶
Get a new variable with the current backend, and initialize it
- probflow.utils.ops.log_cholesky_transform(x)[source]¶
Perform the log Cholesky transform on a vector of values.
This turns a vector of \(\frac{N(N+1)}{2}\) unconstrained values into a valid \(N \times N\) covariance matrix.
References
Jose C. Pinheiro & Douglas M. Bates. Unconstrained Parameterizations for Variance-Covariance Matrices. Statistics and Computing, 1996.
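As a rough illustration of the idea (a NumPy sketch of the log-Cholesky parameterization, not ProbFlow's backend implementation): fill a lower-triangular matrix from the unconstrained vector, exponentiate its diagonal so it is positive, and form L L^T:
>>> import numpy as np
>>> def log_cholesky_sketch(x, N):
...     L = np.zeros((N, N))
...     L[np.tril_indices(N)] = x                   # fill the lower triangle
...     L[np.diag_indices(N)] = np.exp(np.diag(L))  # positive diagonal
...     return L @ L.T                              # a valid covariance matrix
>>> cov = log_cholesky_sketch(np.random.randn(6), N=3)  # 6 = 3*(3+1)/2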
Casting¶
The utils.casting module contains functions for casting back and forth between Tensors and numpy arrays.
- to_default_dtype()
- make_input_tensor()
Initializers¶
Functions to initialize posterior distribution variables.
- xavier() - Xavier initializer
- scale_xavier() - Xavier initializer scaled for scale parameters
- pos_xavier() - positive-only initializer
IO¶
Functions for saving and loading ProbFlow objects
- probflow.utils.io.dumps(obj)[source]¶
Serialize a probflow object to a json-safe string.
Note
This removes the compiled _train_fn attribute of a Model, which is a TensorFlow- or PyTorch-compiled function that performs a single training step. Cloudpickle can’t serialize it, and the model will simply re-compile it just-in-time after de-serialization if needed.
- probflow.utils.io.dump(obj, filename)[source]¶
Serialize a probflow object to file
Note
This removes the compiled _train_fn attribute of a Model, which is a TensorFlow- or PyTorch-compiled function that performs a single training step. Cloudpickle can’t serialize it, and the model will simply re-compile it just-in-time after de-serialization if needed.
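A small usage sketch (pf.LinearRegression is used here only as an example model; any ProbFlow object works):
>>> import probflow as pf
>>> from probflow.utils.io import dump, dumps
>>> model = pf.LinearRegression(3)
>>> s = dumps(model)              # serialize to a json-safe string
>>> dump(model, 'my_model.pfm')   # or serialize straight to a file (name is arbitrary)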
Plotting¶
Plotting utilities.
TODO: more info…
- probflow.utils.plotting.approx_kde(data, bins=500, bw=0.075)[source]¶
A fast approximation to kernel density estimation.
- probflow.utils.plotting.get_next_color(def_color, ix)[source]¶
Get the next color in the color cycle
- probflow.utils.plotting.get_ix_label(ix, shape)[source]¶
Get a string representation of the current index
- probflow.utils.plotting.plot_dist(data, xlabel='', style='fill', bins=20, ci=0.0, bw=0.075, alpha=0.4, color=None, legend=True)[source]¶
Plot the distribution of samples.
- Parameters
data (ndarray) – Samples to plot. Should be of size (Nsamples, ...)
xlabel (str) – Label for the x axis
style (str) – Which style of plot to create. Available types are:
- 'fill' - filled density plot (the default)
- 'line' - line density plot
- 'hist' - histogram
bins (int or list or ndarray) – Number of bins to use for the histogram (if style='hist'), or a list or vector of bin edges.
ci (float between 0 and 1) – Confidence interval to plot. Default = 0.0 (i.e., not plotted)
bw (float) – Bandwidth of the kernel density estimate (if using style='line' or style='fill'). Default = 0.075
alpha (float between 0 and 1) – Transparency of the plot (if style='fill' or style='hist')
color (matplotlib color code or list of them) – Color(s) to use to plot the distribution. See https://matplotlib.org/tutorials/colors/colors.html. Default = use the default matplotlib color cycle
legend (bool) – Whether to show legends for plots with >1 distribution. Default = True
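For instance, a minimal sketch of plotting a vector of (here synthetic) posterior samples:
>>> import numpy as np
>>> from probflow.utils.plotting import plot_dist
>>> samples = np.random.randn(1000)   # e.g. samples from a parameter's posterior
>>> plot_dist(samples, xlabel='parameter value', style='fill', ci=0.95)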
- probflow.utils.plotting.plot_line(xdata, ydata, xlabel='', ylabel='', fmt='-', color=None)[source]¶
Plot lines.
- Parameters
xdata (ndarray) – X values of points to plot. Should be a vector of length Nsamples.
ydata (ndarray) – Y values of points to plot. Should be of size (Nsamples, ...).
xlabel (str) – Label for the x axis. Default is no x axis label.
ylabel (str) – Label for the y axis. Default is no y axis label.
fmt (str or matplotlib linespec) – Line marker to use. Default = '-' (a normal line).
color (matplotlib color code or list of them) – Color(s) to use to plot the lines. See https://matplotlib.org/tutorials/colors/colors.html. Default = use the default matplotlib color cycle
- probflow.utils.plotting.fill_between(xdata, lb, ub, xlabel='', ylabel='', alpha=0.3, color=None)[source]¶
Fill between lines.
- Parameters
xdata (ndarray) – X values of points to plot. Should be a vector of length Nsamples.
lb (ndarray) – Lower bound of fill. Should be of size (Nsamples, ...).
ub (ndarray) – Upper bound of fill. Should be the same size as lb.
xlabel (str) – Label for the x axis. Default is no x axis label.
ylabel (str) – Label for the y axis. Default is no y axis label.
alpha (float between 0 and 1) – Transparency of the fill. Default = 0.3
color (matplotlib color code or list of them) – Color(s) to use for the fill. See https://matplotlib.org/tutorials/colors/colors.html. Default = use the default matplotlib color cycle
- probflow.utils.plotting.plot_by(x, data, bins=30, func='mean', plot=True, bootstrap=100, ci=0.95, **kwargs)[source]¶
Compute and plot some function func of data as a function of x.
- Parameters
x (ndarray) – Coordinates of data to plot
data (ndarray) – Data to plot by bins of x
bins (int) – Number of bins to bin x into
func (callable or str) – Function to apply on elements of data in each x bin. Can be a callable or one of the following strings: 'count', 'sum', 'mean', or 'median'. Default = 'mean'
plot (bool) – Whether to plot data as a function of x. Default = True
bootstrap (None or int > 0) – Number of bootstrap samples to use for estimating the uncertainty of the true coverage.
ci (list of float between 0 and 1) – Bootstrapped confidence interval percentiles of coverage to show.
**kwargs – Additional arguments are passed to plt.plot or fill_between
- Returns
x_o (ndarray) – x bin centers
data_o (ndarray) – func applied to data values in each x bin
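For example, a sketch of binning one variable by another and plotting the per-bin mean (synthetic data here):
>>> import numpy as np
>>> from probflow.utils.plotting import plot_by
>>> x = np.random.uniform(0, 10, 1000)
>>> y = np.sin(x) + 0.3 * np.random.randn(1000)
>>> x_centers, y_means = plot_by(x, y, bins=20, func='mean', plot=True)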
Torch Distributions¶
Torch backend distributions
Validation¶
The utils.validation module contains functions for checking that inputs have the correct type.