Data
TODO: Data utilities, more info…
- DataGenerator - base class for data generators w/ multiprocessing
- ArrayDataGenerator - data generator for array-structured data
- class probflow.data.DataGenerator(num_workers=None)
Bases: probflow.utils.base.BaseDataGenerator
Abstract base class for a data generator, which uses multiprocessing to load the data in parallel.
TODO
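User needs to implement:
- __init__()
- n_samples (abstract property)
- batch_size (abstract property)
And can optionally implement:
- on_epoch_start()
- on_epoch_end()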
- abstract property batch_size
Number of samples to generate each minibatch
- abstract property n_samples
Number of samples in the dataset
- on_epoch_end()
Will be called at the end of each training epoch
- on_epoch_start()
Will be called at the start of each training epoch
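The interface above can be illustrated with a minimal sketch. It does not import probflow: SketchDataGenerator is a hypothetical stand-in that mirrors only the members documented here (n_samples, batch_size, on_epoch_start(), on_epoch_end()), and RangeGenerator and its batches() helper are invented for illustration. The real base class may require more than is shown.

```python
from abc import ABC, abstractmethod


class SketchDataGenerator(ABC):
    """Hypothetical stand-in for the documented interface (not probflow's class)."""

    @property
    @abstractmethod
    def n_samples(self):
        """Number of samples in the dataset."""

    @property
    @abstractmethod
    def batch_size(self):
        """Number of samples to generate each minibatch."""

    def on_epoch_start(self):
        """Optional hook, called at the start of each training epoch."""

    def on_epoch_end(self):
        """Optional hook, called at the end of each training epoch."""


class RangeGenerator(SketchDataGenerator):
    """Toy subclass that serves the integers 0..n-1 in minibatches."""

    def __init__(self, n, batch_size):
        self._n = n
        self._batch_size = batch_size
        self.epochs_started = 0

    @property
    def n_samples(self):
        return self._n

    @property
    def batch_size(self):
        return self._batch_size

    def on_epoch_start(self):
        # Count epochs so the effect of the hook is observable.
        self.epochs_started += 1

    def batches(self):
        # Hypothetical helper (not part of the documented API):
        # yields consecutive chunks of at most batch_size samples.
        for start in range(0, self.n_samples, self.batch_size):
            stop = min(start + self.batch_size, self.n_samples)
            yield list(range(start, stop))


gen = RangeGenerator(n=10, batch_size=4)
gen.on_epoch_start()
batches = list(gen.batches())
gen.on_epoch_end()
```

Because the two properties are declared abstract, Python refuses to instantiate a subclass that omits either of them, which is the behaviour the "abstract property" labels above suggest.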
- class probflow.data.ArrayDataGenerator(x=None, y=None, batch_size=None, shuffle=False, test=False, num_workers=None)
Bases: probflow.data.data_generator.DataGenerator
Generate array-structured data to feed through a model.
TODO
- Parameters
  - x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)
  - y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None
  - batch_size (int) – Number of samples to use per minibatch. Use None to use a single batch for all the data. Default = None
  - shuffle (bool) – Whether to shuffle the data each epoch. Default = False
  - test (bool) – Whether to treat data as testing data (allow no dependent variable). Default = False
- on_epoch_start()
Will be called at the start of each training epoch
- property n_samples
Number of samples in the dataset
- property batch_size
Number of samples to generate each minibatch
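The batch_size and shuffle semantics described in the parameter list can be sketched with plain NumPy. This is an illustration under assumptions, not probflow's implementation: iter_minibatches and its seed argument are hypothetical names, and the sketch only mirrors the documented behaviour (batch_size=None gives a single batch, shuffle reorders the samples, y may be None).

```python
import numpy as np


def iter_minibatches(x, y=None, batch_size=None, shuffle=False, seed=0):
    """Hypothetical helper mirroring the documented parameter semantics."""
    n_samples = x.shape[0]
    if batch_size is None:
        batch_size = n_samples  # None -> one batch holding all the data
    order = np.arange(n_samples)
    if shuffle:
        # Draw a fresh permutation; a real generator would redo this each epoch.
        order = np.random.default_rng(seed).permutation(n_samples)
    for start in range(0, n_samples, batch_size):
        idx = order[start:start + batch_size]
        # Index x and y with the same indices so rows stay aligned.
        yield x[idx], (None if y is None else y[idx])


x = np.arange(20, dtype=float).reshape(10, 2)  # shape (Nsamples, 2)
y = np.arange(10, dtype=float)                 # shape (Nsamples,)

single = list(iter_minibatches(x, y))                            # one batch
mini = list(iter_minibatches(x, y, batch_size=4, shuffle=True))  # 4, 4, 2
no_y = list(iter_minibatches(x, batch_size=5))                   # y omitted
```

With 10 samples and batch_size=4 the last minibatch is smaller than the others, so downstream code should not assume a fixed batch length.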