Data
TODO: Data utilities, more info…
- DataGenerator - base class for data generators w/ multiprocessing
- ArrayDataGenerator - data generator for array-structured data
class probflow.data.DataGenerator(num_workers=None)
Bases: probflow.utils.base.BaseDataGenerator
Abstract base class for a data generator, which uses multiprocessing to load the data in parallel.
TODO
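User needs to implement:
- __init__()
- batch_size (abstract property)
- n_samples (abstract property)
And can optionally implement:
- on_epoch_start()
- on_epoch_end()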
- abstract property batch_size
  Number of samples to generate each minibatch
- abstract property n_samples
  Number of samples in the dataset
- on_epoch_end()
  Will be called at the end of each training epoch
- on_epoch_start()
  Will be called at the start of each training epoch
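The interface described above can be sketched without probflow installed. The block below is only an illustration under stated assumptions: `SketchDataGenerator` is a stand-in base class written for this example (it is not `probflow.data.DataGenerator` and does no multiprocessing), and `ListGenerator` is a hypothetical subclass. It shows the general pattern of implementing the two abstract properties and overriding one of the optional epoch hooks.

```python
from abc import ABC, abstractmethod


class SketchDataGenerator(ABC):
    """Stand-in for the interface described above (NOT probflow's class)."""

    @property
    @abstractmethod
    def n_samples(self):
        """Number of samples in the dataset."""

    @property
    @abstractmethod
    def batch_size(self):
        """Number of samples to generate each minibatch."""

    def on_epoch_start(self):
        """Optional hook, called at the start of each training epoch."""

    def on_epoch_end(self):
        """Optional hook, called at the end of each training epoch."""


class ListGenerator(SketchDataGenerator):
    """Hypothetical subclass that serves items from an in-memory list."""

    def __init__(self, items, batch_size):
        self._items = list(items)
        self._batch_size = batch_size
        self.epochs_started = 0

    @property
    def n_samples(self):
        return len(self._items)

    @property
    def batch_size(self):
        return self._batch_size

    def on_epoch_start(self):
        # Override the optional hook; here it only counts epochs
        self.epochs_started += 1


gen = ListGenerator(range(10), batch_size=4)
gen.on_epoch_start()
gen.on_epoch_end()  # inherited no-op hook
```

A subclass that leaves either abstract property unimplemented cannot be instantiated, which is the usual reason for declaring them abstract on the base class.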
class probflow.data.ArrayDataGenerator(x=None, y=None, batch_size=None, shuffle=False, test=False, num_workers=None)
Bases: probflow.data.data_generator.DataGenerator
Generate array-structured data to feed through a model.
TODO
Parameters
- x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)
- y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None
- batch_size (int) – Number of samples to use per minibatch. Use None to use a single batch for all the data. Default = None
- shuffle (bool) – Whether to shuffle the data each epoch. Default = False
- test (bool) – Whether to treat data as testing data (allow no dependent variable). Default = False
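The batch_size and shuffle parameters above can be illustrated with a small pure-Python sketch. This is not probflow's implementation: `minibatch_indices` is a hypothetical helper written for this example, and the choice to let the final batch be smaller than the rest is an assumption. It only shows how `batch_size=None` collapses to a single batch and how shuffling reorders the samples before they are split.

```python
import random


def minibatch_indices(n_samples, batch_size=None, shuffle=False, seed=None):
    """Return one epoch's minibatches as lists of sample indices."""
    # batch_size=None means a single batch containing all the data
    if batch_size is None:
        batch_size = max(n_samples, 1)
    indices = list(range(n_samples))
    if shuffle:
        # Reorder the samples; a real generator would redo this every epoch
        random.Random(seed).shuffle(indices)
    # Assumption: the last batch is smaller when batch_size does not divide n_samples
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]


batches = minibatch_indices(10, batch_size=4)
single = minibatch_indices(10)
shuffled = minibatch_indices(10, batch_size=4, shuffle=True, seed=0)
```

With 10 samples and `batch_size=4`, the sketch yields three batches of sizes 4, 4, and 2. Shuffling changes which samples land in each batch but not the batch sizes.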
- on_epoch_start()
  Will be called at the start of each training epoch
- property n_samples
  Number of samples in the dataset
- property batch_size
  Number of samples to generate each minibatch