Data

TODO: Data utilities, more info…


class probflow.data.DataGenerator(num_workers=None)[source]

Bases: probflow.utils.base.BaseDataGenerator

Abstract base class for a data generator, which uses multiprocessing to load the data in parallel.

TODO

User needs to implement:

  • get_batch(index)
  • batch_size
  • n_samples

And can optionally implement:

  • on_epoch_start()
  • on_epoch_end()

abstract get_batch(index)[source]

Generate one batch of data

abstract property batch_size

Number of samples to generate in each minibatch

abstract property n_samples

Number of samples in the dataset

on_epoch_end()

Will be called at the end of each training epoch

on_epoch_start()

Will be called at the start of each training epoch
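
The interface above can be illustrated with a minimal sketch. It does not import probflow: SketchDataGenerator is a hypothetical stand-in that only mirrors the documented abstract members and hooks, and SquaresGenerator is a toy subclass. The (x, y) tuple returned by get_batch is an assumption made for this example, not something the documentation above specifies, and the real base class may add behavior (such as parallel loading via num_workers) that is not modeled here.

```python
from abc import ABC, abstractmethod


class SketchDataGenerator(ABC):
    """Hypothetical stand-in mirroring the interface documented above."""

    @abstractmethod
    def get_batch(self, index):
        """Generate one batch of data."""

    @property
    @abstractmethod
    def batch_size(self):
        """Number of samples in each minibatch."""

    @property
    @abstractmethod
    def n_samples(self):
        """Number of samples in the dataset."""

    def on_epoch_start(self):
        """Optional hook; does nothing by default."""

    def on_epoch_end(self):
        """Optional hook; does nothing by default."""


class SquaresGenerator(SketchDataGenerator):
    """Toy generator producing (x, x**2) pairs in fixed-size batches."""

    def __init__(self, n_samples, batch_size):
        self._n_samples = n_samples
        self._batch_size = batch_size

    @property
    def n_samples(self):
        return self._n_samples

    @property
    def batch_size(self):
        return self._batch_size

    def get_batch(self, index):
        # Batch `index` covers samples [start, stop); the last batch may be short.
        start = index * self._batch_size
        stop = min(start + self._batch_size, self._n_samples)
        xs = list(range(start, stop))
        ys = [x * x for x in xs]
        return xs, ys  # assumed return format for this sketch


gen = SquaresGenerator(n_samples=5, batch_size=2)
first_batch = gen.get_batch(0)
last_batch = gen.get_batch(2)
```

Because the three required members are abstract, instantiating a subclass that omits any of them raises a TypeError, while the two epoch hooks can be left at their no-op defaults.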

class probflow.data.ArrayDataGenerator(x=None, y=None, batch_size=None, shuffle=False, test=False, num_workers=None)[source]

Bases: probflow.data.data_generator.DataGenerator

Generate array-structured data to feed through a model.

TODO

Parameters
  • x (ndarray or DataFrame or Series or DataGenerator) – Independent variable values (or, if fitting a generative model, the dependent variable values). Should be of shape (Nsamples,…)

  • y (None or ndarray or DataFrame or Series) – Dependent variable values (or, if fitting a generative model, None). Should be of shape (Nsamples,…). Default = None

  • batch_size (int) – Number of samples to use per minibatch. Use None to use a single batch for all the data. Default = None

  • shuffle (bool) – Whether to shuffle the data each epoch. Default = False

  • test (bool) – Whether to treat data as testing data (allow no dependent variable). Default = False

on_epoch_start()

Will be called at the start of each training epoch

property n_samples

Number of samples in the dataset

property batch_size

Number of samples to generate in each minibatch

get_batch(index)[source]

Generate one batch of data

on_epoch_end()[source]

Shuffle data each epoch
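
As a rough illustration of the batching and shuffling behavior described above, the sketch below uses plain Python lists in place of arrays. ListBatcher, its n_batches helper, and its seed argument are hypothetical names introduced for this example; this is not the ArrayDataGenerator implementation, only one plausible way the documented batch_size, shuffle, get_batch, and on_epoch_end could fit together.

```python
import math
import random


class ListBatcher:
    """Hypothetical illustration; not the ArrayDataGenerator implementation."""

    def __init__(self, x, y=None, batch_size=None, shuffle=False, seed=0):
        self.x = x
        self.y = y
        # batch_size=None means a single batch holding all the data
        self._batch_size = len(x) if batch_size is None else batch_size
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # seed is an illustrative extra
        self._order = list(range(len(x)))  # current sample ordering

    @property
    def n_samples(self):
        return len(self.x)

    @property
    def batch_size(self):
        return self._batch_size

    def n_batches(self):
        # Round up so a short final batch is still counted
        return math.ceil(self.n_samples / self.batch_size)

    def get_batch(self, index):
        start = index * self.batch_size
        ids = self._order[start:start + self.batch_size]
        bx = [self.x[i] for i in ids]
        by = None if self.y is None else [self.y[i] for i in ids]
        return bx, by

    def on_epoch_end(self):
        # Reorder the samples only when shuffling was requested
        if self.shuffle:
            self._rng.shuffle(self._order)


data = ListBatcher(x=[10, 11, 12, 13, 14], y=[0, 1, 0, 1, 0],
                   batch_size=2, shuffle=True)
batches = [data.get_batch(i) for i in range(data.n_batches())]
data.on_epoch_end()
reshuffled = [data.get_batch(i) for i in range(data.n_batches())]
```

Shuffling an index list rather than the data itself keeps x and y aligned and leaves the caller's inputs unmodified, which is one common design for this kind of generator.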

probflow.data.make_generator(x=None, y=None, batch_size=None, shuffle=False, test=False, num_workers=None)[source]

Wrap the input in a DataGenerator if it is not one already
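
The "wrap only if needed" pattern this function describes can be sketched as follows. SketchGenerator and as_generator are hypothetical names used so the example runs without probflow; the actual make_generator may handle its arguments differently.

```python
class SketchGenerator:
    """Hypothetical minimal container standing in for a data generator."""

    def __init__(self, x, y=None, batch_size=None, shuffle=False):
        self.x = x
        self.y = y
        self.batch_size = batch_size
        self.shuffle = shuffle


def as_generator(x, y=None, batch_size=None, shuffle=False):
    """Return x unchanged if it is already a generator, else wrap it."""
    if isinstance(x, SketchGenerator):
        # Already a generator: pass it through untouched
        return x
    return SketchGenerator(x, y, batch_size=batch_size, shuffle=shuffle)


wrapped = as_generator([1.0, 2.0, 3.0], y=[0, 1, 0], batch_size=2)
same = as_generator(wrapped)
```

A helper like this lets downstream code accept either raw data or a ready-made generator and work with a single type afterwards.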