Datasets

Classes

DN3ataset(*args, **kwds)

Dataset(*args, **kwds)

Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent hardware and annotation paradigms.

DatasetInfo(dataset_name[, data_max, …])

This object contains non-critical meta-data that might need to be tracked for Dataset objects.

EpochTorchRecording(*args, **kwds)

RawTorchRecording(*args, **kwds)

Interface for bridging mne Raw instances as PyTorch compatible “Dataset”.

Thinker(*args, **kwds)

Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.

class dn3.data.dataset.DN3ataset(*args, **kwds)

Methods

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

clear_transforms()

Remove all added transforms from dataset.

clone()

A copy of this object to allow the repetition of recordings, thinkers, etc.

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

to_numpy([batch_size, num_workers])

Commits the dataset to numpy-formatted arrays.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms()

Remove all added transforms from dataset.

clone()

A copy of this object to allow the repetition of recordings, thinkers, etc. that load data from the same memory/files but have their own tracking of ids.

Returns

cloned – New copy of this object.

Return type

DN3ataset

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor
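
A minimal usage sketch, assuming dataset is an already-constructed DN3ataset and MyPreprocessor is a hypothetical stand-in for a concrete Preprocessor subclass from dn3.transforms.preprocessors:

    # Compute the preprocessor over the dataset and attach its transform in one step
    dataset.preprocess(MyPreprocessor(), apply_transform=True)

    # Or compute first, then attach the resulting transform manually
    preprocessor = dataset.preprocess(MyPreprocessor(), apply_transform=False)
    dataset.add_transform(preprocessor.get_transform())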

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

to_numpy(batch_size=64, batch_transforms: list = None, num_workers=4, **dataloader_kwargs)

Commits the dataset to numpy-formatted arrays. Useful for saving the dataset to disk, or preparing it for tools that expect numpy-formatted data rather than iterables.

Notes

A PyTorch DataLoader is used to fetch the data, which conveniently leverages multiprocessing and naturally batches the data before it is accumulated into arrays.

Parameters
  • batch_size (int) – The number of items to fetch per worker. This probably doesn’t need much tuning.

  • num_workers (int) – The number of spawned processes to fetch and transform data.

  • batch_transforms (list) – Batch-level transforms applied to each fetched batch before it is committed to numpy format.

  • dataloader_kwargs (dict) – Keyword arguments for the pytorch DataLoader that underpins the fetched data

Returns

data – A list of numpy arrays.

Return type

list
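
A brief sketch, assuming dataset is an already-constructed DN3ataset (e.g. assembled by the configuratron):

    # Fetch every item through an internal DataLoader and keep the results as numpy arrays
    arrays = dataset.to_numpy(batch_size=128, num_workers=2)

    # Each element of the returned list is a numpy array (data, labels, any requested ids, ...)
    for array in arrays:
        print(array.shape)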

class dn3.data.dataset.Dataset(*args, **kwds)

Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent:

  • hardware: channel number/labels, sampling frequency

  • annotation paradigm: consistent event types

Methods

add_transform(transform[, thinkers])

Add a transformation that is applied to every fetched item in the dataset

clear_transforms()

Remove all added transforms from dataset.

dump_dataset(toplevel[, apply_transforms])

Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

get_sessions()

Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

get_targets()

Collect all the targets (i.e. labels) that the data is annotated with.

get_thinkers()

Accumulates a consistently ordered list of all the thinkers in the dataset.

lmso([folds, test_splits, validation_splits])

This generates a “Leave-multiple-subject-out” (LMSO) split.

loso([validation_person_id, test_person_id])

This generates a “Leave-one-subject-out” (LOSO) split.

preprocess(preprocessor[, apply_transform, …])

Applies a preprocessor to the dataset

safe_mode([mode])

This allows switching safe_mode on or off.

update_id_returns([trial, session, person, …])

Updates which ids are to be returned by the dataset.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform, thinkers=None)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms()

Remove all added transforms from dataset.

dump_dataset(toplevel, apply_transforms=True)

Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

Parameters
  • toplevel (str) – The toplevel location to dump the dataset to. This folder (and path) will be created if it does not exist. Each person will have a subdirectory therein, with numpy-formatted files for each session within that.

  • apply_transforms (bool) – Whether to apply the transforms while preparing the data to be saved.
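
For illustration, assuming dataset is an assembled Dataset and the target directory is arbitrary:

    # Creates ./dumped/<person>/... with one numpy-formatted file per session,
    # applying any added transforms while preparing the data
    dataset.dump_dataset('./dumped', apply_transforms=True)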

get_sessions()

Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

Returns

session_dict – Keys are the thinkers of get_thinkers(); values are each another dictionary mapping session ids to _Recording objects.

Return type

dict
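
Since the returned structure is thinker -> session id -> recording, it can be walked like this (dataset is assumed to be an assembled Dataset):

    sessions = dataset.get_sessions()
    for person, person_sessions in sessions.items():
        for session_id, recording in person_sessions.items():
            # recordings are torch-style datasets, so len() gives the number of retrievable items
            print(person, session_id, len(recording))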

get_targets()

Collect all the targets (i.e. labels) that this dataset's data is annotated with.

Returns

targets – A numpy-formatted array of all the targets/labels for this dataset.

Return type

np.ndarray

get_thinkers()

Accumulates a consistently ordered list of all the thinkers in the dataset. It is this order in which any automatic segmenting through loso() and lmso() is done.

Returns

thinker_names

Return type

list

lmso(folds=10, test_splits=None, validation_splits=None)

This generates a “Leave-multiple-subject-out” (LMSO) split. In other words X-fold cross-validation, with boundaries enforced at thinkers (each person’s data is not split into different folds).

Parameters
  • folds (int) – If this is specified and test_splits is None, the subjects will be split into this many folds, and then each fold is used as a test set in turn (with the previous fold - starting with the last - as validation).

  • test_splits (list, tuple) –

    This should be a list of tuples/lists of either:
    • The ids of a consistent test set. In this case, folds must be specified, or validation_splits must be a nested list specifying the validation ids for each fold.

    • Two sub-lists: the first of testing ids, the second of validation ids.

Yields
  • training (Dataset) – Another dataset that represents the training set

  • validation (Dataset) – The validation people as a dataset

  • test (Dataset) – The test people as a dataset
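
A sketch of a 10-fold LMSO loop; train_model and evaluate are hypothetical stand-ins for whatever fitting and scoring are actually being done:

    for fold, (training, validation, test) in enumerate(dataset.lmso(folds=10)):
        # training, validation and test are each Datasets restricted to their own thinkers
        model = train_model(training, validation)   # hypothetical training routine
        evaluate(model, test)                       # hypothetical evaluation routine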

loso(validation_person_id=None, test_person_id=None)

This generates a “Leave-one-subject-out” (LOSO) split. Tests each person one-by-one, and validates on the previous (the first is validated with the last).

Parameters
  • validation_person_id ((int, str, list, optional)) – If specified, and corresponds to one of the person_ids in this dataset, the loso cross-validation will consistently generate this thinker as validation. If a list, it must be the same length as test_person_id, say length N; N splits will then be yielded in sequence, with the remaining thinkers used for training.

  • test_person_id ((int, str, list, optional)) – Same as validation_person_id, but for testing. However, testing may be a list when validation is a single value. Thus if testing is N ids, will yield N values, with a consistent single validation person. If a single id (int or str), and validation_person_id is not also a single id, will ignore validation_person_id and loop through all others that are not the test_person_id.

Yields
  • training (Dataset) – Another dataset that represents the training set

  • validation (Thinker) – The validation thinker

  • test (Thinker) – The test thinker
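
The analogous LOSO loop, holding out one person at a time (train_model and evaluate are again hypothetical):

    for training, validation, test in dataset.loso():
        # validation and test are single Thinker objects here
        model = train_model(training, validation)
        evaluate(model, test)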

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, thinkers=None)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • thinkers ((None, Iterable)) – If specified (default is None), the thinkers to use for preprocessing calculation

  • apply_transform (bool) – Whether to apply the transform to this dataset (all thinkers, not just those specified for preprocessing) after preprocessing them. Exclusive application to specific thinkers can be done using the return value and a separate call to add_transform with the same thinkers list.

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor
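
A sketch of the exclusive-application pattern described above; the thinker ids and MyPreprocessor are hypothetical placeholders:

    chosen = ['thinker_01', 'thinker_02']   # hypothetical thinker ids
    preprocessor = dataset.preprocess(MyPreprocessor(), apply_transform=False, thinkers=chosen)
    dataset.add_transform(preprocessor.get_transform(), thinkers=chosen)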

safe_mode(mode=True)

This allows switching safe_mode on or off. When safe_mode is on, if data is ever NaN, it is captured before being returned and a report is generated.

Parameters

mode (bool) – Whether safe mode should be enabled.

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

update_id_returns(trial=None, session=None, person=None, task=None, dataset=None)

Updates which ids are to be returned by the dataset. If any argument is None it preserves the previous value.

Parameters
  • trial (None, bool) – Whether to return trial ids.

  • session (None, bool) – Whether to return session ids.

  • person (None, bool) – Whether to return person ids.

  • task (None, bool) – Whether to return task ids.

  • dataset (None, bool) – Whether to return dataset ids.
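
As a sketch, requesting that session and person ids accompany each fetched item (dataset assumed already constructed):

    # Only the specified flags change; arguments left as None keep their previous values
    dataset.update_id_returns(session=True, person=True)

    item = dataset[0]   # the returned tuple now additionally carries session and person ids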

class dn3.data.dataset.DatasetInfo(dataset_name, data_max=None, data_min=None, excluded_people=None, targets=None)

This object contains non-critical meta-data that might need to be tracked for Dataset objects. It is generally not necessary to construct these manually; they are created by the configuratron to automatically create transforms and/or other downstream processes.

class dn3.data.dataset.EpochTorchRecording(*args, **kwds)

Methods

event_mapping()

Maps the labels returned by this to the events as recorded in the original annotations or stim channel.

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

event_mapping()

Maps the labels returned by this to the events as recorded in the original annotations or stim channel.

Returns

mapping – Keys are the class labels used by this object, values are the original event signifier.

Return type

dict
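
For example, recovering the original event codes from the class labels used by an EpochTorchRecording instance (here called epoch_recording):

    mapping = epoch_recording.event_mapping()
    for label, original_event in mapping.items():
        print(label, '->', original_event)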

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

class dn3.data.dataset.RawTorchRecording(*args, **kwds)

Interface for bridging mne Raw instances as PyTorch compatible “Dataset”.

Parameters
  • raw (mne.io.Raw) – Raw data, data does not need to be preloaded.

  • tlen (float) – Length of recording specified in seconds.

  • session_id ((int, str, optional)) – A unique (with respect to a thinker within an eventual dataset) identifier for the current recording session. If not specified, defaults to ‘0’.

  • person_id ((int, str, optional)) – A unique (with respect to an eventual dataset) identifier for the particular person being recorded.

  • stride (int) – The number of samples to skip between each starting offset of loaded samples.
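
A construction sketch using the parameters above (keyword arguments are assumed to match the parameter names; the EDF path is a placeholder and any mne Raw instance will do):

    import mne
    from dn3.data.dataset import RawTorchRecording

    raw = mne.io.read_raw_edf('subject01_session0.edf', preload=False)   # placeholder file
    recording = RawTorchRecording(raw, tlen=2.0, session_id=0, person_id='S01', stride=256)

    x = recording[0]   # a window of tlen seconds; consecutive indices advance by `stride` samples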

Methods

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

class dn3.data.dataset.Thinker(*args, **kwds)

Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.

Methods

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

clear_transforms([deep_clear])

Remove all added transforms from dataset.

get_targets()

Collect all the targets (i.e. labels) that this Thinker's data is annotated with.

preprocess(preprocessor[, apply_transform, …])

Applies a preprocessor to the dataset

split([training_sess_ids, …])

Split the thinker’s data into training, validation and testing sets.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms(deep_clear=False)

Remove all added transforms from dataset.

get_targets()

Collect all the targets (i.e. labels) that this Thinker’s data is annotated with.

Returns

targets – A numpy-formatted array of all the targets/labels for this thinker.

Return type

np.ndarray

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, sessions=None)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • sessions ((None, Iterable)) – If specified (default is None), the sessions to use for preprocessing calculation

  • apply_transform (bool) – Whether to apply the transform to this dataset (all sessions, not just those specified for preprocessing) after preprocessing them. Exclusive application to select sessions can be done using the return value and a separate call to add_transform with the same sessions list.

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

split(training_sess_ids=None, validation_sess_ids=None, testing_sess_ids=None, test_frac=0.25, validation_frac=0.25)

Split the thinker’s data into training, validation and testing sets.

Parameters
  • test_frac (float) – Proportion of the total data to use for testing, this is overridden by testing_sess_ids.

  • validation_frac (float) – Proportion of the data remaining - after removing test proportion/sessions - to use as validation data. Likewise, validation_sess_ids overrides this value.

  • training_sess_ids ((Iterable, None)) – The session ids to be explicitly used for training.

  • validation_sess_ids ((Iterable, None)) – The session ids to be explicitly used for validation.

  • testing_sess_ids ((Iterable, None)) – The session ids to be explicitly used for testing.

Returns

  • training (DN3ataset) – The training dataset

  • validation (DN3ataset) – The validation dataset

  • testing (DN3ataset) – The testing dataset
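
Two ways this split might be used, assuming thinker is an already-constructed Thinker (the session ids are placeholders):

    # Fraction-based split: 25% of the data for testing, 25% of the remainder for validation
    training, validation, testing = thinker.split(test_frac=0.25, validation_frac=0.25)

    # Explicit split by session ids, which overrides the fractional arguments
    training, validation, testing = thinker.split(
        validation_sess_ids=['session_1'],   # placeholder session ids
        testing_sess_ids=['session_2'],
    )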