Datasets

Classes

DN3ataset(*args, **kwds)

Dataset(*args, **kwds)

Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent hardware and annotation paradigms.

DatasetInfo(dataset_name[, data_max, …])

This object contains non-critical meta-data that might need to be tracked for Dataset objects.

EpochTorchRecording(*args, **kwds)

RawTorchRecording(*args, **kwds)

Interface for bridging mne Raw instances as PyTorch compatible “Dataset”.

Thinker(*args, **kwds)

Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.

class dn3.data.dataset.DN3ataset(*args, **kwds)

Methods

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

clear_transforms()

Remove all added transforms from dataset.

clone()

A copy of this object to allow the repetition of recordings, thinkers, etc.

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

to_numpy([batch_size, num_workers])

Commits the dataset to numpy-formatted arrays.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms()

Remove all added transforms from dataset.

clone()

A copy of this object to allow the repetition of recordings, thinkers, etc. that load data from the same memory/files but have their own tracking of ids.

Returns

cloned – New copy of this object.

Return type

DN3ataset

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor
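
A minimal usage sketch, assuming dataset is an already-constructed DN3ataset and MyPreprocessor is a hypothetical stand-in for a concrete Preprocessor subclass from dn3.transforms.preprocessors:

    # Compute the preprocessor over the dataset and attach its transform in one step
    dataset.preprocess(MyPreprocessor(), apply_transform=True)

    # Or compute first, then attach the resulting transform manually
    preprocessor = dataset.preprocess(MyPreprocessor(), apply_transform=False)
    dataset.add_transform(preprocessor.get_transform())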

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

to_numpy(batch_size=64, batch_transforms: list = None, num_workers=4, **dataloader_kwargs)

Commits the dataset to numpy-formatted arrays. Useful for saving the dataset to disk, or preparing it for tools that expect numpy-formatted data rather than iterables.

Notes

A PyTorch DataLoader is used to fetch the data, which conveniently leverages multiprocessing and naturally batches the data before it is accumulated into arrays.

Parameters
  • batch_size (int) – The number of items to fetch per worker. This probably doesn’t need much tuning.

  • num_workers (int) – The number of spawned processes to fetch and transform data.

  • batch_transforms (list) – Batch-level transforms applied to each fetched batch before it is committed to numpy format.

  • dataloader_kwargs (dict) – Keyword arguments for the pytorch DataLoader that underpins the fetched data

Returns

data – A list of numpy arrays.

Return type

list
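
A brief sketch, assuming dataset is an already-constructed DN3ataset (e.g. assembled by the configuratron):

    # Fetch every item through an internal DataLoader and keep the results as numpy arrays
    arrays = dataset.to_numpy(batch_size=128, num_workers=2)

    # Each element of the returned list is a numpy array (data, labels, any requested ids, ...)
    for array in arrays:
        print(array.shape)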

class dn3.data.dataset.Dataset(*args, **kwds)

Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent:

  • hardware: channel number/labels, sampling frequency

  • annotation paradigm: consistent event types

Methods

add_transform(transform[, thinkers])

Add a transformation that is applied to every fetched item in the dataset

clear_transforms()

Remove all added transforms from dataset.

dump_dataset(toplevel[, apply_transforms])

Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

get_sessions()

Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

get_targets()

Collect all the targets (i.e. labels) that the data is annotated with.

get_thinkers()

Accumulates a consistently ordered list of all the thinkers in the dataset.

lmso([folds, test_splits, validation_splits])

This generates a “Leave-multiple-subject-out” (LMSO) split.

loso([validation_person_id, test_person_id])

This generates a “Leave-one-subject-out” (LOSO) split.

preprocess(preprocessor[, apply_transform, …])

Applies a preprocessor to the dataset

safe_mode([mode])

This allows switching safe_mode on or off.

update_id_returns([trial, session, person, …])

Updates which ids are to be returned by the dataset.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform, thinkers=None)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms()

Remove all added transforms from dataset.

dump_dataset(toplevel, apply_transforms=True)

Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

Parameters
  • toplevel (str) – The toplevel location to dump the dataset to. This folder (and path) will be created if it does not exist. Each person will have a subdirectory therein, with numpy-formatted files for each session within that.

  • apply_transforms (bool) – Whether to apply the transforms while preparing the data to be saved.
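
For illustration, assuming dataset is an assembled Dataset and the target directory is arbitrary:

    # Creates ./dumped/<person>/... with one numpy-formatted file per session,
    # applying any added transforms while preparing the data
    dataset.dump_dataset('./dumped', apply_transforms=True)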

get_sessions()

Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

Returns

session_dict – Keys are the thinkers of get_thinkers(); values are each another dictionary mapping session ids to _Recording objects.

Return type

dict
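
Since the returned structure is thinker -> session id -> recording, it can be walked like this (dataset is assumed to be an assembled Dataset):

    sessions = dataset.get_sessions()
    for person, person_sessions in sessions.items():
        for session_id, recording in person_sessions.items():
            # recordings are torch-style datasets, so len() gives the number of retrievable items
            print(person, session_id, len(recording))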

get_targets()

Collect all the targets (i.e. labels) that this dataset's data is annotated with.

Returns

targets – A numpy-formatted array of all the targets/labels for this dataset.

Return type

np.ndarray

get_thinkers()

Accumulates a consistently ordered list of all the thinkers in the dataset. It is this order in which any automatic segmenting through loso() and lmso() is done.

Returns

thinker_names

Return type

list

lmso(folds=10, test_splits=None, validation_splits=None)

This generates a “Leave-multiple-subject-out” (LMSO) split. In other words X-fold cross-validation, with boundaries enforced at thinkers (each person’s data is not split into different folds).

Parameters
  • folds (int) – If this is specified and test_splits is None, the subjects will be split into this many folds, and then each fold is used as a test set in turn (with the previous fold - starting with the last - as validation).

  • test_splits (list, tuple) –

    This should be a list of tuples/lists of either:
    • The ids of a consistent test set. In this case, folds must be specified, or validation_splits must be a nested list specifying the validation ids for each fold.

    • Two sub-lists: the first of testing ids, the second of validation ids.

Yields
  • training (Dataset) – Another dataset that represents the training set

  • validation (Dataset) – The validation people as a dataset

  • test (Dataset) – The test people as a dataset
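
A sketch of a 10-fold LMSO loop; train_model and evaluate are hypothetical stand-ins for whatever fitting and scoring are actually being done:

    for fold, (training, validation, test) in enumerate(dataset.lmso(folds=10)):
        # training, validation and test are each Datasets restricted to their own thinkers
        model = train_model(training, validation)   # hypothetical training routine
        evaluate(model, test)                       # hypothetical evaluation routine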

loso(validation_person_id=None, test_person_id=None)

This generates a “Leave-one-subject-out” (LOSO) split. Tests each person one-by-one, and validates on the previous (the first is validated with the last).

Parameters
  • validation_person_id ((int, str, list, optional)) – If specified, and corresponds to one of the person_ids in this dataset, the loso cross-validation will consistently generate this thinker as validation. If a list, it must be the same length as test_person_id, say length N; N splits will then be yielded in sequence, with the remaining thinkers used for training.

  • test_person_id ((int, str, list, optional)) – Same as validation_person_id, but for testing. However, testing may be a list when validation is a single value. Thus if testing is N ids, will yield N values, with a consistent single validation person. If a single id (int or str), and validation_person_id is not also a single id, will ignore validation_person_id and loop through all others that are not the test_person_id.

Yields
  • training (Dataset) – Another dataset that represents the training set

  • validation (Thinker) – The validation thinker

  • test (Thinker) – The test thinker
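
The analogous LOSO loop, holding out one person at a time (train_model and evaluate are again hypothetical):

    for training, validation, test in dataset.loso():
        # validation and test are single Thinker objects here
        model = train_model(training, validation)
        evaluate(model, test)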

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, thinkers=None)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • thinkers ((None, Iterable)) – If specified (default is None), the thinkers to use for preprocessing calculation

  • apply_transform (bool) – Whether to apply the transform to this dataset (all thinkers, not just those specified for preprocessing) after preprocessing them. Exclusive application to specific thinkers can be done using the return value and a separate call to add_transform with the same thinkers list.

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor
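
A sketch of the exclusive-application pattern described above; the thinker ids and MyPreprocessor are hypothetical placeholders:

    chosen = ['thinker_01', 'thinker_02']   # hypothetical thinker ids
    preprocessor = dataset.preprocess(MyPreprocessor(), apply_transform=False, thinkers=chosen)
    dataset.add_transform(preprocessor.get_transform(), thinkers=chosen)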

safe_mode(mode=True)

This allows switching safe_mode on or off. When safe_mode is on, if data is ever NaN, it is captured before being returned and a report is generated.

Parameters

mode (bool) – Whether safe mode should be enabled.

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

update_id_returns(trial=None, session=None, person=None, task=None, dataset=None)

Updates which ids are to be returned by the dataset. If any argument is None it preserves the previous value.

Parameters
  • trial (None, bool) – Whether to return trial ids.

  • session (None, bool) – Whether to return session ids.

  • person (None, bool) – Whether to return person ids.

  • task (None, bool) – Whether to return task ids.

  • dataset (None, bool) – Whether to return dataset ids.
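
As a sketch, requesting that session and person ids accompany each fetched item (dataset assumed already constructed):

    # Only the specified flags change; arguments left as None keep their previous values
    dataset.update_id_returns(session=True, person=True)

    item = dataset[0]   # the returned tuple now additionally carries session and person ids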

class dn3.data.dataset.DatasetInfo(dataset_name, data_max=None, data_min=None, excluded_people=None, targets=None)

This object contains non-critical meta-data that might need to be tracked for Dataset objects. It is generally not necessary to construct these manually; they are created by the configuratron to automatically create transforms and/or other downstream processes.

class dn3.data.dataset.EpochTorchRecording(*args, **kwds)

Methods

event_mapping()

Maps the labels returned by this to the events as recorded in the original annotations or stim channel.

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

event_mapping()

Maps the labels returned by this to the events as recorded in the original annotations or stim channel.

Returns

mapping – Keys are the class labels used by this object, values are the original event signifier.

Return type

dict
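
For example, recovering the original event codes from the class labels used by an EpochTorchRecording instance (here called epoch_recording):

    mapping = epoch_recording.event_mapping()
    for label, original_event in mapping.items():
        print(label, '->', original_event)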

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

class dn3.data.dataset.RawTorchRecording(*args, **kwds)

Interface for bridging mne Raw instances as PyTorch compatible “Dataset”.

Parameters
  • raw (mne.io.Raw) – Raw data, data does not need to be preloaded.

  • tlen (float) – Length of recording specified in seconds.

  • session_id ((int, str, optional)) – A unique (with respect to a thinker within an eventual dataset) identifier for the current recording session. If not specified, defaults to ‘0’.

  • person_id ((int, str, optional)) – A unique (with respect to an eventual dataset) identifier for the particular person being recorded.

  • stride (int) – The number of samples to skip between each starting offset of loaded samples.
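
A construction sketch using the parameters above (keyword arguments are assumed to match the parameter names; the EDF path is a placeholder and any mne Raw instance will do):

    import mne
    from dn3.data.dataset import RawTorchRecording

    raw = mne.io.read_raw_edf('subject01_session0.edf', preload=False)   # placeholder file
    recording = RawTorchRecording(raw, tlen=2.0, session_id=0, person_id='S01', stride=256)

    x = recording[0]   # a window of tlen seconds; consecutive indices advance by `stride` samples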

Methods

preprocess(preprocessor[, apply_transform])

Applies a preprocessor to the dataset

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

class dn3.data.dataset.Thinker(*args, **kwds)

Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.

Methods

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

clear_transforms([deep_clear])

Remove all added transforms from dataset.

get_targets()

Collect all the targets (i.e. labels) that this Thinker's data is annotated with.

preprocess(preprocessor[, apply_transform, …])

Applies a preprocessor to the dataset

split([training_sess_ids, …])

Split the thinker’s data into training, validation and testing sets.

Attributes

channels

The channel sets used by the dataset.

sequence_length

The length of each instance in number of samples.

sfreq

The sampling frequencies employed by the dataset.

add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset

Parameters

transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

property channels

Returns

channels – The channel sets used by the dataset.

Return type

list

clear_transforms(deep_clear=False)

Remove all added transforms from dataset.

get_targets()

Collect all the targets (i.e. labels) that this Thinker’s data is annotated with.

Returns

targets – A numpy-formatted array of all the targets/labels for this thinker.

Return type

np.ndarray

preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, sessions=None)

Applies a preprocessor to the dataset

Parameters
  • preprocessor (Preprocessor) – A preprocessor to be applied

  • sessions ((None, Iterable)) – If specified (default is None), the sessions to use for preprocessing calculation

  • apply_transform (bool) – Whether to apply the transform to this dataset (all sessions, not just those specified for preprocessing) after preprocessing them. Exclusive application to select sessions can be done using the return value and a separate call to add_transform with the same sessions list.

Returns

preprocessor – The preprocessor after application to all relevant thinkers

Return type

Preprocessor

property sequence_length

Returns

sequence_length – The length of each instance in number of samples.

Return type

int, list

property sfreq

Returns

sampling_frequency – The sampling frequencies employed by the dataset.

Return type

float, list

split(training_sess_ids=None, validation_sess_ids=None, testing_sess_ids=None, test_frac=0.25, validation_frac=0.25)

Split the thinker’s data into training, validation and testing sets.

Parameters
  • test_frac (float) – Proportion of the total data to use for testing, this is overridden by testing_sess_ids.

  • validation_frac (float) – Proportion of the data remaining - after removing test proportion/sessions - to use as validation data. Likewise, validation_sess_ids overrides this value.

  • training_sess_ids ((Iterable, None)) – The session ids to be explicitly used for training.

  • validation_sess_ids ((Iterable, None)) – The session ids to be explicitly used for validation.

  • testing_sess_ids ((Iterable, None)) – The session ids to be explicitly used for testing.

Returns

  • training (DN3ataset) – The training dataset

  • validation (DN3ataset) – The validation dataset

  • testing (DN3ataset) – The testing dataset
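
Two ways this split might be used, assuming thinker is an already-constructed Thinker (the session ids are placeholders):

    # Fraction-based split: 25% of the data for testing, 25% of the remainder for validation
    training, validation, testing = thinker.split(test_frac=0.25, validation_frac=0.25)

    # Explicit split by session ids, which overrides the fractional arguments
    training, validation, testing = thinker.split(
        validation_sess_ids=['session_1'],   # placeholder session ids
        testing_sess_ids=['session_2'],
    )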