Datasets

Classes

DN3ataset(*args, **kwds)

Dataset(*args, **kwds) – Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent hardware and annotation paradigms.

DatasetInfo(dataset_name[, data_max, data_min, excluded_people, targets]) – Contains non-critical metadata that might need to be tracked for Dataset objects.

EpochTorchRecording(*args, **kwds)

RawTorchRecording(*args, **kwds) – Interface for bridging MNE Raw instances as PyTorch-compatible “Datasets”.

Thinker(*args, **kwds) – Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.
class dn3.data.dataset.DN3ataset(*args, **kwds)

Methods

add_transform(transform) – Add a transformation that is applied to every fetched item in the dataset.

clear_transforms() – Remove all added transforms from dataset.

clone() – A copy of this object to allow the repetition of recordings, thinkers, etc.

preprocess(preprocessor[, apply_transform]) – Applies a preprocessor to the dataset.

to_numpy([batch_size, num_workers]) – Commits the dataset to numpy-formatted arrays.
Attributes

channels – The channel sets used by the dataset.

sequence_length – The length of each instance in number of samples.

sfreq – The sampling frequencies employed by the dataset.
add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset.

Parameters
transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.
property channels

Returns
channels – The channel sets used by the dataset.

Return type
list
clear_transforms()

Remove all added transforms from dataset.
clone()

A copy of this object to allow the repetition of recordings, thinkers, etc. that load data from the same memory/files but have their own tracking of ids.

Returns
cloned – New copy of this object.

Return type
DN3ataset
preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset.

Parameters
preprocessor (Preprocessor) – A preprocessor to be applied.

apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns
preprocessor – The preprocessor after application to all relevant thinkers.

Return type
Preprocessor
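For instance, a minimal sketch of the deferred-application path described above; MyPreprocessor is a hypothetical stand-in for any Preprocessor subclass, and dataset an existing DN3ataset:

    # Run the preprocessor over the data, but defer attaching its transform.
    preprocessor = MyPreprocessor()
    dataset.preprocess(preprocessor, apply_transform=False)

    # Attach the resulting transform manually.
    dataset.add_transform(preprocessor.get_transform())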
property sequence_length

Returns
sequence_length – The length of each instance in number of samples.

Return type
int, list
property sfreq

Returns
sampling_frequency – The sampling frequencies employed by the dataset.

Return type
float, list
to_numpy(batch_size=64, batch_transforms: list = None, num_workers=4, **dataloader_kwargs)

Commits the dataset to numpy-formatted arrays. Useful for saving the dataset to disk, or for preparing it for tools that expect numpy-formatted data rather than an iterable.

Notes
A PyTorch DataLoader is used to fetch the data, to conveniently leverage multiprocessing.

Parameters
batch_size (int) – The number of items to fetch per worker. This probably doesn’t need much tuning.

num_workers (int) – The number of spawned processes used to fetch and transform data.

batch_transforms (list) – Batch-level transforms to apply to each fetched batch before it is accumulated.

dataloader_kwargs (dict) – Keyword arguments for the PyTorch DataLoader that underpins the fetched data.

Returns
data – A list of numpy arrays.

Return type
list
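A usage sketch; the batch settings are illustrative, and dataset is an assumed, already-constructed DN3ataset:

    # Commit the dataset to numpy arrays, e.g. before using tools that expect
    # numpy-formatted data.
    arrays = dataset.to_numpy(batch_size=128, num_workers=2)

    # `arrays` is a list of numpy arrays; inspect shapes to identify which
    # entries hold data and which hold ids/targets.
    for array in arrays:
        print(array.shape)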
class dn3.data.dataset.Dataset(*args, **kwds)

Collects thinkers, each of which may collect multiple recording sessions of the same tasks, into a dataset with (largely) consistent:

hardware:
- channel number/labels
- sampling frequency

annotation paradigm:
- consistent event types
Methods
add_transform(transform[, thinkers]) – Add a transformation that is applied to every fetched item in the dataset.

clear_transforms() – Remove all added transforms from dataset.

dump_dataset(toplevel[, apply_transforms]) – Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

get_sessions() – Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

get_targets() – Collect all the targets (i.e. labels) that this dataset is annotated with.

get_thinkers() – Accumulates a consistently ordered list of all the thinkers in the dataset.

lmso([folds, test_splits, validation_splits]) – Generates a “Leave-multiple-subject-out” (LMSO) split.

loso([validation_person_id, test_person_id]) – Generates a “Leave-one-subject-out” (LOSO) split.

preprocess(preprocessor[, apply_transform, thinkers]) – Applies a preprocessor to the dataset.

safe_mode([mode]) – Switches safe_mode on or off.

update_id_returns([trial, session, person, task, dataset]) – Updates which ids are to be returned by the dataset.
Attributes

channels – The channel sets used by the dataset.

sequence_length – The length of each instance in number of samples.

sfreq – The sampling frequencies employed by the dataset.
add_transform(transform, thinkers=None)

Add a transformation that is applied to every fetched item in the dataset.

Parameters
transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.

thinkers ((None, Iterable)) – If specified (default is None), the thinkers to apply the transform to.
property channels

Returns
channels – The channel sets used by the dataset.

Return type
list
clear_transforms()

Remove all added transforms from dataset.
dump_dataset(toplevel, apply_transforms=True)

Dumps the dataset to the file location specified by toplevel, with a single file per session made of all the return tensors (as numpy data) loaded by the dataset.

Parameters
toplevel (str) – The toplevel location to dump the dataset to. This folder (and path) will be created if it does not exist. Each person will have a subdirectory therein, with numpy-formatted files for each session within that.

apply_transforms (bool) – Whether to apply the transforms while preparing the data to be saved.
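A minimal usage sketch (the target path is illustrative):

    # Writes one numpy-formatted file per session, grouped into a
    # sub-directory per person; the path is created if it does not exist.
    dataset.dump_dataset("dumped/my_dataset", apply_transforms=True)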
get_sessions()

Accumulates all the sessions from each thinker in the dataset in a nested dictionary.

Returns
session_dict – Keys are the thinkers of get_thinkers(), values are each another dictionary that maps session ids to _Recording.

Return type
dict
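For example, a sketch that walks the returned nested dictionary, assuming dataset is an assembled Dataset:

    # Outer keys are thinker names; each value maps session ids to the
    # underlying recording objects.
    for thinker_name, sessions in dataset.get_sessions().items():
        print(thinker_name, "->", list(sessions.keys()))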
get_targets()

Collect all the targets (i.e. labels) that this dataset is annotated with.

Returns
targets – A numpy-formatted array of all the targets/labels for this dataset.

Return type
np.ndarray
get_thinkers()

Accumulates a consistently ordered list of all the thinkers in the dataset. It is in this order that any automatic segmenting through loso() and lmso() will be done.

Returns
thinker_names

Return type
list
lmso(folds=10, test_splits=None, validation_splits=None)

This generates a “Leave-multiple-subject-out” (LMSO) split. In other words, X-fold cross-validation, with boundaries enforced at thinkers (each person’s data is not split into different folds).

Parameters
folds (int) – If this is specified and test_splits is None, will split the subjects into this many folds, and then use each fold as a test set in turn (and the previous fold - starting with the last - as validation).

test_splits (list, tuple) – This should be a list of tuples/lists of either:
- The ids of a consistent test set, in which case folds must be specified, or validation_splits must be a parallel nested list of validation ids.
- Two sub-lists: the first of testing ids, the second of validation ids.

Yields
training (Dataset) – Another dataset that represents the training set.

validation (Dataset) – The validation people as a dataset.

test (Dataset) – The test people as a dataset.
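A cross-validation sketch over this generator; the fold count is illustrative, and dataset is an assumed Dataset:

    # Five-fold LMSO: each person's data stays within a single fold.
    for training, validation, test in dataset.lmso(folds=5):
        print(len(training.get_thinkers()), "training people;",
              len(validation.get_thinkers()), "validation;",
              len(test.get_thinkers()), "test")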
loso(validation_person_id=None, test_person_id=None)

This generates a “Leave-one-subject-out” (LOSO) split. Tests each person one-by-one, and validates on the previous (the first is validated with the last).

Parameters
validation_person_id ((int, str, list, optional)) – If specified, and corresponding to one of the person_ids in this dataset, the LOSO cross-validation will consistently generate this thinker as validation. If a list, it must have the same length as test_person_id, say N; in that case, N splits are yielded in sequence, with the remainder used for training.

test_person_id ((int, str, list, optional)) – Same as validation_person_id, but for testing. However, testing may be a list when validation is a single value; if testing is N ids, will yield N values, with a consistent single validation person. If a single id (int or str), and validation_person_id is not also a single id, will ignore validation_person_id and loop through all others that are not the test_person_id.

Yields
training (Dataset) – Another dataset that represents the training set.

validation (Thinker) – The validation thinker.

test (Thinker) – The test thinker.
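A sketch of the default LOSO loop over every person in an assumed Dataset dataset:

    # One split per person: that person tests, the previous person validates,
    # and everyone else trains.
    for fold, (training, validation, test) in enumerate(dataset.loso()):
        print("fold", fold, "-", len(training.get_thinkers()),
              "people in training")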
preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, thinkers=None)

Applies a preprocessor to the dataset.

Parameters
preprocessor (Preprocessor) – A preprocessor to be applied.

thinkers ((None, Iterable)) – If specified (default is None), the thinkers to use for preprocessing calculation.

apply_transform (bool) – Whether to apply the transform to this dataset (all thinkers, not just those specified for preprocessing) after preprocessing them. Exclusive application to specific thinkers can be done using the return value and a separate call to add_transform with the same thinkers list, as sketched below.

Returns
preprocessor – The preprocessor after application to all relevant thinkers.

Return type
Preprocessor
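A sketch of the exclusive-application pattern described above; FitPreprocessor is a hypothetical stand-in for any Preprocessor subclass:

    # Compute preprocessor statistics using only the first two thinkers, then
    # apply the resulting transform to just those thinkers.
    subset = dataset.get_thinkers()[:2]
    preprocessor = dataset.preprocess(FitPreprocessor(), apply_transform=False,
                                      thinkers=subset)
    dataset.add_transform(preprocessor.get_transform(), thinkers=subset)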
safe_mode(mode=True)

This allows switching safe_mode on or off. When safe_mode is on, if data is ever NaN, it is captured before being returned and a report is generated.

Parameters
mode (bool) – Whether safe mode is enabled.
property sequence_length

Returns
sequence_length – The length of each instance in number of samples.

Return type
int, list
property sfreq

Returns
sampling_frequency – The sampling frequencies employed by the dataset.

Return type
float, list
update_id_returns(trial=None, session=None, person=None, task=None, dataset=None)

Updates which ids are to be returned by the dataset. If any argument is None it preserves the previous value.

Parameters
trial (None, bool) – Whether to return trial ids.

session (None, bool) – Whether to return session ids.

person (None, bool) – Whether to return person ids.

task (None, bool) – Whether to return task ids.

dataset (None, bool) – Whether to return dataset ids.
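For example (a sketch; arguments left as None keep their previous values):

    # Request session and person ids with each fetched item, leaving the
    # trial/task/dataset settings unchanged.
    dataset.update_id_returns(session=True, person=True)
    item = dataset[0]  # the fetched tuple now includes the requested ids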
class dn3.data.dataset.DatasetInfo(dataset_name, data_max=None, data_min=None, excluded_people=None, targets=None)

This object contains non-critical metadata that might need to be tracked for Dataset objects. It is generally not necessary to construct these manually; they are created by the configuratron to automatically create transforms and/or other processes downstream.
class dn3.data.dataset.EpochTorchRecording(*args, **kwds)

Methods

event_mapping() – Maps the labels returned by this recording to the events as recorded in the original annotations or stim channel.

preprocess(preprocessor[, apply_transform]) – Applies a preprocessor to the dataset.
event_mapping()

Maps the labels returned by this recording to the events as recorded in the original annotations or stim channel.

Returns
mapping – Keys are the class labels used by this object, values are the original event signifiers.

Return type
dict
preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset.

Parameters
preprocessor (Preprocessor) – A preprocessor to be applied.

apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns
preprocessor – The preprocessor after application to all relevant thinkers.

Return type
Preprocessor
class dn3.data.dataset.RawTorchRecording(*args, **kwds)

Interface for bridging MNE Raw instances as PyTorch-compatible “Datasets”.

Parameters
raw (mne.io.Raw) – Raw data; the data does not need to be preloaded.

tlen (float) – Length of each fetched window of the recording, specified in seconds.

session_id ((int, str, optional)) – A unique (with respect to a thinker within an eventual dataset) identifier for the current recording session. If not specified, defaults to ‘0’.

person_id ((int, str, optional)) – A unique (with respect to an eventual dataset) identifier for the particular person being recorded.

stride (int) – The number of samples to skip between each starting offset of loaded samples.
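A construction sketch; the file path, tlen, and stride values here are illustrative:

    import mne
    from dn3.data.dataset import RawTorchRecording

    raw = mne.io.read_raw_edf("subject01_session1.edf", preload=False)
    recording = RawTorchRecording(raw, tlen=2.0, session_id="session1",
                                  person_id="subject01", stride=128)
    x = recording[0]  # a window of tlen seconds starting at sample 0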
Methods

preprocess(preprocessor[, apply_transform]) – Applies a preprocessor to the dataset.
preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True)

Applies a preprocessor to the dataset.

Parameters
preprocessor (Preprocessor) – A preprocessor to be applied.

apply_transform (bool) – Whether to apply the transform to this dataset (and all members, e.g. thinkers or sessions) after preprocessing them. Alternatively, the preprocessor is returned for manual application of its transform through Preprocessor.get_transform().

Returns
preprocessor – The preprocessor after application to all relevant thinkers.

Return type
Preprocessor
class dn3.data.dataset.Thinker(*args, **kwds)

Collects multiple recordings of the same person, intended to be of the same task, at different times or conditions.
Methods
add_transform(transform) – Add a transformation that is applied to every fetched item in the dataset.

clear_transforms([deep_clear]) – Remove all added transforms from dataset.

get_targets() – Collect all the targets (i.e. labels) that this Thinker’s data is annotated with.

preprocess(preprocessor[, apply_transform, sessions]) – Applies a preprocessor to the dataset.

split([training_sess_ids, validation_sess_ids, …]) – Split the thinker’s data into training, validation and testing sets.
Attributes

channels – The channel sets used by the dataset.

sequence_length – The length of each instance in number of samples.

sfreq – The sampling frequencies employed by the dataset.
add_transform(transform)

Add a transformation that is applied to every fetched item in the dataset.

Parameters
transform (BaseTransform) – For each item retrieved by __getitem__, transform is called to modify that item.
property channels

Returns
channels – The channel sets used by the dataset.

Return type
list
clear_transforms(deep_clear=False)

Remove all added transforms from dataset.
get_targets()

Collect all the targets (i.e. labels) that this Thinker’s data is annotated with.

Returns
targets – A numpy-formatted array of all the targets/labels for this thinker.

Return type
np.ndarray
preprocess(preprocessor: dn3.transforms.preprocessors.Preprocessor, apply_transform=True, sessions=None)

Applies a preprocessor to the dataset.

Parameters
preprocessor (Preprocessor) – A preprocessor to be applied.

sessions ((None, Iterable)) – If specified (default is None), the sessions to use for preprocessing calculation.

apply_transform (bool) – Whether to apply the transform to this dataset (all sessions, not just those specified for preprocessing) after preprocessing them. Exclusive application to select sessions can be done using the return value and a separate call to add_transform with the same sessions list.

Returns
preprocessor – The preprocessor after application to all relevant sessions.

Return type
Preprocessor
property sequence_length

Returns
sequence_length – The length of each instance in number of samples.

Return type
int, list
property sfreq

Returns
sampling_frequency – The sampling frequencies employed by the dataset.

Return type
float, list
split(training_sess_ids=None, validation_sess_ids=None, testing_sess_ids=None, test_frac=0.25, validation_frac=0.25)

Split the thinker’s data into training, validation and testing sets.

Parameters
test_frac (float) – Proportion of the total data to use for testing; this is overridden by testing_sess_ids.

validation_frac (float) – Proportion of the data remaining - after removing the test proportion/sessions - to use as validation data. Likewise, validation_sess_ids overrides this value.

training_sess_ids ((Iterable, None)) – The session ids to be explicitly used for training.

validation_sess_ids ((Iterable, None)) – The session ids to be explicitly used for validation.

testing_sess_ids ((Iterable, None)) – The session ids to be explicitly used for testing.

Returns
training (DN3ataset) – The training dataset.

validation (DN3ataset) – The validation dataset.

testing (DN3ataset) – The testing dataset.
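Two usage sketches, assuming thinker is an existing Thinker; the session ids are illustrative, and explicit ids override the fractional arguments:

    # Fractional split: a quarter of the data for testing, then a quarter of
    # the remainder for validation.
    training, validation, testing = thinker.split(test_frac=0.25,
                                                  validation_frac=0.25)

    # Explicit split by session id.
    training, validation, testing = thinker.split(
        training_sess_ids=["sess1", "sess2"],
        validation_sess_ids=["sess3"],
        testing_sess_ids=["sess4"],
    )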