Transformations and Preprocessors¶
Summary¶
One of the advantages of using PyTorch as the underlying computation library is its eager graph execution, which can leverage native Python. In other words, it lets us integrate arbitrary operations largely in parallel with training (particularly if the GPU is handling any neural networks).
Instance Transforms¶
Enter the InstanceTransform and its subclasses. When added to a Dataset, these perform operations on each fetched recording sequence, be it a trial or a cropped sequence of raw data. For the most part, they are simply callable objects implementing __call__() to modify a Tensor; the exception is transforms that change the number (or representation) of channels, the sampling frequency, or the sequence length of the data, which must also report those new properties so the Dataset stays consistent.
They are specifically instance transforms because they never transform more than a single crop of data (from a single person and dataset). This means that they are applied before a batch is aggregated for training. If a transform produces differently shaped tensors across instances, a batch cannot be properly collated, so watch out for that!
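To make this concrete, a custom instance transform can be sketched roughly as follows; the import path, class name, and scaling factor are illustrative assumptions rather than library specifics:

    import torch
    from dn3.transforms.instance import InstanceTransform  # assumed import path


    class FixedScale(InstanceTransform):
        """Hypothetical transform: rescale each fetched sequence by a constant."""

        def __init__(self, factor=1e-3):
            super().__init__()
            self.factor = factor

        def __call__(self, x: torch.Tensor) -> torch.Tensor:
            # Receives a single un-batched crop; the output shape matches
            # the input, so batches can still be collated safely afterwards
            return self.factor * x

Attaching it might then look like dataset.add_transform(FixedScale()), though the attachment method name is likewise an assumption here.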
Batch Transforms¶
These are the exceptions that prove the InstanceTransform rule. These transforms operate only after data has been aggregated into a batch, just before it is fed into a network for training (or otherwise). Accordingly, they are attached to trainable Processes instead of Datasets.
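As a minimal sketch of what a batch-level augmentation might look like (the import path, __call__ signature, and noise level below are all assumptions):

    import torch
    from dn3.transforms.batch import BatchTransform  # assumed import path


    class BatchGaussianNoise(BatchTransform):
        """Hypothetical augmentation: perturb an entire batch at once."""

        def __init__(self, std=0.01):
            super().__init__()
            self.std = std

        def __call__(self, x: torch.Tensor) -> torch.Tensor:
            # Unlike an InstanceTransform, x arrives with a leading batch dimension
            return x + self.std * torch.randn_like(x)

It would then be attached to a Process, e.g. process.add_batch_transform(BatchGaussianNoise()); that method name is again an assumption.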
Multiple Worker Processes Warning¶
After attaching enough transforms, you may find that, even with most of the deep learning work being done on the GPU, loading the training data becomes the bottleneck.
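If you build batches yourself, one common mitigation (standard PyTorch, not specific to this library) is to spread the fetching and transform work across multiple DataLoader worker processes; the dataset name and worker count below are placeholders:

    from torch.utils.data import DataLoader

    # num_workers > 0 moves data fetching (and therefore the instance
    # transforms) into separate worker processes, keeping the GPU fed
    loader = DataLoader(training_dataset, batch_size=64, shuffle=True,
                        num_workers=4)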
Preprocessors¶
Preprocessor(s), on the other hand, provide a way to create a transform after first encountering all of the Recordings of a Dataset. Simply put, if the transform is known a priori, the InstanceTransform interface is sufficient. Otherwise, a Preprocessor can be used both to modify Recordings in place before training and to create a transformation that modifies sequences on-the-fly.
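A rough sketch of that pattern follows, assuming a Preprocessor subclass is called once per Recording and exposes a get_transform() hook; the import paths, the integer-indexing iteration over a recording, and the preprocess() entry point are all assumptions for illustration:

    from dn3.transforms.instance import InstanceTransform      # assumed import path
    from dn3.transforms.preprocessors import Preprocessor      # assumed import path


    class PeakScalePreprocessor(Preprocessor):
        """Hypothetical sketch: find the largest absolute amplitude across
        all Recordings, then scale every sequence by it on-the-fly."""

        def __init__(self):
            super().__init__()
            self.peak = 1e-8  # avoid division by zero if no data is seen

        def __call__(self, recording, **kwargs):
            # Assumed contract: invoked once per Recording during preprocessing
            for i in range(len(recording)):
                fetched = recording[i]
                # Epoched recordings may return (data, label) tuples
                x = fetched[0] if isinstance(fetched, tuple) else fetched
                self.peak = max(self.peak, x.abs().max().item())

        def get_transform(self):
            # Build the on-the-fly transform from the statistics gathered above
            peak = self.peak

            class _ScaleByPeak(InstanceTransform):
                def __call__(self, x):
                    return x / peak

            return _ScaleByPeak()

Applying it would then look something like dataset.preprocess(PeakScalePreprocessor()), after which the returned transform scales each fetched sequence by the dataset-wide peak.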