
The data that a model loads is specified as part of your model class (see base_models). This specification does not say where the data will be loaded from, only what data to load (e.g. the name given to each feature, the directory name, the normalisation used, whether delta and delta-delta features are included, and the file extension). The location of the data is given through the command line arguments when running an experiment.

If you need to load a file with a custom function, create a subclass of _DataSource (loading can include preprocessing if needed).
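Below is a minimal sketch of such a subclass. The hypothetical CSVSource loads comma-separated sequence features and relies only on the documented constructor arguments and on the file_path, load_file and save_file methods. If tts_data_tools is not installed, the sketch falls back to a small stand-in base class. That stand-in's path layout (data_dir/name/base_name.ext, using name as the extension when ext is unset) is an assumption, not the library's code.

```python
import os
import tempfile

import numpy as np

try:
    # Use the real base class when tts_data_tools is installed.
    from tts_data_tools.data_sources import _DataSource
except ImportError:
    class _DataSource:
        """Stand-in base class (assumption): stores the documented arguments and
        builds paths as data_dir/name/base_name.ext."""

        def __init__(self, name, normalisation=None, use_deltas=False, ext=None):
            self.name = name
            self.normalisation = normalisation
            self.use_deltas = use_deltas
            # When ext is not set, the feature name is used as the extension.
            self.ext = name if ext is None else ext

        def file_path(self, base_name, data_dir):
            return os.path.join(data_dir, self.name, f'{base_name}.{self.ext}')


class CSVSource(_DataSource):
    """Hypothetical data source for comma-separated sequence features."""

    def __init__(self, name, normalisation=None, use_deltas=False, ext='csv'):
        super().__init__(name, normalisation=normalisation, use_deltas=use_deltas, ext=ext)

    def load_file(self, base_name, data_dir):
        # ndmin=2 keeps a (seq_len, feat_dim) shape even for single-column files.
        return np.loadtxt(self.file_path(base_name, data_dir), delimiter=',', ndmin=2)

    def save_file(self, data, base_name, data_dir):
        file_path = self.file_path(base_name, data_dir)
        # Create the feature directory inside data_dir if it does not exist yet.
        os.makedirs(os.path.dirname(file_path), exist_ok=True)
        np.savetxt(file_path, data, delimiter=',')


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as data_dir:
        source = CSVSource('mfcc')
        source.save_file(np.random.rand(4, 3), 'utt_001', data_dir)
        print(source.load_file('utt_001', data_dir).shape)  # (4, 3)
```

The subclass overrides only the file I/O. Any delta computation and the choice of normalisation stay with the base class and the surrounding framework.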

The following sections describe the utilities Morgana provides for loading data. However, these are all used internally by experiment_builder; typically the only things you need to be aware of are the available data sources and feature normalisers.

Batching utility

…(data_generator, batch_size=32, shuffle=True, num_data_threads=0, device='cpu')

Creates a batched data loader for the given dataset and maps each batch to the given device.

  • data_generator ( or FilesDataset) – Dataset from which to load the batches of data.

  • batch_size (int) – Number of samples to load per batch.

  • shuffle (bool) – Whether to shuffle the data every epoch.

  • num_data_threads (int) – Number of parallel subprocesses to use for data loading.

  • device (str) – Name of the device to place each batch on.


An instance with an __iter__ method that allows iteration over batches of the dataset.

Return type

… (in a ToDeviceWrapper container)



This dataset provides indexing access to the _DataSource instances given. It provides a custom collate_fn for transposing and padding a dictionary of features.

class …(data_sources, data_dir, id_list, normalisers, data_root='.')


Combines multiple _DataSource instances, and enables batching of a dictionary of sequence features.

  • data_sources (dict[str, _DataSource]) – Specification of the different data to be loaded.

  • data_dir (str) – The directory containing all data for this dataset split.

  • id_list (str) – Name of the id-list file, located within data_root, that contains the base names to load.

  • normalisers (Normalisers or dict[str, _FeatureNormaliser]) – Normaliser instances used to normalise loaded features (and delta features).

  • data_root (str) – The directory root for this dataset.


List of base names loaded from id_list.




The normaliser instances, set automatically by morgana.experiment_builder.ExperimentBuilder.


Normalisers or dict[str, _FeatureNormaliser]
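The following is a minimal, hypothetical stand-in that shows how a dataset like this combines several data sources. MiniFilesDataset is not Morgana's class. The sketch assumes that id_list and data_dir are both relative to data_root, and that each data source is callable as source(base_name, data_dir) and returns a dict, as _DataSource.__call__ does. It leaves out normalisation entirely.

```python
import os
import tempfile


class MiniFilesDataset:
    """Hypothetical stand-in: indexes base names and merges each source's output."""

    def __init__(self, data_sources, data_dir, id_list, data_root='.'):
        self.data_sources = data_sources
        # Assumption: data_dir and id_list are both located under data_root.
        self.data_dir = os.path.join(data_root, data_dir)
        with open(os.path.join(data_root, id_list)) as f:
            # One base name per line; blank lines are skipped.
            self.base_names = [line.strip() for line in f if line.strip()]

    def __len__(self):
        return len(self.base_names)

    def __getitem__(self, index):
        base_name = self.base_names[index]
        features = {}
        for data_source in self.data_sources.values():
            # Each source returns a dict, e.g. the feature and optionally its deltas.
            features.update(data_source(base_name, self.data_dir))
        return features


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, 'train_ids.txt'), 'w') as f:
            f.write('utt_001\nutt_002\n')
        # A toy "data source" that just reports the length of the base name.
        sources = {'n_chars': lambda base_name, data_dir: {'n_chars': len(base_name)}}
        dataset = MiniFilesDataset(sources, 'train', 'train_ids.txt', data_root=root)
        print(len(dataset), dataset[0])  # 2 {'n_chars': 7}
```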

static collate_fn(batch)

Collates a list of outputs from self.__getitem__ into a batched structure.


batch (list[dict[str, object]]) – Each element in the list is a non-nested dictionary containing features loaded by each data source.


batched_features – Batched version of the list of feature items in batch. Note that it is possible to provide objects, such as strings, that will not be converted to torch.Tensor. These will not be padded or sent to the correct device, but can still be accessed in the features dictionary.

Return type

dict[str, torch.Tensor]
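Here is a minimal NumPy sketch of the collation behaviour described above. The list of per-utterance dictionaries is transposed into one dictionary of batches. 2-D sequence features are zero-padded to the longest sequence, scalars are stacked, and anything else (such as strings) is left as a plain list. The function name collate_features, zero padding, and returning NumPy arrays instead of torch.Tensor are all assumptions of this sketch.

```python
import numpy as np


def collate_features(batch):
    """Hypothetical collate function for a list of feature dictionaries."""
    batched_features = {}
    for key in batch[0]:
        items = [example[key] for example in batch]
        first = items[0]
        if isinstance(first, np.ndarray) and first.ndim == 2:
            # Pad each (seq_len, feat_dim) array with zeros up to the longest seq_len.
            max_len = max(item.shape[0] for item in items)
            padded = np.zeros((len(items), max_len, first.shape[1]), dtype=first.dtype)
            for i, item in enumerate(items):
                padded[i, :item.shape[0]] = item
            batched_features[key] = padded
        elif isinstance(first, (bool, int, float, np.number)):
            # Scalars (e.g. sequence lengths) become a 1-D array of shape (batch_size,).
            batched_features[key] = np.array(items)
        else:
            # Other objects, e.g. strings, are passed through unpadded.
            batched_features[key] = items
    return batched_features


batch = [
    {'lf0': np.ones((3, 1)), 'n_frames': 3, 'name': 'utt_001'},
    {'lf0': np.ones((5, 1)), 'n_frames': 5, 'name': 'utt_002'},
]
features = collate_features(batch)
print(features['lf0'].shape, features['n_frames'], features['name'])
# (2, 5, 1) [3 5] ['utt_001', 'utt_002']
```

Keeping the true lengths (here the hypothetical n_frames key) alongside the padded arrays lets later code mask out the padding.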

DataSource specification


Data sources are defined in tts_data_tools; they provide a consistent interface for defining which features to load for a model.


Supported feature normalisers are limited to mvn and minmax. To define a new normaliser, override Normalisers.create_normaliser().


class tts_data_tools.data_sources._DataSource(name, normalisation=None, use_deltas=False, ext=None)

Bases: object

Abstract data loading class.

  • name (str) – Name of the directory that will contain this feature.

  • normalisation (None or str) – Type of normalisation to perform. This only specifies the type of normalisation; the normaliser itself is not contained within the data source and must be handled outside of it.

  • use_deltas (bool) – Whether to compute delta features.

  • ext (str, optional) – The file extension of the saved features, if not set is used.


The data setup assumes a folder structure such as the following example:

dataset_name (data_root)
    train (data_dir)
        lab (name)
        lf0 (name)
    valid (data_dir)


All data is contained below data_root.

There can be multiple data_dir directories, e.g. one for each data split (train, valid, test).

Each feature should have its own directory within data_dir, containing all files for that feature.

Although normalisation is not handled here, you should ensure that files containing the normalisation parameters are present, e.g. ‘lf0_mvn.json’. Such a file should exist for every data source requiring normalisation, with an additional file for every data source using delta features.

file_path(self, base_name, data_dir)

Creates file path for a given base name and data directory.
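The folder structure above can be expressed as a small path helper. This is a hypothetical sketch, not the library's file_path. It assumes that data_dir is a sub-directory of data_root, and that the feature name is used as the extension when ext is not set (as stated for the concrete data sources).

```python
import os


def feature_path(data_root, data_dir, name, base_name, ext=None):
    """Hypothetical helper: <data_root>/<data_dir>/<name>/<base_name>.<ext>."""
    if ext is None:
        ext = name  # assumption: fall back to the feature name as the extension
    return os.path.join(data_root, data_dir, name, f'{base_name}.{ext}')


print(feature_path('dataset_name', 'train', 'lf0', 'utt_001', ext='npy'))
# On POSIX systems: dataset_name/train/lf0/utt_001.npy
```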

load_file(self, base_name, data_dir)

Loads the contents of a given file. The contents must be either a sequence feature with 2 dimensions or a scalar value.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type

int or float or bool or np.ndarray, shape (seq_len, feat_dim)

save_file(self, data, base_name, data_dir)

Saves data to a file using the format defined by the class.

  • data (int or float or bool or np.ndarray, shape (seq_len, feat_dim)) – Data to be saved.

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.

__call__(self, base_name, data_dir)

Loads the feature and creates deltas if specified by this data source.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Loaded feature, and deltas if specified.

Return type

dict[str, (int or float or bool or np.ndarray)]
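The delta computation itself is not specified here. This sketch makes one common choice, which is an assumption and not necessarily what tts_data_tools does: deltas use the regression window [-0.5, 0, 0.5], delta-deltas use [1, -2, 1], and the edge frames are repeated at the sequence boundaries. The key name '<name>_deltas' in the returned dictionary is also hypothetical.

```python
import numpy as np


def compute_deltas(feature):
    """Return [delta, delta-delta] stacked along the feature axis.

    feature: np.ndarray of shape (seq_len, feat_dim).
    Returns: np.ndarray of shape (seq_len, 2 * feat_dim).
    """
    # Repeat the first and last frames so every frame has two neighbours.
    padded = np.pad(feature, ((1, 1), (0, 0)), mode='edge')
    delta = 0.5 * (padded[2:] - padded[:-2])
    delta_delta = padded[2:] - 2 * padded[1:-1] + padded[:-2]
    return np.concatenate([delta, delta_delta], axis=1)


def load_with_deltas(name, feature, use_deltas):
    """Mimics the documented output: a dict with the feature and optional deltas."""
    features = {name: feature}
    if use_deltas:
        features[f'{name}_deltas'] = compute_deltas(feature)  # hypothetical key name
    return features


lf0 = np.array([[0.0], [1.0], [4.0], [9.0]])
print(load_with_deltas('lf0', lf0, use_deltas=True)['lf0_deltas'])
```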


class tts_data_tools.data_sources.NumpyBinarySource(name, normalisation=None, use_deltas=False, ext='npy')

Bases: tts_data_tools.data_sources._DataSource

Data loading class for features saved with np.ndarray.tofile; loading is thus performed using np.fromfile.

  • name (str) – Name of the directory that will contain this feature.

  • normalisation (str) – Type of normalisation to perform. This only specifies the type of normalisation; the normaliser itself is not contained within the data source and must be handled outside of it.

  • use_deltas (bool) – Whether to compute delta features. If normalisation is being used it will also perform normalisation of deltas.

  • ext (str, optional) – The file extension of the saved features, if not set name is used.

load_file(self, base_name, data_dir)

Loads the feature using np.load.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type

int or float or bool or np.ndarray, shape (seq_len, feat_dim)

save_file(self, data, base_name, data_dir)

Saves the feature using

  • data (int or float or bool or np.ndarray, shape (seq_len, feat_dim)) – Data to be saved.

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.
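With the raw binary format described for this class (np.ndarray.tofile and np.fromfile), the file holds no shape or dtype header, so the reader must supply both. Here is a sketch that assumes float32 values and a known feat_dim (both assumptions). The function names and the .bin file name are hypothetical.

```python
import os
import tempfile

import numpy as np


def save_binary(feature, file_path):
    # tofile writes the raw values only; no shape or dtype header is stored.
    np.asarray(feature, dtype=np.float32).tofile(file_path)


def load_binary(file_path, feat_dim):
    # fromfile returns a flat array, so the dtype and feat_dim must be known in advance.
    return np.fromfile(file_path, dtype=np.float32).reshape(-1, feat_dim)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        file_path = os.path.join(tmp, 'utt_001.bin')
        save_binary(np.arange(10).reshape(5, 2), file_path)
        print(load_binary(file_path, feat_dim=2).shape)  # (5, 2)
```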


class tts_data_tools.data_sources.TextSource(name, normalisation=None, use_deltas=False, ext='txt')

Bases: tts_data_tools.data_sources._DataSource

Loads data from a text file. The file can contain integers or floats and has up to 2 dimensions.

  • name (str) – Name of the directory that will contain this feature.

  • normalisation (str) – Type of normalisation to perform. This only specifies the type of normalisation; the normaliser itself is not contained within the data source and must be handled outside of it.

  • use_deltas (bool) – Whether to compute delta features. If normalisation is being used it will also perform normalisation of deltas.

  • ext (str, optional) – The file extension of the saved features, if not set name is used.

load_file(self, base_name, data_dir)

Loads the feature from a text file into a numpy array.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type

int or float or np.ndarray, shape (seq_len, feat_dim)

save_file(self, data, base_name, data_dir)

Saves data as a text file.

  • data (int or float or bool or np.ndarray, shape (seq_len, feat_dim)) – Data to be saved.

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.


class tts_data_tools.data_sources.StringSource(name, ext='txt')

Bases: tts_data_tools.data_sources._DataSource

Loads data from a text file as strings, where each item should be on its own line.

  • name (str) – Name of the directory that will contain this feature.

  • ext (str, optional) – The file extension of the saved features, if not set name is used.

load_file(self, base_name, data_dir)

Loads lines of text.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type


save_file(self, data, base_name, data_dir)

Saves text as a text file.

  • data (list<str>) – Sequence of strings.

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.


class tts_data_tools.data_sources.ASCIISource(name, ext='txt')

Bases: tts_data_tools.data_sources.StringSource

Loads data from a text file as strings, where each item should be on its own line.

  • name (str) – Name of the directory that will contain this feature.

  • ext (str, optional) – The file extension of the saved features, if not set name is used.

load_file(self, base_name, data_dir)

Loads the lines and converts them to ASCII codes (np.int8); each line is treated as one sequence item.

Lines can have different numbers of characters; the maximum number of characters determines the size of the second dimension of the array.

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type

np.ndarray, shape (seq_len, max_num_characters), dtype (np.int8)

save_file(self, data, base_name, data_dir)

Saves ASCII codes as a text file.

  • data (np.ndarray, shape (seq_len, max_num_characters), dtype (np.int8)) – Sequence of strings stored as ASCII codes (and padded with \x00).

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.


class tts_data_tools.data_sources.WavSource(name, normalisation=None, use_deltas=False, sample_rate=None)

Bases: tts_data_tools.data_sources._DataSource

Loads wavfiles using

  • name (str) – Name of the directory that will contain this feature.

  • normalisation (str) – Type of normalisation to perform. This only specifies the type of normalisation; the normaliser itself is not contained within the data source and must be handled outside of it.

  • use_deltas (bool) – Whether to compute delta features. If normalisation is being used it will also perform normalisation of deltas.


The sample rate of the wavfiles being loaded. If not given, it will be set in self.load_file.



load_file(self, base_name, data_dir)

Loads a wavfile using

  • base_name (str) – The name (without extensions) of the file to be loaded.

  • data_dir (str) – The directory containing all feature types for this dataset.


Return type

np.ndarray, shape (num_samples,), dtype (np.int16)

save_file(self, data, base_name, data_dir)

Saves the feature as a wavfile using

  • data (int or float or bool or np.ndarray, shape (seq_len, feat_dim)) – Data to be saved.

  • base_name (str) – The name (without extensions) of the file to be saved.

  • data_dir (str) – The directory containing all feature types for this dataset.
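The library that WavSource uses for loading is not named in this section. As a stand-in, this sketch swaps in Python's standard-library wave module. It assumes mono, 16-bit PCM files, which matches the documented np.int16 return type. When loading, the sample rate is read from the file. The function names are hypothetical.

```python
import os
import tempfile
import wave

import numpy as np


def save_wav(path, samples, sample_rate):
    """Write mono 16-bit PCM samples (assumed format) to a wavfile."""
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(np.asarray(samples, dtype='<i2').tobytes())


def load_wav(path):
    """Return (samples as np.int16 with shape (num_samples,), sample_rate)."""
    with wave.open(path, 'rb') as wav_file:
        if wav_file.getnchannels() != 1 or wav_file.getsampwidth() != 2:
            raise ValueError('this sketch only handles mono 16-bit PCM')
        sample_rate = wav_file.getframerate()
        frames = wav_file.readframes(wav_file.getnframes())
    # WAV data is little-endian; convert to the native int16 dtype.
    return np.frombuffer(frames, dtype='<i2').astype(np.int16), sample_rate


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'utt_001.wav')
        save_wav(path, np.zeros(16000, dtype=np.int16), sample_rate=16000)
        samples, sample_rate = load_wav(path)
        print(samples.shape, sample_rate)  # (16000,) 16000
```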

Feature normalisers


class …(…, normalisation_dir, data_root='.', device='cpu')

Bases: dict

A dictionary-like container for normalisers that provides an interface for creating them.

  • normalisation_dir (str) – The directory containing the normalisation parameters (in a JSON file).

  • data_root (str) – The directory root for this dataset.

  • device (str or torch.device) – The name of the device to place the parameters on.

create_normaliser(self, name, data_source)

Creates the normaliser if one was specified for this data source.

  • name (str) – Name used to index this data source in the model.

  • data_source (_DataSource) – Specification of how to load this feature.


Return type



class …(feature_name, data_dir, use_deltas=False, device='cpu', data_root='.', file_pattern='{}.json')

Bases: object

Abstract feature normaliser class. Exposes the normalise() and denormalise() methods.

Normalisers will work on both NumPy arrays and PyTorch tensors. This is necessary to process NumPy arrays in _DataSource.__call__() and to normalise/denormalise PyTorch tensors in batch within the model.

  • feature_name (str) – Name of the feature.

  • data_dir (str) – Directory containing all data for this dataset split.

  • use_deltas (bool) – Whether to load normalisation parameters for delta features.

  • device (str or torch.device) – Name of the device to place the parameters on.

  • data_root (str) – Directory root for this dataset.

  • file_pattern (str) – Format pattern for the name of the JSON file containing the normalisation parameters.

normalise(self, feature, deltas=False)

Normalises the sequence feature.

  • feature (np.ndarray or torch.Tensor, shape (batch_size, seq_len, feat_dim) or (seq_len, feat_dim)) – Sequence feature to be normalised, can be a NumPy array or a PyTorch tensor, can be batched.

  • deltas (bool) – Whether feature is a delta feature, and should be normalised using the delta parameters.


Normalised sequence feature.

Return type

np.ndarray or torch.Tensor, shape (batch_size, seq_len, feat_dim) or (seq_len, feat_dim)

denormalise(self, feature, deltas=False)

De-normalises the sequence feature.

  • feature (np.ndarray or torch.Tensor, shape (batch_size, seq_len, feat_dim) or (seq_len, feat_dim)) – Sequence feature to be de-normalised; it can be a NumPy array or a PyTorch tensor, and can be batched.

  • deltas (bool) – Whether feature is a delta feature, and should be de-normalised using the delta parameters.


De-normalised sequence feature.

Return type

np.ndarray or torch.Tensor, shape (batch_size, seq_len, feat_dim) or (seq_len, feat_dim)

fetch_params(self, data_type=<class 'numpy.ndarray'>, deltas=False)

Gets the normalisation parameters, taking into account the delta flag and type of data.

static load_params(feature_name, data_dir, device='cpu', file_pattern='{}.json')

Loads the parameters from file and converts them to NumPy arrays and PyTorch tensors.


class …(feature_name, data_dir, use_deltas=False, device='cpu', data_root='.')


Normalises features such that they have zero mean and unit variance.


norm_f = (f - mean) / std_dev


f = (norm_f * std_dev) + mean

  • feature_name (str) – Name of the feature.

  • data_dir (str) – Directory containing all data for this dataset split.

  • use_deltas (bool) – Whether to load normalisation parameters for delta features.

  • device (str or torch.device) – Name of the device to place the parameters on.

  • data_root (str) – Directory root for this dataset.
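The MVN formulas translate directly to NumPy. In Morgana the parameters are loaded from a JSON file. Purely for illustration, this sketch computes them from a list of (seq_len, feat_dim) training features; the function names are hypothetical. The parameters have shape (feat_dim,), so broadcasting lets the same functions handle both (seq_len, feat_dim) and (batch_size, seq_len, feat_dim) inputs.

```python
import numpy as np


def mvn_params(features):
    """Per-dimension mean and standard deviation over all frames of all utterances."""
    frames = np.concatenate(features, axis=0)  # (total_frames, feat_dim)
    return frames.mean(axis=0), frames.std(axis=0)


def mvn_normalise(f, mean, std_dev):
    # norm_f = (f - mean) / std_dev
    return (f - mean) / std_dev


def mvn_denormalise(norm_f, mean, std_dev):
    # f = (norm_f * std_dev) + mean
    return (norm_f * std_dev) + mean


train = [np.array([[1.0, 10.0], [3.0, 30.0]]), np.array([[5.0, 20.0]])]
mean, std_dev = mvn_params(train)
print(mean)  # [ 3. 20.]
```

A feature dimension with zero variance would need a guard, such as a small floor on std_dev, which this sketch omits.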


class …(feature_name, data_dir, use_deltas=False, device='cpu', data_root='.')


Normalises features such that they have a minimum value of 0 and a maximum value of 1.


norm_f = (f - min) / (max - min)


f = norm_f * (max - min) + min

  • feature_name (str) – Name of the feature.

  • data_dir (str) – Directory containing all data for this dataset split.

  • use_deltas (bool) – Whether to load normalisation parameters for delta features.

  • device (str or torch.device) – Name of the device to place the parameters on.

  • data_root (str) – Directory root for this dataset.
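The min-max formulas can be sketched the same way. As before, computing the minimum and maximum from training features (rather than loading them from the JSON parameter file) and the function names are illustrative assumptions.

```python
import numpy as np


def minmax_params(features):
    """Per-dimension minimum and maximum over all frames of all utterances."""
    frames = np.concatenate(features, axis=0)  # (total_frames, feat_dim)
    return frames.min(axis=0), frames.max(axis=0)


def minmax_normalise(f, min_, max_):
    # norm_f = (f - min) / (max - min)
    return (f - min_) / (max_ - min_)


def minmax_denormalise(norm_f, min_, max_):
    # f = norm_f * (max - min) + min
    return norm_f * (max_ - min_) + min_


train = [np.array([[1.0, 10.0], [3.0, 30.0]]), np.array([[5.0, 20.0]])]
min_, max_ = minmax_params(train)
print(minmax_normalise(train[1], min_, max_))  # [[1.  0.5]]
```

Values outside the training range map outside [0, 1]; the formulas do not clip them.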

Wrappers to change existing DataLoader instances



Bases: object

Abstract wrapper. Allows attribute access on the underlying data loader.


class …(…, device)


Wraps the __iter__ method of the underlying data loader, mapping each batch to a given device.
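The wrapper pattern can be sketched in plain Python. Attribute lookups that fail on the wrapper are forwarded to the underlying loader, and __iter__ applies a function to each batch. LoaderWrapper and MapBatchWrapper are hypothetical names. For the to-device case, the mapping function would move each tensor in the batch dictionary to the device, for example with torch.Tensor.to.

```python
class LoaderWrapper:
    """Hypothetical abstract wrapper: forwards unknown attributes to the wrapped loader."""

    def __init__(self, data_loader):
        self.data_loader = data_loader

    def __getattr__(self, attr):
        # Called only when normal lookup fails, so wrapper attributes take priority.
        if attr == 'data_loader':
            raise AttributeError(attr)  # avoid infinite recursion before __init__ runs
        return getattr(self.data_loader, attr)

    def __iter__(self):
        return iter(self.data_loader)


class MapBatchWrapper(LoaderWrapper):
    """Applies map_fn to every batch yielded by the wrapped loader."""

    def __init__(self, data_loader, map_fn):
        super().__init__(data_loader)
        self.map_fn = map_fn

    def __iter__(self):
        for batch in self.data_loader:
            yield self.map_fn(batch)


# A plain list stands in for a DataLoader; list.count is reached via delegation.
loader = MapBatchWrapper([[1, 2], [3]], lambda batch: [x * 10 for x in batch])
print(list(loader))  # [[10, 20], [30]]
print(loader.count([3]))  # 1
```

Because __iter__ is a generator method, each new loop over the wrapper starts a fresh pass over the underlying loader. This matches how a data loader is re-iterated once per epoch.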