Morgana documentation¶
API documentation can be found at the bottom of this page: Package reference
See the command line usage for running experiments: Command line arguments
Morgana (GitHub) (docs) is a toolkit for defining and training Text-to-Speech voices in PyTorch.
Installation¶
To install as a package, from source:
pip install git+https://github.com/ZackHodari/morgana
To clone as a git repo, allowing for local modifications to the source code:
git clone https://github.com/ZackHodari/morgana
cd morgana
python setup.py develop
Design¶
The support code necessary for creating Text-to-Speech (TTS) voices is not overly complicated, but it does take time to piece together. Existing packages like Merlin provide an easy-to-use toolkit for training voices, but Merlin obfuscates too many details and makes modifications hard without changing the source code. Similar to nnmnkwii, Morgana aims to provide flexibility in defining your own models. However, Morgana attempts to automate as much of the support code as possible, while still allowing for customisation where necessary.
Defining a model¶
For most use-cases, the classes provided in base_models.py are all that need to be extended. For example, create a subclass of morgana.base_models.BaseSPSS and implement train_data_sources, predict, and loss.

__init__: Create the layers for your model.
train_data_sources: Define a dictionary containing instances that are subclasses of morgana.data._DataSource.
predict: Define any arbitrary computation in PyTorch.
loss: Define the targets (from features) and predictions (output of predict) used to calculate the loss. A sequence length feature can also be specified (for each target-prediction pair).
import torch.nn as nn

from morgana import data
from morgana import utils
from morgana.base_models import BaseSPSS
from morgana.experiment_builder import ExperimentBuilder


class F0Model(BaseSPSS):
    def __init__(self, normalisers=None):
        super(F0Model, self).__init__(normalisers=normalisers)

        # Feed-forward layers mapping 600-dimensional labels to one log-F0 value per frame.
        self.layers = nn.Sequential(
            nn.Linear(600, 512),
            nn.Sigmoid(),
            nn.Linear(512, 128),
            nn.Sigmoid(),
            nn.Linear(128, 32),
            nn.Sigmoid(),
            nn.Linear(32, 1)
        )

    @classmethod
    def train_data_sources(cls):
        return {
            'n_frames': data.TextSource('n_frames'),
            'dur': data.TextSource('dur', normalisation='mvn'),
            'lab': data.NumpyBinarySource('lab', normalisation='minmax'),
            'lf0': data.NumpyBinarySource('lf0', normalisation='mvn'),
        }

    def predict(self, features):
        # Repeat each label vector according to its duration to reach frame rate.
        norm_lab_at_frame_rate = utils.upsample_to_repetitions(features['normalised_lab'], features['dur'])

        # nn.Sequential accepts a single input, so the sequence lengths are passed to the loss instead.
        pred_norm_lf0 = self.layers(norm_lab_at_frame_rate)
        pred_lf0 = self.normalisers['lf0'].denormalise(pred_norm_lf0)

        return {'pred_norm_lf0': pred_norm_lf0,
                'pred_lf0': pred_lf0}

    def loss(self, features, output_features):
        target_norm_lf0 = features['normalised_lf0']
        pred_norm_lf0 = output_features['pred_norm_lf0']
        seq_len = features['n_frames']

        return self._loss(target_norm_lf0, pred_norm_lf0, seq_len)
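The F0Model example relies on two ideas: repeating each label vector for the number of frames given by its duration, and using a sequence length so that padded frames do not contribute to the loss. The following is a minimal pure-Python sketch of both ideas, not Morgana's implementation. The names repeat_by_duration and masked_mse are hypothetical and used only for illustration; the real utils.upsample_to_repetitions and BaseSPSS._loss operate on PyTorch tensors and may differ in detail.

```python
def repeat_by_duration(sequence, durations):
    """Repeat each item of `sequence` by the matching count in `durations` (hypothetical helper)."""
    return [item for item, n_repeats in zip(sequence, durations) for _ in range(n_repeats)]


def masked_mse(targets, predictions, seq_lens):
    """Mean squared error over a padded batch, ignoring frames beyond each sequence length.

    This is an assumed formulation for illustration; Morgana's own loss may be defined differently.
    """
    total, count = 0.0, 0
    for target, prediction, seq_len in zip(targets, predictions, seq_lens):
        # Slicing to seq_len drops the padded frames of this sequence.
        for t, p in zip(target[:seq_len], prediction[:seq_len]):
            total += (t - p) ** 2
            count += 1
    return total / count


# Three label vectors (represented here by names) lasting 2, 0 and 3 frames.
frames = repeat_by_duration(['a', 'b', 'c'], [2, 0, 3])

# Two padded sequences with 2 and 1 valid frames; errors in the padding are ignored.
loss = masked_mse(
    targets=[[1.0, 2.0, 0.0], [3.0, 0.0, 0.0]],
    predictions=[[1.0, 4.0, 9.0], [2.0, 9.0, 9.0]],
    seq_lens=[2, 1],
)
```

For a single unbatched tensor, torch.repeat_interleave performs the same kind of repetition.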
Running an experiment¶
Most models can be run using the classes provided in experiment_builder.py. If different training procedures are needed, then a new morgana.experiment_builder.ExperimentBuilder subclass may be required. An ExperimentBuilder contains the following important methods:

add_args: Defines the command line arguments supported for experiments of this type.
__init__:
    Saves command line arguments as instance attributes.
    Calls resolve_setting_conflicts.
    Loads normalisers and data specified in train_data_sources.
    Creates the model. Loads from a checkpoint. Creates an exponential moving average (EMA) instance of the model.
    Sets up Python logging, saves stdout and stderr to files. Saves tqdm output to a separate log file.
resolve_setting_conflicts: Checks (and modifies) any command line arguments that are incorrect (or inconsistent).
train_epoch: Epoch loop that iterates through ExperimentBuilder.train_iter.
run_train: Training loop that calls train_epoch until ExperimentBuilder.epoch reaches ExperimentBuilder.end_epoch.
valid_epoch: Epoch loop that iterates through ExperimentBuilder.valid_iter.
run_valid: Runs validation of the current model (or EMA model), and reports the validation loss.
test_epoch: Epoch loop that iterates through ExperimentBuilder.test_iter.
run_test: Runs generation of the current model (or EMA model); no loss is reported (no labels are given).
run_experiment: Runs run_train, run_valid, and run_test according to the command line arguments.
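The control flow described by these methods can be sketched with a toy class. This is a hypothetical stand-in, not Morgana's code: ToyExperiment, its constructor parameters, and the single-weight "model" are assumptions made for illustration, and only the method and attribute names (train_epoch, run_train, epoch, end_epoch, and so on) are taken from the list above.

```python
class ToyExperiment:
    """Hypothetical stand-in that mirrors the method names listed above (not Morgana's code)."""

    def __init__(self, end_epoch, ema_decay=0.9, learning_rate=0.1):
        self.epoch = 0
        self.end_epoch = end_epoch
        self.ema_decay = ema_decay
        self.learning_rate = learning_rate
        # The "model" is a single weight, trained so that weight * x matches y = 2 * x.
        self.weight = 0.0
        self.ema_weight = 0.0
        self.train_iter = [(1.0, 2.0), (2.0, 4.0)]
        self.valid_iter = [(3.0, 6.0)]
        self.test_iter = [4.0, 5.0]  # Inputs only, no targets.

    def train_epoch(self):
        for x, y in self.train_iter:
            # One gradient descent step on the squared error (weight * x - y) ** 2.
            gradient = 2.0 * (self.weight * x - y) * x
            self.weight -= self.learning_rate * gradient
            # The EMA copy moves a small step towards the current weight.
            self.ema_weight = self.ema_decay * self.ema_weight + (1.0 - self.ema_decay) * self.weight

    def run_train(self):
        # Call train_epoch until epoch reaches end_epoch.
        while self.epoch < self.end_epoch:
            self.train_epoch()
            self.epoch += 1

    def valid_epoch(self, weight):
        losses = [(weight * x - y) ** 2 for x, y in self.valid_iter]
        return sum(losses) / len(losses)

    def run_valid(self, use_ema=False):
        return self.valid_epoch(self.ema_weight if use_ema else self.weight)

    def test_epoch(self, weight):
        # Generation only: there are no targets, so no loss is computed.
        return [weight * x for x in self.test_iter]

    def run_test(self, use_ema=False):
        return self.test_epoch(self.ema_weight if use_ema else self.weight)

    def run_experiment(self, train=True, valid=True, test=True):
        results = {}
        if train:
            self.run_train()
        if valid:
            results['valid_loss'] = self.run_valid()
        if test:
            results['predictions'] = self.run_test()
        return results


experiment = ToyExperiment(end_epoch=5)
results = experiment.run_experiment()
```

In this toy setup the EMA weight starts at zero and lags behind the trained weight, so run_valid(use_ema=True) reports a higher loss than the current model after only five epochs.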
At the bottom of the file containing F0Model, place the following:
def main():
    args = ExperimentBuilder.get_experiment_args()
    experiment = ExperimentBuilder(F0Model, **args)
    experiment.run_experiment()


if __name__ == "__main__":
    main()
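The main function above follows a pattern in which a class method collects the command line arguments into a dictionary that is then unpacked into the constructor. A minimal sketch of how such a pattern could be built with argparse is given below. It is an assumption about the general approach, not Morgana's implementation: ToyBuilder, ToyBuilderWithDropout, and the --end_epoch and --dropout flags are hypothetical.

```python
import argparse


class ToyBuilder:
    """Hypothetical builder showing the add_args / get_experiment_args pattern (not Morgana's code)."""

    @classmethod
    def add_args(cls, parser):
        parser.add_argument('--experiment_name', required=True)
        parser.add_argument('--end_epoch', type=int, default=10)

    @classmethod
    def get_experiment_args(cls, argv=None):
        parser = argparse.ArgumentParser()
        cls.add_args(parser)
        # vars() converts the parsed namespace into a plain dictionary.
        return vars(parser.parse_args(argv))

    def __init__(self, model_class, experiment_name, end_epoch, **kwargs):
        self.model = model_class()
        self.experiment_name = experiment_name
        self.end_epoch = end_epoch
        # Arguments added by subclasses are saved as instance attributes too.
        for name, value in kwargs.items():
            setattr(self, name, value)


class ToyBuilderWithDropout(ToyBuilder):
    """A subclass extends the supported arguments by overriding add_args."""

    @classmethod
    def add_args(cls, parser):
        super().add_args(parser)
        parser.add_argument('--dropout', type=float, default=0.0)


# Passing argv explicitly keeps the sketch runnable without a real command line.
args = ToyBuilderWithDropout.get_experiment_args(['--experiment_name', 'DNN_voice', '--dropout', '0.1'])
# dict is used as a placeholder "model class" that takes no constructor arguments.
builder = ToyBuilderWithDropout(dict, **args)
```

Keeping add_args as a class method means the arguments can be parsed before any instance exists, which is what allows the parsed dictionary to be passed straight to the constructor.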
The model can then be trained using the following command (see tts_data_tools for guidance on pre-processing of data),
python acoustic_model.py \
--experiment_name DNN_voice \
--data_root ~/data/Blizzard2017 \
--train_dir train \
--train_id_list train_file_id_list.scp \
--valid_dir valid \
--valid_id_list valid_file_id_list.scp