The following speech samples relate to a paper submitted for review at the Speech Synthesis Workshop 2019.
The systems, as described in the paper, are as follows:
| System name | System description |
| --- | --- |
| RNN | Standard RNN-based SPSS model, using MSE. |
| MDN | MDN with 4 mixture components, using NLL. |
| VAE–MEAN | VAE decoder using zMEAN, i.e. the zero vector. |
| VAE–TAIL | VAE decoder using zTAIL with r = 3, i.e. points sampled uniformly on the surface of a hyper-sphere with radius 3. |
| COPY–SYNTH | Natural F0. |
| BASELINE | A quadratic polynomial fitted to natural F0. |
| RNN–SCALED | F0 from RNN, scaled vertically by a factor of 3. |
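
As a rough illustration of two of the descriptions above, the sketch below draws latent points uniformly on the surface of a hyper-sphere (as described for VAE–TAIL) and fits a quadratic polynomial to an F0 contour (as described for BASELINE). It is not the paper's implementation. The function names, the latent dimensionality, and the assumption that the F0 contour is already continuous (no unvoiced gaps) are all hypothetical choices made for this example.

```python
import numpy as np


def sample_on_hypersphere(n_samples, dim, radius=3.0, rng=None):
    """Draw points uniformly on the surface of a hyper-sphere.

    Normalising isotropic Gaussian vectors gives uniformly distributed
    directions; scaling by `radius` places them on the sphere surface.
    `dim` (the latent dimensionality) is a placeholder, not a value
    taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal((n_samples, dim))
    z /= np.linalg.norm(z, axis=1, keepdims=True)
    return radius * z


def quadratic_baseline(f0):
    """Fit a degree-2 polynomial to an F0 contour, frame by frame.

    Assumes `f0` is a continuous contour (e.g. already interpolated
    through unvoiced regions); that preprocessing step is an assumption
    of this sketch.
    """
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0))
    coeffs = np.polyfit(t, f0, deg=2)
    return np.polyval(coeffs, t)


if __name__ == "__main__":
    z_tail = sample_on_hypersphere(n_samples=4, dim=16, radius=3.0)
    print(np.linalg.norm(z_tail, axis=1))  # every row has norm equal to the radius
```

In this sketch, four draws with `radius=3.0` would correspond to the four VAE–TAIL samples, while the zero vector (`np.zeros(dim)`) would play the role of zMEAN.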
The samples are arranged in four sets, each using the following layout:
| COPY–SYNTH | BASELINE | RNN–SCALED | |
| RNN | MDN | VAE–MEAN | |
| VAE–TAIL(1) | VAE–TAIL(2) | VAE–TAIL(3) | VAE–TAIL(4) |