Zack Hodari | Homepage

About Me

I am a researcher with 8 years of experience in generative AI, currently working on speech-to-speech translation at Papercup. Our mission is to make the world's videos watchable in any language! I'm focused on delivering real-world impact through my research. My background is in speech synthesis, and I have experience in NLP, LLMs, and computer vision.

Research

My research focuses on making synthetic voices speak like humans. This means conveying the same intentions, feelings, and attitudes to communicate effectively. These aspects of how we speak are called prosody. During my PhD at the University of Edinburgh, I focused on how we lack the contextual information required to make appropriate prosodic choices. My more recent work focuses on using additional context to generate appropriate prosody. I work on generative models, controllable machine learning, representation learning, interpretability, and experimental design.

Blog Posts

Blog Highlights of NeurIPS, New Orleans 2023.

Blog Favourite papers at Interspeech, Seoul 2022.

Blog Speech synthesis at Interspeech, Seoul 2022.

Blog Summary of UK Speech conference, Edinburgh 2022.

Blog Summary of Speech and Language Technology workshop, Sheffield 2022.

slides.drive 11-part Lecture Series on Machine Learning for Speech and Language Processing masters course. I designed the syllabus, created all lecture content, and delivered the lectures to students in 2021.

slides.drive Lecture notes for Probabilistic Modelling and Reasoning (PMR) masters course. As the teaching assistant, I created course notes to supplement the course material.

Tutorial From linear regression to RNNs and CNNs, including an animated explainer for all CNN variants. The tutorial was aimed at Speech and Language Processing masters students at The University of Edinburgh.

Highlighted Talks

I have a passion for presenting my ideas to others, including both technical and lay audiences. I've given many invited talks, including keynote presentations.

Keynote talk at Sheffield Speech Synthesis Workshop (2023): Prosody - The only important problem in TTS.
slides.google
University finalist in 3 Minute Thesis (2021): How to speak like a human.
video.html
Invited talk at ISCA SIGML seminar series (2021) on my PhD research: Synthesising prosody with insufficient context.
slides.google
Invited talk at KTH TMH seminar series (2021) on my PhD research: Synthesising prosody with insufficient context.
slides.google
Invited talk at Papercup (2020): Perception of learned intonation contour classes.
slides.google

Publications

Dan Andrei Iliescu, Devang Savita Ram Mohan, Tian Huey Teh, Zack Hodari, (2024) Controllable Prosody Generation With Partial Inputs. In Proceedings of ICASSP, Seoul, Korea, 2024.
arxiv.html paper.pdf poster.pdf demo.html
Tian Huey Teh, Vivian Hu, Devang Ram Mohan, Zack Hodari, Christopher Wallis, Tomás Gómez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King, (2023) Ensemble Prosody Prediction For Expressive Speech Synthesis. In Proceedings of ICASSP, Rhodes, Greece, 2023, pp. 1-5.
arxiv.html paper.pdf
Zack Hodari (2022) Synthesising Prosody with Insufficient Context. PhD thesis, University of Edinburgh.
Summary: Generating speech is difficult because the intonation, rhythm, emotion, and expressivity of our voice are implicit and not defined by the text. I worked on generative models, LLMs, discrete representation learning, interpretability, and controllability to improve the expressivity of synthetic voices.
abstract.html thesis.pdf slides.google
Zack Hodari, Alexis Moinet, Sri Karlapati, Jaime Lorenzo-Trueba, Thomas Merritt, Arnaud Joly, Ammar Abbas, Penny Karanasou, Thomas Drugman, (2021) CAMP: a two-stage approach to modelling prosody in context. In Proceedings of ICASSP, Toronto, Canada, 2021, pp. 6578-6582.
arxiv.html paper.pdf slides.pptx
Sri Karlapati, Ammar Abbas, Zack Hodari, Alexis Moinet, Arnaud Joly, Penny Karanasou, Thomas Drugman, (2021) Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech. In Proceedings of ICASSP, Toronto, Canada, 2021, pp. 6573-6577.
arxiv.html paper.pdf
Zack Hodari, Catherine Lai, Simon King, (2020) Perception of prosodic variation for speech synthesis using an unsupervised discrete representation of F0. In Proceedings of Speech Prosody, Tokyo, Japan, 2020, pp. 965-969.
bib.html paper.pdf slides.google video.mp4
Zack Hodari, Oliver Watts, Simon King, (2019) Using generative modelling to produce varied intonation for speech synthesis. In Proceedings of Speech Synthesis Workshop, Vienna, Austria, 2019, pp. 239-244.
This work was also presented at UK Speech, Birmingham, UK, 2019.
bib.html paper.pdf slides.google poster.pdf
Jason Fong, Pilar Oplustil Gallegos, Zack Hodari, Simon King, (2019) Investigating the robustness of sequence-to-sequence text-to-speech models to imperfectly-transcribed training data. In Proceedings of Interspeech, Graz, Austria, 2019, pp. 1546-1550.
bib.html paper.pdf poster.pdf
Zack Hodari, Oliver Watts, Srikanth Ronanki, Simon King, (2018) Learning interpretable control dimensions for speech synthesis by using external data. In Proceedings of Interspeech, Hyderabad, India, 2018, pp. 32-36.
bib.html paper.pdf slides.pptx poster.pdf
Zack Hodari, Simon King, (2017) A learned emotion space for emotion recognition and emotive speech synthesis. In Proceedings of UK Speech, Cambridge, UK, 2017.
bib.html poster.pdf
Zack Hodari (2017) A Learned Emotion Space for Emotion Recognition and Emotive Speech Synthesis. MScR thesis, University of Edinburgh.
Summary: My MScR research focussed on learning a description of emotion that addresses issues with current labelling techniques. In addition, I evaluated the proposed technique using style adaptation for speech synthesis.
thesis.pdf poster.pdf

Personal

When I'm not staring at a computer screen, I enjoy board games, hiking, travelling, cooking, baking, woodworking, juggling, and whisky.