Nepali Text-to-Speech Synthesis Using Tacotron2 and WaveGlow

Ashma Rai; Shikshya Shiwakoti; Swostika Basukala; Suramya Sharma Dahal

doi:10.3126/kjse.v8i1.69276

Keywords:

Fine-tuning, Text-to-Speech, Synthesis, Tacotron2, WaveGlow

Abstract

This paper presents a Nepali Text-to-Speech (TTS) system developed under low-resource conditions by adapting pre-trained English Tacotron2 and WaveGlow models. Tacotron2 is used for spectrogram generation and WaveGlow for vocoding, since these two components largely determine the quality of a TTS system. The pre-trained Tacotron2 model is fine-tuned on a Nepali text corpus aligned with its corresponding audio dataset, using limited data to build a system capable of producing natural-sounding output. WaveGlow then converts the generated spectrograms into audible waveforms. The model has difficulty synthesizing audio for a restricted subset of Nepali texts, which we attribute to inadequate text cleaning and normalization.
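To illustrate the kind of text cleaning and normalization step that precedes spectrogram generation, the following is a minimal sketch and not the authors' implementation. The symbol inventory, the handling of zero-width characters, and the function names `clean_text` and `text_to_sequence` are all assumptions made for illustration; the abstract does not specify the actual cleaning rules. The sketch shows how characters outside a fixed symbol set are dropped before text is mapped to integer IDs, which is one way a subset of input text can fail to be synthesized.

```python
import re
import unicodedata

# Hypothetical symbol inventory (an assumption, not taken from the paper):
# a padding symbol, space and basic punctuation, and every assigned
# code point in the Devanagari block (U+0900-U+097F).
_PAD = "_"
_PUNCTUATION = " ,.?!"
_DEVANAGARI = [chr(cp) for cp in range(0x0900, 0x0980)
               if unicodedata.name(chr(cp), "")]
SYMBOLS = [_PAD] + list(_PUNCTUATION) + _DEVANAGARI
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}

# Zero-width characters are stripped here; this is an assumed design choice.
_ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\ufeff]")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Return (cleaned_text, dropped_characters) for a raw input string."""
    text = unicodedata.normalize("NFC", text)    # canonical Unicode form
    text = _ZERO_WIDTH.sub("", text)             # remove invisible characters
    text = _WHITESPACE.sub(" ", text)            # tabs/newlines become spaces
    kept, dropped = [], []
    for ch in text:
        (kept if ch in SYMBOL_TO_ID else dropped).append(ch)
    # Collapse again, since dropping characters can leave stray spaces.
    cleaned = _WHITESPACE.sub(" ", "".join(kept)).strip()
    return cleaned, dropped


def text_to_sequence(text):
    """Map cleaned text to the integer IDs an acoustic model would consume."""
    cleaned, _ = clean_text(text)
    return [SYMBOL_TO_ID[ch] for ch in cleaned]
```

Returning the dropped characters alongside the cleaned text makes it possible to log which inputs fall outside the symbol set, rather than silently producing truncated speech.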

Author Biographies

Ashma Rai

Dept of Electronics and Computer Engineering, Thapathali Campus, IOE, TU

Shikshya Shiwakoti

Dept of Electronics and Computer Engineering, Thapathali Campus, IOE, TU

Swostika Basukala

Dept of Electronics and Computer Engineering, Thapathali Campus, IOE, TU

Suramya Sharma Dahal

Associate Professor, Dept of Electronics, Communication & Information Engineering, Kathmandu Engineering College
