Speech to Speech Translation for English and Hindi with Speaker Preservation

Authors

Dhruv Prasanna
Computer Science Engineering Dept, PES University, 100 Feet Ring Road, 560085, Bangalore, Karnataka
Avinash Nithyashree
Computer Science Engineering Dept, PES University, 100 Feet Ring Road, 560085, Bangalore, Karnataka
Namith V Shetty
Computer Science Engineering Dept, PES University, 100 Feet Ring Road, 560085, Bangalore, Karnataka
Praharsha Kosuri
Computer Science Engineering Dept, PES University, 100 Feet Ring Road, 560085, Bangalore, Karnataka
Pavan A C
Computer Science Engineering Dept, PES University, 100 Feet Ring Road, 560085, Bangalore, Karnataka

Synopsis

This paper presents a speech-to-speech translation system designed to support accurate communication between English and Hindi speakers with near-real-time responses while preserving the original voice of the speaker. The system uses a cascaded architecture consisting of Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS) components, and it translates English speech into Hindi speech and vice versa. The presented techniques aim to address the difficulties caused by the differing sentence structures and phonetics of Hindi and English by using transformer-based models in each module. The system provides accurate translations and performs on par with state-of-the-art models and services such as Google Translate and ChatGPT. HuBERT, a speech representation model, is used to clone the original speaker's voice, which allows the system to preserve that voice while translating and supports more effective communication. HuBERT enhances clarity and emotional realism in TTS by leveraging speaker-specific attributes extracted from the original speech, so that the translated material is synthesized in a voice similar to that of the original speaker.
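The cascaded design described above can be pictured as three stages chained in sequence. The following is a minimal Python sketch under stated assumptions: the stage signatures, the `CascadedS2ST` class, and the choice to forward the source audio to the TTS stage as a speaker reference are hypothetical illustrations, not the authors' implementation. The transformer-based ASR, MT, and TTS models and the HuBERT-based voice cloning are represented only by placeholder callables.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]  # hypothetical: a waveform as a list of samples

# Hypothetical stage signatures (not taken from the paper):
AsrStage = Callable[[Audio, str], str]          # (audio, source_lang) -> transcript
MtStage = Callable[[str, str, str], str]        # (text, source_lang, target_lang) -> translation
TtsStage = Callable[[str, str, Audio], Audio]   # (text, target_lang, speaker_reference) -> audio

SUPPORTED_PAIR = {"en", "hi"}


@dataclass
class CascadedS2ST:
    """Chains ASR -> MT -> TTS; each stage is a placeholder callable."""
    asr: AsrStage
    mt: MtStage
    tts: TtsStage

    def translate(self, audio: Audio, source_lang: str, target_lang: str) -> Audio:
        # Only English -> Hindi and Hindi -> English are accepted.
        if {source_lang, target_lang} != SUPPORTED_PAIR:
            raise ValueError(f"unsupported direction: {source_lang} -> {target_lang}")
        transcript = self.asr(audio, source_lang)                     # speech -> source text
        translation = self.mt(transcript, source_lang, target_lang)   # source text -> target text
        # Assumption: the source audio is forwarded as a speaker reference so
        # that a voice-cloning TTS stage can imitate the original speaker.
        return self.tts(translation, target_lang, audio)


if __name__ == "__main__":
    # Toy stand-ins for the real models, for illustration only.
    pipeline = CascadedS2ST(
        asr=lambda audio, lang: "hello",
        mt=lambda text, src, tgt: {"hello": "namaste"}[text],
        tts=lambda text, lang, ref: [float(len(text))] + ref,
    )
    print(pipeline.translate([0.1, 0.2], "en", "hi"))  # [7.0, 0.1, 0.2]
```

Keeping each stage behind a plain callable mirrors the cascaded structure: any one module could, in principle, be replaced by a different ASR, MT, or TTS model without changing the others. The abstract does not specify how HuBERT-derived speaker attributes reach the TTS model, so passing the raw source audio is only one possible interface.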

ICAMC2024
Published: March 17, 2025
Online ISSN: 2582-3922