End-to-End Speech Recognition Using Recurrent Neural Network (RNN)

Rene Avalloni de Morais; Baidya Nath Saha

Authors

Rene Avalloni de Morais

Department of Mathematical and Physical Sciences, Concordia University of Edmonton, Alberta, T5B 4E4

Baidya Nath Saha

Department of Mathematical and Physical Sciences, Concordia University of Edmonton, Alberta, T5B 4E4

DOI: https://doi.org/10.21467/proceedings.115.20

Synopsis

Deep learning algorithms have made dramatic progress in natural language processing and automatic human speech recognition. However, the accuracy of deep learning algorithms depends on the amount and quality of the data, and training deep models requires high-performance computing resources. Against this backdrop, this paper addresses an end-to-end speech recognition system in which we fine-tune the Mozilla DeepSpeech architecture using two different datasets: the LibriSpeech clean dataset and the Harvard speech dataset. We train Long Short-Term Memory (LSTM) based deep Recurrent Neural Network (RNN) models on the Google Colab platform, using its GPU resources. Extensive experimental results demonstrate that the Mozilla DeepSpeech model can be fine-tuned on different audio datasets to recognize speech successfully.