A Survey on Collaborative Learning Approach for Speech and Speaker Recognition
Synopsis
The multi-task recurrent neural network model delivers improved performance on both automatic speech recognition and speaker recognition. Neural networks are predictive methods based on simple mathematical models of the brain. They allow modeling of complex nonlinear relationships between the response variable and its predictors. A deep learning approach has been used to derive speaker identity representations (d-vectors) with a Deep Neural Network (DNN). A DNN is an Artificial Neural Network (ANN) with multiple hidden layers between the input and output layers. In a DNN, the hidden layers can be viewed as increasingly complex feature transformations. The final softmax layer is a log-linear classifier that makes use of the abstract features computed in the hidden layers. Long Short-Term Memory (LSTM) is a specific recurrent neural network (RNN) architecture. This paper analyzes the various approaches to training speech and speaker recognition systems by examining factors such as target delay, partially labeled data, and negatively correlated tasks. A minimal sketch of the shared-recurrent-network idea is given below.
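To make the multi-task recurrent architecture concrete, the following sketch shows a shared LSTM whose hidden states feed two softmax heads, one for speech (senone/phone) targets and one for speaker identities, trained with a summed cross-entropy loss. This is an illustrative example only, not the implementation surveyed in the paper; the layer sizes, class counts, and names (MultiTaskLSTM, speech_head, speaker_head) are assumptions made for demonstration.

```python
# Illustrative sketch (assumed, not the surveyed system): a shared LSTM
# feeding two task-specific softmax heads for joint speech and speaker
# recognition.
import torch
import torch.nn as nn

class MultiTaskLSTM(nn.Module):
    def __init__(self, feat_dim=40, hidden_dim=256,
                 num_senones=2000, num_speakers=500):
        super().__init__()
        # Shared recurrent trunk over acoustic feature frames.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, num_layers=2, batch_first=True)
        # Task-specific output layers (softmax is applied inside the loss).
        self.speech_head = nn.Linear(hidden_dim, num_senones)
        self.speaker_head = nn.Linear(hidden_dim, num_speakers)

    def forward(self, features):
        # features: (batch, frames, feat_dim) acoustic features, e.g. filterbanks.
        hidden, _ = self.lstm(features)
        speech_logits = self.speech_head(hidden)            # per-frame senone scores
        speaker_logits = self.speaker_head(hidden.mean(1))  # utterance-level speaker scores
        return speech_logits, speaker_logits

# Joint training step with dummy data: summing the two cross-entropy losses
# pushes the shared LSTM toward representations useful for both tasks.
model = MultiTaskLSTM()
feats = torch.randn(8, 100, 40)
senone_targets = torch.randint(0, 2000, (8, 100))
speaker_targets = torch.randint(0, 500, (8,))
speech_logits, speaker_logits = model(feats)
loss = (nn.functional.cross_entropy(speech_logits.reshape(-1, 2000),
                                    senone_targets.reshape(-1))
        + nn.functional.cross_entropy(speaker_logits, speaker_targets))
loss.backward()
```

In this sketch the speaker head pools hidden states over time (a d-vector-style utterance embedding), while the speech head scores every frame; how the two losses are weighted and whether gradients are delayed (target delay) are exactly the kinds of training factors the survey discusses.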
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.