Co-occurrence Based Approach for Differentiation of Speech and Song

Authors

Arijit Ghosal
Department of Information Technology, St. Thomas’ College of Engineering and Technology, Kolkata, West Bengal
Ranjit Ghoshal
Department of Information Technology, St. Thomas’ College of Engineering and Technology, Kolkata, West Bengal

Synopsis

Discriminating speech from song in an audio signal is an interesting research topic. Earlier efforts focused mainly on discriminating speech from non-speech; comparatively few have addressed the discrimination of speech and song. This discrimination is a noteworthy part of automatic audio classification, as it is considered a fundamental step in hierarchical approaches to genre identification and audio archive generation. Previous efforts to discriminate speech and song have relied on frequency-domain and perceptual-domain aural features. This work aims to propose an acoustic feature that is both low-dimensional and easy to compute. It is observed that the energy levels of speech and song signals differ substantially, owing to the absence of an instrumental background in speech. Short Time Energy (STE) is the acoustic feature best suited to reflect this behaviour. For a precise study of energy variation, a co-occurrence matrix of STE is generated and statistical features are extracted from it. For classification, several well-known supervised classifiers have been employed. The performance of the proposed feature set has been compared with that of other efforts to demonstrate its superiority.
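The pipeline described above (frame-wise STE, quantization into a co-occurrence matrix of consecutive-frame transitions, then statistical descriptors) can be sketched as follows. This is a minimal illustration, not the authors' exact implementation: the frame length, hop size, number of quantization levels, and the particular statistics (energy, entropy, contrast, homogeneity) are assumptions chosen for demonstration.

```python
import numpy as np

def short_time_energy(signal, frame_len=256, hop=128):
    """Frame-wise Short Time Energy: sum of squared samples per frame.
    Frame length and hop are illustrative choices."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.sum(np.asarray(f, dtype=float) ** 2) for f in frames])

def ste_cooccurrence(ste, levels=8):
    """Quantize STE values into `levels` bins and count transitions
    between consecutive frames, giving a levels x levels co-occurrence
    matrix normalized to joint probabilities."""
    lo, hi = ste.min(), ste.max()
    q = np.floor((ste - lo) / (hi - lo + 1e-12) * levels).astype(int)
    q = np.clip(q, 0, levels - 1)
    C = np.zeros((levels, levels))
    for a, b in zip(q[:-1], q[1:]):
        C[a, b] += 1
    return C / C.sum()

def cooccurrence_stats(C):
    """Common statistical descriptors of a co-occurrence matrix
    (the paper's exact feature list may differ)."""
    i, j = np.indices(C.shape)
    nz = C[C > 0]
    return {
        "energy":      float(np.sum(C ** 2)),
        "entropy":     float(-np.sum(nz * np.log2(nz))),
        "contrast":    float(np.sum((i - j) ** 2 * C)),
        "homogeneity": float(np.sum(C / (1.0 + np.abs(i - j)))),
    }

# Toy usage on a synthetic signal; a real system would feed these
# features to a supervised classifier (e.g. SVM, k-NN).
rng = np.random.default_rng(0)
x = rng.standard_normal(8000)
feats = cooccurrence_stats(ste_cooccurrence(short_time_energy(x)))
```

The feature vector stays small (here four values per signal), which is the low-dimensionality advantage the synopsis highlights.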

ICTCon2021
Published
July 12, 2021
Online ISSN
2582-3922