A COVID-19 Corpus Creation for Bengali: In the Context of Language Study


Prasanta Mandal
Department of Computer Science and Engineering, Govt. College of Engineering and Textile Technology, 12, William Carey Road, Serampore, Hooghly-712201
Apurbalal Senapati
Department of Computer Science and Engineering, Central Institute of Technology Kokrajhar, Kokrajhar-783370


A corpus is a large collection of machine-readable texts that should, ideally, be representative of a language. Corpora play an important role in natural language processing (NLP) and linguistic research, and developing a corpus is itself a substantial contribution to the resources available for language processing. Corpora support linguistic study as well as NLP tasks such as part-of-speech (POS) tagging, parsing, semantic tagging, and tasks that rely on parallel corpora. Numerous corpora for different languages exist in the literature, and most of them were created for a specific purpose. Hence a researcher cannot simply use any available corpus for a particular task. Motivated by the pandemic situation, this paper focuses on an automated technique to create a COVID-19 corpus for Bengali dedicated to research on linguistic aspects.

July 12, 2021