A COVID-19 Corpus Creation for Bengali: In the Context of Language Study
A corpus is a large collection of machine-readable texts that should, ideally, be representative of a language. Corpora play an important role in linguistic study and in several natural language processing (NLP) tasks, such as part-of-speech (POS) tagging, parsing, semantic tagging, and tasks based on parallel corpora. Corpus development is therefore itself a substantial contribution to resource building for language processing. The literature describes numerous corpora for different languages, and most of them were created for a specific purpose. Hence a researcher cannot use just any corpus for a particular task. Motivated by the pandemic situation, this paper focuses on an automated technique for creating a COVID-19 corpus dedicated to research on linguistic aspects.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.