Technical Domain Classification of Bangla Text using BERT

Authors

Koyel Ghosh
Department of Computer Science and Engineering, Central Institute of Technology, Assam
Apurbalal Senapati
Department of Computer Science and Engineering, Central Institute of Technology Kokrajhar, Kokrajhar-783370

Synopsis

Text classification is one of the earliest problems in NLP, and coarse-grained classification tasks are carried out at the document and sentence levels. Our goal here is to identify the technical domain of a given Bangla text. In coarse-grained technical domain classification, a piece of Bangla text is assigned to a technical domain such as Biochemistry (bioche), Communication Technology (com-tech), Computer Science (cse), Management (mgmt), or Physics (phy). This paper uses Bangla BERT (Bangla-Bert-Base), a pretrained language model for Bangla based on Bidirectional Encoder Representations from Transformers (BERT), to identify the domain of a given text. We then report the accuracy of Bangla BERT and compare it with other models applied to the same problem.
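As an illustration of the final classification step only, the following minimal Python sketch maps a vector of per-domain scores to one of the domain tags named above. It is not the authors' implementation: the function names `softmax` and `predict_domain` and the example scores are hypothetical, and in practice the scores would come from a classification layer on top of a fine-tuned Bangla BERT encoder.

```python
import math

# Domain tags named in the synopsis; the full label set may contain more.
DOMAINS = ["bioche", "com-tech", "cse", "mgmt", "phy"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_domain(logits, labels=DOMAINS):
    """Return the most probable domain tag and its probability."""
    if len(logits) != len(labels):
        raise ValueError("expected one score per domain label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical scores for one Bangla sentence (illustrative values only).
example_logits = [0.2, -1.0, 3.1, 0.5, 0.0]
label, confidence = predict_domain(example_logits)
print(label, round(confidence, 3))
```

Here the highest score is at index 2, so the sketch selects the `cse` tag; the probability is a by-product of the softmax and is not a figure reported in the paper.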

ICTCon2021
Published
July 12, 2021
Online ISSN
2582-3922