Technical Domain Classification of Bangla Text using BERT

Koyel Ghosh; Apurbalal Senapati

Authors

Koyel Ghosh

Department of Computer Science and Engineering, Central Institute of Technology, Assam

Apurbalal Senapati

Department of Computer Science and Engineering, Central Institute of Technology Kokrajhar, Kokrajhar-783370

DOI: https://doi.org/10.21467/proceedings.115.16

Synopsis

Coarse-grained tasks are primarily based on text classification, one of the earliest problems in NLP, and are carried out at the document and sentence levels. Here, our goal is to identify the technical domain of a given Bangla text. In coarse-grained technical domain classification, a piece of Bangla text is assigned to a specific technical domain such as Biochemistry (bioche), Communication Technology (com-tech), Computer Science (cse), Management (mgmt), or Physics (phy). This paper uses a recent deep learning model, Bangla Bidirectional Encoder Representations from Transformers (Bangla BERT), to identify the domain of a given text. Bangla BERT (Bangla-Bert-Base) is a pretrained language model for the Bangla language. We then report the accuracy of Bangla BERT and compare it with other models that address the same problem.
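
As a rough illustration of the setup described in the synopsis, the following Python sketch shows how a pretrained Bangla BERT encoder might be given a classification head over the listed domain tags. It is not the authors' implementation. The checkpoint name `sagorsarker/bangla-bert-base`, the use of the Hugging Face `transformers` library, the label order, the restriction to the five domains named above, and the helper names (`softmax`, `pick_domain`, `load_classifier`, `classify`) are all assumptions made for illustration.

```python
import math

# Domain tags as listed in the synopsis. The order, and the assumption that
# these five are the full label set, are illustrative choices.
DOMAINS = ["bioche", "com-tech", "cse", "mgmt", "phy"]

# Assumed Hugging Face checkpoint for Bangla-Bert-Base (not named in the source).
CHECKPOINT = "sagorsarker/bangla-bert-base"


def softmax(logits):
    """Convert raw scores into probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_domain(logits):
    """Return the most probable domain tag and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return DOMAINS[best], probs[best]


def load_classifier(checkpoint=CHECKPOINT):
    """Load the tokenizer and BERT encoder with a new classification head.

    Requires the `transformers` and `torch` packages and downloads weights.
    The head is randomly initialised, so the model must be fine-tuned on
    labelled Bangla text before its predictions mean anything.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint, num_labels=len(DOMAINS)
    )
    return tokenizer, model


def classify(text, tokenizer, model):
    """Classify one Bangla text with an already fine-tuned model."""
    import torch

    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_domain(logits)
```

Keeping the score-to-label step (`pick_domain`) separate from model loading means the mapping can be checked without downloading any weights. Calling `classify` only makes sense after the model returned by `load_classifier` has been fine-tuned.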