A Comparative Study of E-Mail Spam Detection using Various Machine Learning Techniques

Simarjeet Kaur; Meenakshi Bansal; Ashok Kumar Bathla

Authors

Simarjeet Kaur

Computer Science and Engineering, Yadawindra College of Engineering, Talwandi Sabo, India

Meenakshi Bansal

Computer Science and Engineering, Yadawindra College of Engineering, Talwandi Sabo, India

Ashok Kumar Bathla

Computer Science and Engineering, Yadawindra College of Engineering, Talwandi Sabo, India

DOI: https://doi.org/10.21467/proceedings.114.56

Synopsis

With the growing use of messaging and email services, spam detection has become far more important than before, and efficient classification in this setting is a comparatively difficult task. Spam can be defined as redundant or junk email: any message that the recipient does not want in their inbox. After pre-processing and feature extraction, several machine learning algorithms were applied to the Spambase dataset from the UCI Machine Learning Repository to classify incoming emails into two categories, spam and non-spam, and their outcomes were compared. The algorithms used to classify the messages are random forest, naive Bayes, support vector machine (SVM), logistic regression, and k-nearest neighbors (KNN). The main goal of this study is to improve the prediction accuracy of spam email filters.
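The comparison described in the synopsis can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, covers just two of the five algorithms (a Gaussian naive Bayes and a KNN classifier), and runs on synthetic data rather than Spambase. The feature means, the 75/25 split, `k=5`, and all function names are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter

def make_toy_data(n_per_class=200, seed=0):
    """Synthetic stand-in for Spambase (assumption): three made-up numeric features."""
    rng = random.Random(seed)
    means = {0: (0.1, 0.2, 1.0), 1: (0.6, 0.9, 3.0)}  # 0 = non-spam, 1 = spam
    stds = (0.2, 0.3, 1.0)
    rows = [([rng.gauss(m, s) for m, s in zip(means[label], stds)], label)
            for label in (0, 1) for _ in range(n_per_class)]
    rng.shuffle(rows)
    return rows

def fit_gaussian_nb(train):
    """Estimate the prior, per-feature means, and per-feature variances of each class."""
    model = {}
    for label in {y for _, y in train}:
        xs = [x for x, y in train if y == label]
        means = [sum(col) / len(col) for col in zip(*xs)]
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(zip(*xs), means)]
        model[label] = (len(xs) / len(train), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log prior plus Gaussian log-likelihood."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)

def predict_knn(train, x, k=5):
    """Majority vote among the k training rows closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def accuracy(predict, test):
    """Fraction of held-out rows whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)

data = make_toy_data()
split = int(0.75 * len(data))  # 75% train, 25% test (illustrative choice)
train, test = data[:split], data[split:]

nb_model = fit_gaussian_nb(train)
results = {
    "naive Bayes": accuracy(lambda x: predict_gaussian_nb(nb_model, x), test),
    "KNN (k=5)": accuracy(lambda x: predict_knn(train, x), test),
}
# Report the classifiers from most to least accurate on the held-out rows.
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

The same pattern (fit each model on the training rows, score every model on the same held-out rows, then rank by accuracy) would extend to the remaining classifiers. In practice a library such as scikit-learn would likely replace the hand-written models, and the real Spambase features would replace the toy data.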