SPAM Detection Analysis

Samuel Arellano, Daniel Clark, Dhyan Shah, Chandler Vaughn


Email has become the standard for communication in business and personal settings all over the world. As it takes the place of traditional mail services, one side effect is the constant wave of unwanted messages that we receive daily. Spam has become a ubiquitous part of the email experience and can frustrate users when it crowds out valid emails in one’s inbox. Email servers employ spam detection methods to automatically detect spam emails and filter them out before they reach the user. At the same time, spam writers are becoming increasingly savvy at evading these detection methods. In this paper, we explore a dataset of spam and valid emails and use a recursive partitioning method to learn from a training set and classify unlabeled data in a testing set. After some tuning, we were able to accurately classify emails as spam or valid, and we identified forwards, perCaps, and bodyCharCt as the key factors in distinguishing spam from valid email.
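
To make the idea of recursive partitioning concrete, the following is a minimal sketch in Python, not the implementation used in this paper. It grows a small classification tree by repeatedly choosing the feature threshold that most reduces Gini impurity, then classifies new rows by walking the tree. The helper names (gini, best_split, build_tree, predict), the use of Gini impurity as the split criterion, and the six toy rows are all assumptions made for illustration; only the feature names forwards, perCaps, and bodyCharCt come from the analysis itself.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively partition the data; leaves hold the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f,
        t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Toy training rows: (forwards, perCaps, bodyCharCt). Values are invented
# for illustration and are NOT taken from the dataset analyzed here.
train_X = [
    (0, 35.0, 400),
    (0, 42.0, 650),
    (0, 28.0, 300),
    (5, 6.0, 1800),
    (12, 4.0, 2500),
    (0, 5.0, 1200),
]
train_y = ["spam", "spam", "spam", "valid", "valid", "valid"]

tree = build_tree(train_X, train_y)
print(predict(tree, (0, 50.0, 200)))   # unseen row with a high share of capitals
print(predict(tree, (3, 3.0, 2000)))   # unseen row with a low share of capitals
```

On this toy data a single split on perCaps separates the two classes, so the tree stops after one level. A real dataset would need deeper trees and tuning of parameters such as the maximum depth to avoid overfitting.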