A study of the theory and application of statistical filtering techniques for spam classification has been conducted, and its results are presented. The research methodology begins by building the dataset and correcting errors in it, and then discusses techniques for computing the probability of the tokens in the dataset and for applying statistics to those token values. The analysis shows that statistical filtering is better than heuristic-based filtering because it provides specific information on which to base a classification decision. The four popular techniques in use today, namely Bayesian Combination by Paul Graham, Bayesian Combination by Brian Burton, Robinson's Geometric Mean Test, and Fisher-Robinson's Inverse Chi-Square, are discussed in detail along with their merits and demerits.
Spam Classification, Statistical Filtering, Bayes Classifier, Machine Learning
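As a minimal sketch, assuming each token's spam probability p has already been computed and lies strictly between 0 and 1, three of the combination rules named above can be written as follows. The function names (`graham_combine`, `robinson_geometric`, `fisher_robinson`, `chi2_q`) are hypothetical and this is not the authors' implementation. Burton's variant, which reuses Graham-style combination with a different token-selection policy, is not shown separately.

```python
import math
from typing import Sequence


def graham_combine(probs: Sequence[float]) -> float:
    """Graham's combination: prod(p) / (prod(p) + prod(1 - p))."""
    # Sum logs instead of multiplying raw probabilities to avoid underflow.
    log_spam = sum(math.log(p) for p in probs)
    log_ham = sum(math.log(1.0 - p) for p in probs)
    z = log_spam - log_ham
    # Numerically stable logistic 1 / (1 + e^-z), equal to the ratio above.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def robinson_geometric(probs: Sequence[float]) -> float:
    """Robinson's geometric mean test, rescaled to a score in [0, 1]."""
    n = len(probs)
    # P grows as tokens look spammy; Q grows as tokens look hammy.
    P = 1.0 - math.exp(sum(math.log(1.0 - p) for p in probs) / n)
    Q = 1.0 - math.exp(sum(math.log(p) for p in probs) / n)
    S = (P - Q) / (P + Q)  # S lies in [-1, 1]
    return (1.0 + S) / 2.0


def chi2_q(x2: float, dof: int) -> float:
    """Upper-tail chi-square probability for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Series e^-m * sum_{i < dof/2} m^i / i!, valid only for even dof.
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_robinson(probs: Sequence[float]) -> float:
    """Fisher-Robinson inverse chi-square combination."""
    n = len(probs)
    # Fisher's method: -2 * sum(ln x) follows chi-square with 2n dof under the null.
    spam_evidence = chi2_q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    ham_evidence = chi2_q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0


spammy = [0.99, 0.95, 0.9]
for combine in (graham_combine, robinson_geometric, fisher_robinson):
    print(combine.__name__, round(combine(spammy), 4))
```

All three scores move toward 1 for spam-like tokens, toward 0 for ham-like tokens, and equal 0.5 when every token is neutral. In a real filter the input list would first be narrowed to the most "interesting" tokens (Graham used the 15 farthest from 0.5), and that selection step is omitted here.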