International Journal of Data Mining and Emerging Technologies
  • Year: 2012
  • Volume: 2
  • Issue: 1

Incorporating Fuzziness in Text Classification

¹Associate Professor, Sreenidhi Institute of Science & Technology, Ghatkesar, Hyderabad, India

²Professor & Head, Computer Science Engineering Department, Vasavi College of Engineering, Ibrahimbagh, Hyderabad, India

*Email id: wajeed.mtech@gmail.com

**Email id: t_adilakshmi@rediffmail.com

Abstract

Electronic gadgets have become part of everyday life and generate large volumes of data. This data may be free-flowing text (unstructured), structured, or semi-structured; HTML pages are an example of the semi-structured form. To retrieve stored data efficiently, it must first be classified into categories. The literature contains much work on classifying structured data, but comparatively little on classifying unstructured data. This paper addresses text classification, which depends on feature selection and feature reduction. Feature selection filters out unwanted, irrelevant features; feature reduction identifies redundant features and eliminates them. Feature reduction is a major difficulty in text classification problems, and the paper explores a technique for it that groups features into clusters based on fuzzy similarity, which is characterised by a membership function that takes into account the statistical mean and deviation of the words in the cluster. Rather than all the features in a cluster, only a representative of the cluster is considered, which reduces the number of features to a large extent. A k-nearest-neighbour (KNN) classifier was used to build the model from the clusters obtained. The experimental results were found to be encouraging.
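The pipeline outlined in the abstract (fuzzy clustering of words, one representative feature per cluster, then KNN) can be illustrated with a minimal sketch. The abstract does not give the membership formula or the word representation, so everything concrete here is an assumption: each word is taken to be a small numeric vector, the membership function is assumed to be Gaussian-shaped in the cluster mean and deviation, the cluster representative is assumed to be the summed count of the member words, and the names `membership`, `fuzzy_cluster`, `reduce_features`, `knn_predict`, `threshold` and `min_dev` are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev


def membership(x, mean, dev):
    # ASSUMPTION: Gaussian-shaped fuzzy membership, multiplied over dimensions.
    return math.prod(math.exp(-((xi - mi) / si) ** 2)
                     for xi, mi, si in zip(x, mean, dev))


def fuzzy_cluster(patterns, threshold=0.5, min_dev=0.25):
    """Assign each word pattern to the best-matching cluster, or open a new one."""
    clusters = []
    for idx, p in enumerate(patterns):
        scores = [membership(p, c["mean"], c["dev"]) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            # No cluster is similar enough: start a new one around this word.
            clusters.append({"members": [idx], "mean": list(p),
                             "dev": [min_dev] * len(p)})
        else:
            # Join the cluster and refresh its mean and deviation.
            c = clusters[best]
            c["members"].append(idx)
            dims = list(zip(*(patterns[i] for i in c["members"])))
            c["mean"] = [sum(d) / len(d) for d in dims]
            # min_dev is a floor (assumed) so the deviation never reaches zero.
            c["dev"] = [max(min_dev, pstdev(d)) for d in dims]
    return clusters


def reduce_features(doc, clusters):
    # ASSUMPTION: one feature per cluster, the summed counts of its member words.
    return [sum(doc[i] for i in c["members"]) for c in clusters]


def knn_predict(train_x, train_y, query, k=3):
    # Majority vote among the k nearest training documents (Euclidean distance).
    nearest = sorted(range(len(train_x)),
                     key=lambda i: math.dist(train_x[i], query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


# Toy data (invented for illustration): 4 words, each described by a 2-value pattern.
patterns = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
clusters = fuzzy_cluster(patterns)

# Documents as word-count vectors over the same 4 words.
train_docs = [[3, 2, 0, 0], [2, 1, 0, 1], [0, 0, 2, 3], [1, 0, 3, 2]]
train_labels = ["A", "A", "B", "B"]
train_reduced = [reduce_features(d, clusters) for d in train_docs]

query_reduced = reduce_features([2, 2, 0, 1], clusters)
predicted = knn_predict(train_reduced, train_labels, query_reduced)
```

In this toy run the four words collapse into two clusters, so each document shrinks from four features to two before the KNN vote. The threshold controls how aggressively words are merged: a higher value yields more, tighter clusters and therefore less reduction.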

Keywords

Unstructured Data Classification, Clusters, Fuzzy Logic, Hard Classification, Soft Classification, Similarity Measures, Euclidean Distance