Advances in Computational Sciences and Technology
  • Year: 2009
  • Volume: 2
  • Issue: 3

Improving the Web Search by Document Clustering

  • Authors: R. Subhashini¹, V. Jawahar Senthil Kumar²
  • Total Page Count: 10
  • Page Number: 387 to 396

¹ Sathyabama University, Chennai-119, India.

² Anna University, Chennai, India. E-mail: subhaagopi@gmail.com.

Abstract

With the increase in the number of electronic documents, it is hard to organize, analyse, and present these documents efficiently by hand. Document clustering automatically generates clusters from the whole document collection and is used in many fields, including data mining and information retrieval. A Web document restructuring scheme identifies the different parts of a document and assigns levels of significance to these parts according to their importance. In this paper, we implement a hybrid algorithm in which the initial clusters are formed based on the frequency of terms in each document, and a further level of clustering is performed by the K Nearest Neighbor algorithm. This document clustering model measures the similarity between documents using a similarity measure based on term frequencies; the similarity calculation is based on single-term similarity. The reported quality and performance of this document clustering are higher than those of other traditional clustering methods.
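
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure or the rule for forming the initial clusters, so the sketch assumes cosine similarity over raw single-term frequency vectors, labels each document by its most frequent term in the first stage, and relabels it by a majority vote among its k most similar documents in the second stage. All function names are hypothetical.

```python
from collections import Counter
import math


def term_frequencies(text):
    """Lowercase, split on whitespace, and count single terms."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity of two term-frequency vectors (an assumed measure)."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def initial_clusters(tfs):
    """Stage 1 (assumed rule): label each document by its most frequent term."""
    return [tf.most_common(1)[0][0] if tf else "" for tf in tfs]


def knn_refine(tfs, labels, k=3):
    """Stage 2 (assumed rule): relabel each document by majority vote among
    its k most similar documents, using the stage-1 labels as votes."""
    refined = []
    for i, tf in enumerate(tfs):
        neighbours = sorted(
            (j for j in range(len(tfs)) if j != i),
            key=lambda j: cosine_similarity(tf, tfs[j]),
            reverse=True,
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        refined.append(votes.most_common(1)[0][0] if votes else labels[i])
    return refined
```

A typical call would build `tfs = [term_frequencies(d) for d in docs]`, then pass `initial_clusters(tfs)` to `knn_refine`. Raw whitespace tokenization keeps the sketch short; a real system would likely also remove stop words and weight terms by the significance of the document part they occur in, as the restructuring scheme suggests.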

Keywords

Document clustering, web mining, document similarity