With the increase in the number of electronic documents, it is hard to manually organize, analyse and present these documents efficiently. Document clustering automatically generates clusters from the whole document collection and is used in many fields, including data mining and information retrieval. A Web document restructuring scheme identifies different document parts and assigns levels of significance to these parts according to their importance. In this paper, we implement a hybrid algorithm in which the initial clusters are formed based on the frequency of terms in each document, and the further level of clustering is done by the K Nearest Neighbor algorithm. This document clustering model measures the similarity between documents using a similarity measure based on term frequencies; the similarity calculation between documents is based on single-term similarity. The clustering quality and performance of the proposed approach are higher than those of traditional clustering methods.
Document clustering, web mining, document similarity
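
The two-stage idea summarized in the abstract (initial clusters from term frequencies, then K Nearest Neighbor assignment using a single-term similarity) can be illustrated with a minimal sketch. This is not the paper's implementation: the seeding rule (grouping by most frequent term), the use of cosine similarity, the whitespace tokenizer, and all function names below are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count single terms after lowercasing and whitespace splitting (assumed tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine similarity between two term-frequency vectors (assumed similarity measure)."""
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def initial_clusters(docs):
    """Stage 1 (assumed rule): group documents by their most frequent term."""
    clusters = {}
    for doc_id, text in docs.items():
        tf = term_frequencies(text)
        if not tf:
            continue  # skip empty documents
        # Sorting first makes ties resolve to the alphabetically first term.
        top_term = max(sorted(tf), key=tf.get)
        clusters.setdefault(top_term, []).append(doc_id)
    return clusters


def knn_assign(new_text, docs, labels, k=3):
    """Stage 2: assign a document to the majority cluster of its k most similar neighbours."""
    tf_new = term_frequencies(new_text)
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(tf_new, term_frequencies(docs[d])),
        reverse=True,
    )
    votes = Counter(labels[d] for d in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy collection, for illustration only.
docs = {
    "d1": "cluster cluster document",
    "d2": "cluster cluster web",
    "d3": "retrieval retrieval query",
    "d4": "retrieval retrieval index",
}
clusters = initial_clusters(docs)
labels = {doc_id: term for term, ids in clusters.items() for doc_id in ids}
print(clusters)
print(knn_assign("cluster web document", docs, labels, k=3))
```

In this sketch the first stage is deliberately crude; a weighting that reflects the significance levels of different document parts could replace the raw counts in `term_frequencies` without changing the rest of the code.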