Lecturer (SS) in Computer Science,
Powerful methods for interactive exploration and search of textual document collections are essential for managing the ever-increasing flood of digital information. This paper deals with lemmatizing Tamil text documents and visualizing the clustered documents for a faster retrieval system. Tamil, a language belonging to the southern branch of the Dravidian languages, is highly inflectional, so extracting the correct root word requires extensive lemmatization. The mined documents are automatically clustered onto a map in an unsupervised manner, using a self-organizing map (SOM) trained on statistical information about word contexts, which increases search efficiency in a Tamil digital library collection.
Text mining, Stemming, Lemmatization, Self-organizing maps
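
As a rough illustration of the clustering step summarized above, the following minimal sketch trains a small self-organizing map on document vectors and reports the map cell each document lands on. It is not the system described in this paper: it assumes the documents have already been lemmatized, it swaps the word-context statistics for plain term-frequency vectors, and the function names, grid size, learning schedule, and placeholder tokens (`lemma_a` and so on) are all hypothetical choices made for the example.

```python
import math
import random


def term_frequency(tokens, vocab):
    """Return a length-normalised count vector over a fixed vocabulary."""
    total = max(len(tokens), 1)
    return [tokens.count(term) / total for term in vocab]


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping (row, col) to weights."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same map
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks
        for vector in vectors:
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-dist2 / (2.0 * sigma * sigma))
                # Pull the node towards the input, scaled by grid proximity.
                for i in range(dim):
                    w[i] += lr * h * (vector[i] - w[i])
    return weights


def map_documents(docs, rows=2, cols=2):
    """Assign each document (name -> list of lemmas) to a cell of the map."""
    vocab = sorted({term for tokens in docs.values() for term in tokens})
    vectors = {name: term_frequency(tokens, vocab) for name, tokens in docs.items()}
    weights = train_som(list(vectors.values()), rows, cols)
    return {name: best_matching_unit(weights, v) for name, v in vectors.items()}


# Placeholder lemma lists standing in for lemmatized Tamil documents.
docs = {
    "doc1": ["lemma_a", "lemma_b", "lemma_a"],
    "doc2": ["lemma_a", "lemma_b", "lemma_a"],
    "doc3": ["lemma_c", "lemma_d"],
}
positions = map_documents(docs)
for name, cell in sorted(positions.items()):
    print(name, cell)
```

Documents with identical vectors always share a cell, and documents with similar vocabulary tend to land on the same or neighbouring cells. That neighbourhood structure is what lets a map display support browsing a collection by topic.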