Advances in Applied Research
  • Year: 2009
  • Volume: 1
  • Issue: 1

Lemmatization and Visualization of Tamil Documents

  • Author: G.T. Prabavathi
  • Total Page Count: 10
  • Page Number: 83 to 92

Lecturer (SS) in Computer Science, Department of Computer Science, Gobi Arts & Science College, Gobichettipalayam 638453. Email: gtpraba@gmail.com

Published online on 11 June 2014.

Abstract

Powerful methods for interactive exploration and search of textual document collections are essential for managing the ever-increasing flood of digital information. This paper deals with lemmatizing Tamil text documents and visualizing the clustered documents to support faster retrieval. Tamil, a language belonging to the south-central branch of the Dravidian languages, is highly inflectional and therefore requires extensive lemmatization to extract the correct root word. The mined documents are automatically clustered onto a map in an unsupervised manner, using statistical information about word contexts and a self-organizing map (SOM), which increases search efficiency in a Tamil digital library collection.

Keywords

Text mining, Stemming, Lemmatization, Self-organizing maps
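
Illustrative sketch

The abstract describes clustering documents onto a map with a self-organizing map. The following is a minimal sketch of that general technique, not the paper's implementation. It assumes each document has already been lemmatized and reduced to a numeric vector (here, hypothetical toy term counts rather than the word-context statistics the paper mentions). The function names `train_som` and `best_matching_unit`, the grid size, and the learning-rate and neighbourhood schedules are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the grid coordinate whose weight vector is closest to `vector`."""
    return min(
        weights,
        key=lambda node: sum((w - x) ** 2 for w, x in zip(weights[node], vector)),
    )


def train_som(vectors, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Train a small rectangular SOM; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = max(rows, cols) / 2.0
    # Randomly initialised weight vector for every map node (assumed scheme).
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood radius
        for vector in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vector)
            for node, w in weights.items():
                # Gaussian neighbourhood measured on the grid, not in input space.
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (vector[i] - w[i])
    return weights


if __name__ == "__main__":
    # Hypothetical term-count vectors for four already-lemmatized documents.
    documents = [
        [3.0, 1.0, 0.0, 0.0],
        [2.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 4.0, 1.0],
        [0.0, 0.0, 3.0, 2.0],
    ]
    som = train_som(documents)
    for index, doc in enumerate(documents):
        print(index, best_matching_unit(som, doc))
```

Each document is assigned to the map cell of its best-matching unit. In a retrieval setting, a query vector could be mapped to a cell in the same way, so that the search is restricted to documents in that cell and its neighbours.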