International Journal of Management, IT and Engineering
  • Year: 2012
  • Volume: 2
  • Issue: 2

The problem of outliers in clustering

  • Authors: Thatimakula Sudha1, Swapna Sree Reddy Obili2
  • Total Page Count: 23
  • Page Number: 138 to 160

1Research Supervisor, Sri Padmavathi Women's University, Tirupati

2PhD Research Scholar, Sri Padmavathi Women's University, Tirupati

Online published on 26 June, 2013.

Abstract

Clustering has been widely used in many applications, including data mining, pattern recognition and machine learning. Noise is a major problem in cluster analysis and degrades the performance of many existing methods. This paper aims to address the problem of noise in data clustering.

Many existing clustering algorithms are sensitive to the presence of outliers. In this paper, a new robust operator, the modified l2 norm, is developed to attack this problem. This new measure has several merits. It needs no sensitive user-defined parameter, and it automatically assigns a small weight to samples that are far away from the cluster center. It is robust to outliers and has a theoretical 50% breakdown point. It can be solved without using an exhaustive search and can be extended to more general prototypes, for example curves. We have tested this method with four synthetic and three real-world datasets. Experimental results show that the method yields better results than other clustering algorithms.
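
To make the notion of a breakdown point concrete, the following minimal sketch compares two standard one-dimensional center estimators on a small made-up sample. It does not implement the modified l2 norm, whose definition is not given in this abstract. The function name center_estimates and the data values are illustrative assumptions. The arithmetic mean, which minimizes the ordinary squared l2 distance, can be pulled arbitrarily far by a single outlier, whereas the median, a textbook estimator with a 50% breakdown point, stays put.

```python
from statistics import mean, median


def center_estimates(values):
    """Return (mean, median) of a 1-D sample as two candidate cluster centers.

    Hypothetical helper for illustration only; this is NOT the paper's
    modified l2 norm.
    """
    return mean(values), median(values)


# Illustrative data (assumed, not taken from the paper).
clean = [1.0, 2.0, 3.0, 4.0, 5.0]
# Same sample with one point replaced by a gross outlier.
contaminated = [1.0, 2.0, 3.0, 4.0, 1000.0]

clean_mean, clean_median = center_estimates(clean)
dirty_mean, dirty_median = center_estimates(contaminated)

print("clean:        mean =", clean_mean, " median =", clean_median)
print("contaminated: mean =", dirty_mean, " median =", dirty_median)
```

A single contaminated point out of five moves the mean far from the bulk of the data while the median is unchanged. A robust clustering criterion aims for the second kind of behavior when estimating cluster centers.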