IITM JOURNAL OF MANAGEMENT AND IT

  • Year: 2011
  • Volume: 2
  • Issue: 2

Active learning for optimal use of unlabeled data

  • Authors: Prerna Mahajan1, Rekha Kandwal2, Ritu Vijay3
  • Total Page Count: 6
  • DOI:
  • Page Number: 9 to 14

1Associate Professor, Institute of Information Technology & Management

2India Meteorological Department

3Department of Electronics, AIM & ACT

Abstract

Data mining is an extremely useful analytical technique for developing predictive and forecasting applications, and it has had a profound impact on business practices in recent years. It is a complex process that aims to derive an accurate predictive model from a collection of data. The explosive growth in data warehousing and internet usage has made large amounts of data potentially available for developing predictive models. However, in many data mining problem domains, data is available in abundance but the cost of acquiring correct labels prohibits its use. Consequently, it is of utmost importance to minimize the amount of labeled data required to learn a target concept. Active learning methods attempt to select only the most informative examples for labeling and training, and are therefore potentially very useful in data mining applications where labeling data is costly. This paper presents an exploratory study of active learning methods that use a small set of labeled data together with a large supplementary unlabeled dataset in order to learn a better hypothesis than is possible from the labeled data alone.

Keywords

Active Learning, Classification, Support Vector Machines
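
The selection idea summarized in the abstract (label only the most informative examples from a large unlabeled pool) can be illustrated with a minimal sketch of pool-based uncertainty sampling. This is not the authors' method: the nearest-centroid classifier is a deliberately simple stand-in for the support vector machine named in the keywords, and the names `active_learn`, `oracle`, `budget`, and `margin` are hypothetical, chosen only for this illustration. The sketch assumes binary labels 0/1 and a seed set containing at least one example of each class.

```python
def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled):
    """Fit a nearest-centroid model (assumed stand-in for an SVM).

    labeled: list of (features, label) pairs with labels 0 or 1.
    Returns a dict mapping each class to its centroid.
    """
    return {c: centroid([x for x, y in labeled if y == c]) for c in (0, 1)}


def predict(model, x):
    """Assign x to the class with the nearer centroid."""
    return 0 if sq_dist(x, model[0]) <= sq_dist(x, model[1]) else 1


def margin(model, x):
    """Small values mean x is nearly equidistant from both centroids,
    i.e. the current model is most uncertain about it."""
    return abs(sq_dist(x, model[0]) - sq_dist(x, model[1]))


def active_learn(labeled, pool, oracle, budget):
    """Pool-based uncertainty sampling.

    labeled: small seed set of (features, label) pairs.
    pool:    unlabeled feature vectors.
    oracle:  hypothetical callable returning the true label (the costly step).
    budget:  maximum number of labels to request.
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(budget):
        if not pool:
            break
        model = fit(labeled)
        # Query the pool example the current model is least sure about.
        query = min(pool, key=lambda x: margin(model, x))
        pool.remove(query)
        labeled.append((query, oracle(query)))
    return fit(labeled), labeled
```

Under these assumptions, each round retrains on the current labeled set and spends one unit of labeling budget on the pool example closest to the decision boundary, rather than on a randomly chosen one.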