Data mining is a widely used analytical technique for developing predictive and forecasting applications, and it has had a profound impact on business practices in recent years. It is a complex process that aims to derive an accurate predictive model from a collection of data. The explosive growth in data warehousing and internet usage has made large amounts of data potentially available for developing predictive models. However, in many data mining problem domains, data is available in abundance but the cost of acquiring correct labels prohibits its use. Consequently, it is important to minimize the amount of labeled data required to learn a target concept. Active learning methods attempt to select only the most informative examples for labeling and training, and are therefore potentially very useful in data mining applications where labeling data is costly. This paper presents an exploratory study of active learning methods that use a small set of labeled data together with a large supplementary unlabeled dataset to learn a better hypothesis than could be learned from the labeled data alone.
Active learning, Classification, Support vector machines