**Authors: Francesco De Comité, François Denis and
Fabien Letouzey**.

**Source:** *Lecture Notes in Artificial Intelligence*, Vol. 1720,
1999, pp. 219–230.

**Abstract.**
In many learning problems, labeled examples are rare or expensive, while
numerous unlabeled and positive examples are available. However, most learning
algorithms use only labeled examples. We therefore address the problem of learning
with the help of positive and unlabeled data given a small number of labeled
examples. We present both theoretical and empirical arguments showing that
learning algorithms can be improved by the use of both unlabeled and positive
data. As an illustrative problem, we consider the algorithm that learns monotone
conjunctions from statistics in the presence of classification noise, and we
give empirical evidence for our assumptions. We also give theoretical results for the
improvement of Statistical Query learning algorithms from positive and unlabeled
data. Lastly, we apply these ideas to tree induction algorithms. We modify the
code of C4.5 to obtain an algorithm that takes as input a set LAB of labeled
examples, a set POS of positive examples and a set UNL of unlabeled examples,
and that uses all three sets to construct the decision tree. We provide
experimental results on data taken from the UCI repository which confirm
the relevance of this approach.
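
The key estimation step behind using POS and UNL together can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that POS is drawn from the positive part of the distribution D, that UNL is drawn from D itself, and that the class weight D(f = 1) is estimated from LAB. Under these assumptions, D(A ∧ f = 1) = D(f = 1) · D(A | f = 1) can be estimated from POS, and D(A ∧ f = 0) = D(A) − D(A ∧ f = 1) from UNL minus that estimate. The helper names (`frac`, `estimate_joint`, `node_entropy`, `learn_monotone_conjunction`) and the threshold eps / (2n) are hypothetical illustrations, and the sketch ignores classification noise.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Sequence[int]                 # boolean attribute vector of 0/1 values
Predicate = Callable[[Example], bool]   # an event A over the instance space


def frac(pred: Predicate, sample: Sequence[Example]) -> float:
    """Empirical frequency of the event `pred` in `sample`."""
    if not sample:
        raise ValueError("empty sample")
    return sum(1 for x in sample if pred(x)) / len(sample)


def estimate_positive_weight(lab: Sequence[Tuple[Example, int]]) -> float:
    """Estimate D(f = 1) from labeled pairs (x, y) with y in {0, 1}."""
    return sum(y for _, y in lab) / len(lab)


def estimate_joint(pred: Predicate, pos, unl, p: float) -> Tuple[float, float]:
    """Return estimates of (D(A and f=1), D(A and f=0)).

    D(A and f=1) = D(f=1) * D(A | f=1): p times the frequency of A in POS.
    D(A and f=0) = D(A) - D(A and f=1): frequency of A in UNL minus the above,
    clipped at 0 because the two sample estimates can disagree.
    """
    p_pos = p * frac(pred, pos)
    p_neg = max(0.0, frac(pred, unl) - p_pos)
    return p_pos, p_neg


def node_entropy(pred: Predicate, pos, unl, p: float) -> float:
    """Binary class entropy at a tree node (the set where pred holds),
    computed from the POS/UNL estimates instead of labeled counts."""
    p_pos, p_neg = estimate_joint(pred, pos, unl, p)
    total = p_pos + p_neg
    if total == 0:
        return 0.0
    h = 0.0
    for q in (p_pos / total, p_neg / total):
        if q > 0:
            h -= q * math.log2(q)
    return h


def learn_monotone_conjunction(pos, p: float, n: int, eps: float) -> List[int]:
    """Keep variable i when the estimate of D(x_i = 0 and f = 1) is at most
    eps / (2n); this statistic needs only POS and the weight p."""
    return [i for i in range(n)
            if p * frac(lambda x, i=i: x[i] == 0, pos) <= eps / (2 * n)]


# Toy target f(x) = x0 AND x1 over two boolean attributes.
LAB = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
UNL = [(0, 0), (0, 1), (1, 0), (1, 1)]
POS = [(1, 1)]
p = estimate_positive_weight(LAB)                           # 0.25
print(estimate_joint(lambda x: x[0] == 1, POS, UNL, p))     # (0.25, 0.25)
print(node_entropy(lambda x: x[0] == 1, POS, UNL, p))       # 1.0
print(learn_monotone_conjunction(POS, p, n=2, eps=0.1))     # [0, 1]
```

Computing the negative-class statistic by subtraction lets the sketch obtain both class frequencies at a node without any negative examples. The clip at zero is a pragmatic guard for small samples, not something the abstract specifies.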

©Copyright 1999 Springer-Verlag