Friday, April 25, 2008

[Reading] Combining Labeled and Unlabeled Data with Co-Training

This paper describes a method to utilize the unlabeled data for training. The basic idea is to split the features into two separate sets, and learn two individual classifiers, and use these classifiers to expend the training data to learn new classifiers.

However, the paper is a little difficult to understand. Their method implies that the co-occurrence in-between or within the feature sets may possibly be found by the new labeled data. However, maybe these co-occurrence can be found by unsupervised clustering (like pLSA). It always involves the parameter setting. Picking too many unlabeled data or reducing too many dimensions can both degrade the performance.

No comments: