Friday, April 25, 2008

[Reading] Transductive Inference for Text Classification using Support Vector Machines

This paper presents a method that effectively expands the training set for an SVM. The key idea is to optimize the hyperplane and the labels of the unseen examples jointly. In this way, co-occurrence information that is missing from the original training data can be exploited automatically.
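
To make the "optimize both together" part concrete, here is roughly how I understand the soft-margin objective. The notation is my own shorthand: (x_i, y_i) are the n labeled examples, x*_j are the k unlabeled ones, y*_j are their unknown labels, and C, C* are the penalty weights for the two groups.

```latex
\begin{aligned}
\min_{y^*_1,\dots,y^*_k,\; w,\; b,\; \xi,\; \xi^*} \quad
  & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i + C^* \sum_{j=1}^{k} \xi^*_j \\
\text{s.t.} \quad
  & y_i \,(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n \\
  & y^*_j \,(w \cdot x^*_j + b) \ge 1 - \xi^*_j, \quad \xi^*_j \ge 0, \quad j = 1,\dots,k \\
  & y^*_j \in \{-1, +1\}
\end{aligned}
```

The only difference from the ordinary SVM is that the y*_j are variables too, which is what turns it into a combinatorial problem.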

The idea is very simple, and the optimization algorithm is easy to implement (a small modification to the original SVM). One interesting point: if we first apply pLSA to cluster the topics of all documents, the co-occurrence information can be recovered to some extent. In that case, does TSVM still outperform SVM? Or maybe co-occurrence is just the one effect the author could articulate, and TSVM also picks up other things that are harder to describe.
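
To show how small the modification is, here is a minimal sketch of the label-switching loop as I understand it. It is not the paper's solver. The assumptions are all mine: a plain subgradient-descent linear SVM stands in for the real QP solver, a single C_star is used for all unlabeled examples, and the names `train_linear_svm`, `tsvm_sketch`, `num_plus` and `max_swaps` are made up for illustration.

```python
# Simplified TSVM-style loop (illustrative sketch, not the paper's exact solver).

def score(w, b, x):
    """Signed distance-like score w.x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_linear_svm(X, y, weights, epochs=200, lr=0.05):
    """Full-batch subgradient descent on 0.5*||w||^2 + sum_i weights[i]*hinge_i."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of the regularizer is w itself
        for xi, yi, ci in zip(X, y, weights):
            if yi * score(w, b, xi) < 1:  # example violates the margin
                for j in range(dim):
                    gw[j] -= ci * yi * xi[j]
                gb -= ci * yi
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def tsvm_sketch(X_lab, y_lab, X_unl, num_plus, C=1.0, C_star=1.0, max_swaps=50):
    # Step 1: ordinary SVM on the labeled data only.
    w, b = train_linear_svm(X_lab, y_lab, [C] * len(X_lab))

    # Step 2: guess labels; the num_plus highest-scoring unlabeled points get +1.
    order = sorted(range(len(X_unl)), key=lambda i: score(w, b, X_unl[i]), reverse=True)
    y_unl = [-1] * len(X_unl)
    for i in order[:num_plus]:
        y_unl[i] = 1

    def fit(c):
        # Retrain on everything, with weight c on the guessed labels.
        return train_linear_svm(X_lab + X_unl, y_lab + y_unl,
                                [C] * len(X_lab) + [c] * len(X_unl))

    def find_pair():
        # A (+1, -1) pair whose slacks are both positive and sum to more than 2.
        slack = [max(0.0, 1 - yi * score(w, b, xi)) for xi, yi in zip(X_unl, y_unl)]
        for i in range(len(X_unl)):
            for j in range(len(X_unl)):
                if (y_unl[i] == 1 and y_unl[j] == -1 and slack[i] > 0
                        and slack[j] > 0 and slack[i] + slack[j] > 2):
                    return i, j
        return None

    # Step 3: slowly increase the influence of the unlabeled points.
    c = min(1e-3, C_star)
    while True:
        w, b = fit(c)
        for _ in range(max_swaps):
            pair = find_pair()
            if pair is None:
                break
            i, j = pair
            y_unl[i], y_unl[j] = -1, 1  # switching keeps the number of +1 labels fixed
            w, b = fit(c)
        if c >= C_star:
            break
        c = min(2 * c, C_star)
    return w, b, y_unl
```

Calling `tsvm_sketch(X_lab, y_lab, X_unl, num_plus=2)` with plain lists of feature vectors returns the hyperplane and the guessed labels for the unlabeled points. Since labels are only ever switched in pairs, the number of positives stays at `num_plus` throughout, which is how the fraction of positive examples is kept under control.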
