Tuesday, April 29, 2008

[Reading] An Introduction to Graphical Models

This is a very good introductory paper on graphical models. It elegantly explains why and when we should use graphical models, what operations are commonly performed on them (so you can map your own problem onto one of them), and what tools are available to run these operations efficiently.

One main reason to use graphical models is computational efficiency. By explicitly expressing the dependencies between variables, the number of summations needed for inference and decision making can be greatly reduced. Approximate inference algorithms also become easier to understand: message passing and graph-cut methods operate directly on the nodes and edges of the graphical model.
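To make the "reduced summations" point concrete, here is a tiny example of my own (not from the paper): marginalizing the last variable of a three-variable chain, naively versus by exploiting the factorization, which is exactly the message-passing idea.

```python
# Hypothetical chain x1 -> x2 -> x3, each variable with K states.
# Naive marginalization of p(x3) sums over all K^2 joint settings;
# the factorization p(x1) p(x2|x1) p(x3|x2) lets us do two K-term
# sums instead (variable elimination / message passing).

K = 3
p1 = [0.2, 0.5, 0.3]                     # p(x1)
p21 = [[0.7, 0.2, 0.1],                  # p(x2 | x1), rows indexed by x1
       [0.1, 0.8, 0.1],
       [0.3, 0.3, 0.4]]
p32 = [[0.6, 0.3, 0.1],                  # p(x3 | x2), rows indexed by x2
       [0.2, 0.5, 0.3],
       [0.1, 0.1, 0.8]]

# Naive: O(K^2) terms per value of x3.
naive = [sum(p1[a] * p21[a][b] * p32[b][c]
             for a in range(K) for b in range(K))
         for c in range(K)]

# Message passing: eliminate x1 first to get a message m(x2),
# then eliminate x2 -- two O(K) sums.
m2 = [sum(p1[a] * p21[a][b] for a in range(K)) for b in range(K)]
factored = [sum(m2[b] * p32[b][c] for b in range(K)) for c in range(K)]

print(naive)
print(factored)    # same distribution, fewer multiplications
```

For a chain of length n the saving is dramatic: O(n K^2) work instead of O(K^n).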

The paper also mentions a topic I had not encountered before: learning the structure of the graphical model. However, I personally think this is an aesthetic problem rather than a computational one, since the search space is almost infinite and the goal (e.g., being suitable for inference or decision making) is hard to formulate as an objective function. Even if a rough graphical model is estimated, manual effort is still required to refine its structure.

Friday, April 25, 2008

[Reading] Combining Labeled and Unlabeled Data with Co-Training

This paper describes a method for utilizing unlabeled data in training. The basic idea is to split the features into two separate sets, learn two individual classifiers, and then use these classifiers to expand the training data for learning new classifiers.
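The loop can be sketched roughly like this — a toy reconstruction of my own, not the paper's exact algorithm (I use trivial nearest-class-mean classifiers on 1-D views as the stand-in weak learners):

```python
# Toy co-training sketch: each example has two feature "views"; each
# round, the classifier trained on one view labels its most confident
# unlabeled example and adds it to the shared labeled pool, so the
# other view's classifier benefits from it next round.

def fit_view(points):
    """Nearest-class-mean classifier on a single 1-D view."""
    groups = {}
    for v, y in points:
        groups.setdefault(y, []).append(v)
    means = {y: sum(vs) / len(vs) for y, vs in groups.items()}
    def predict(v):
        y = min(means, key=lambda c: abs(v - means[c]))
        return y, -abs(v - means[y])          # (label, confidence)
    return predict

def co_train(labeled, unlabeled, rounds=4):
    """labeled: [((v1, v2), y)]; unlabeled: [(v1, v2)]."""
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not unlabeled:
                break
            f = fit_view([(x[view], y) for x, y in labeled])
            # This view's classifier promotes its most confident example.
            i = max(range(len(unlabeled)),
                    key=lambda j: f(unlabeled[j][view])[1])
            x = unlabeled.pop(i)
            labeled.append((x, f(x[view])[0]))
    return labeled

# Two well-separated classes, only two labeled seeds.
seed = [((0.0, 0.1), 'a'), ((1.0, 0.9), 'b')]
pool = [(0.1, 0.0), (0.2, 0.1), (0.9, 1.0), (0.8, 0.9)]
grown = co_train(seed, pool)
print(sorted(grown))
```

The paper's point is that the two views must be (conditionally) redundant: each one alone should suffice for classification, which is why each classifier can safely teach the other.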

However, the paper is a little difficult to understand. The method implies that co-occurrences within or between the feature sets may be discovered through the newly labeled data. But perhaps these co-occurrences could also be found by unsupervised clustering (such as pLSA). The method also involves parameter tuning: picking too many unlabeled examples or reducing too many dimensions can both degrade performance.

[Reading] Transductive Inference for Text Classification using Support Vector Machines

This paper presents a method to expand the training set for SVMs. The key idea is to optimize the hyperplane and the unseen labels jointly. In this way, co-occurrence information missing from the original training data might be discovered automatically.

The idea is very simple, and the optimization algorithm is easy to implement (a small modification to the original SVM). An interesting question: if we first apply pLSA to cluster the topics of all documents, the co-occurrence information can already be found to some extent. In that case, does TSVM still outperform SVM? Or maybe co-occurrence is just the one benefit of TSVM that the author can articulate, while TSVM also captures other things that are hard to describe.
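To record the flavor of that joint optimization, here is a heavily simplified sketch of my own — not Joachims' exact algorithm, and with a mean-difference linear classifier standing in for the SVM solver: tentatively label the unlabeled points, refit treating those labels as real, and repeat until the labels stop changing.

```python
# Simplified transductive loop: the "hyperplane" and the labels of the
# unlabeled points are optimized together by alternating between
# (1) refitting on labeled + tentatively-labeled data and
# (2) relabeling the unlabeled points with the new hyperplane.

def fit_hyperplane(xs, ys):
    """Linear rule w.x + b from class means (stand-in for SVM training)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == -1]
    mp = [sum(c) / len(pos) for c in zip(*pos)]
    mn = [sum(c) / len(neg) for c in zip(*neg)]
    w = [a - b for a, b in zip(mp, mn)]              # normal direction
    b = -sum(wi * (a + c) / 2 for wi, a, c in zip(w, mp, mn))
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

def transductive_fit(lx, ly, ux, iters=10):
    f = fit_hyperplane(lx, ly)
    uy = [f(x) for x in ux]                          # initial guesses
    for _ in range(iters):
        f = fit_hyperplane(lx + ux, ly + uy)         # joint refit
        new = [f(x) for x in ux]
        if new == uy:                                # labels stabilized
            break
        uy = new
    return f, uy

lx = [(0.0, 0.0), (3.0, 3.0)]
ly = [-1, 1]
ux = [(0.2, 0.4), (2.8, 2.6), (0.5, 0.1), (2.5, 3.1)]
f, uy = transductive_fit(lx, ly, ux)
print(uy)
```

The real TSVM additionally keeps the fraction of positive labels fixed and gradually raises the misclassification cost on the unlabeled examples; this sketch only shows the alternation idea.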

Friday, April 18, 2008

[Reading] Improved boosting algorithms using confidence-rated predictions

This paper completes the AdaBoost algorithm. The original AdaBoost paper presented only the basic concept of combining several weak classifiers into a strong one. Here the authors tell a more complete story of the technique: they show that several parameters in AdaBoost can be optimized to improve overall performance, and they present several extensions of AdaBoost to handle non-binary classification.
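The key optimized parameter is the weight alpha given to each weak classifier: for binary labels it comes out as alpha_t = ½ ln((1 − eps_t) / eps_t), where eps_t is the weighted error. A minimal illustration of my own, using 1-D threshold stumps as the weak learners (not the paper's experiments):

```python
# Minimal binary AdaBoost with decision stumps, showing the optimized
# alpha_t = 0.5 * ln((1 - eps) / eps) from the confidence-rated paper.
import math

def best_stump(xs, ys, w):
    """Weighted-error-minimizing threshold stump on 1-D data."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            pred = [sign if x >= thr else -sign for x in xs]
            err = sum(wi for wi, p, y in zip(w, pred, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    err, thr, sign = best
    return err, lambda x: sign if x >= thr else -sign

def adaboost(xs, ys, rounds=5):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        eps, h = best_stump(xs, ys, w)
        eps = max(eps, 1e-10)                    # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)  # the optimized weight
        ensemble.append((alpha, h))
        # Re-weight: mistakes go up, correct examples go down; normalize.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, xs, ys)]
        z = sum(w)
        w = [wi / z for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [-1, -1, 1, 1, -1, 1]     # not separable by any single stump
f = adaboost(xs, ys)
print([f(x) for x in xs])
```

With this alpha the weighted training error is driven down each round; setting alpha badly (or leaving it constant) loses that guarantee, which matches my experience below.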

However, some of these extensions look very natural and necessary. In my previous experiments, if the alpha is not optimized, the results of AdaBoost can be very bad. Also, this paper is not the best one for a beginner like me... I found some online documents to be better.

Thursday, April 17, 2008

[Reading] Rapid object detection using a boosted cascade of simple features

This paper describes the most well-known face detection algorithm. Before reading this paper, I thought AdaBoost was equivalent to cascade detection; in fact, the cascade is a separate and important contribution of this paper.

Because AdaBoost can fuse several weak classifiers, the authors found that simple Haar-like features are enough for face detection, and these features can be evaluated efficiently with an integral image (in fact, the same summed-area table used in texture mapping). Also, the cascade lets the detector reject many regions early to reduce computation.
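The integral image trick is worth writing down: each entry stores the sum of all pixels above and to the left, so any rectangular sum — the building block of a Haar-like feature — takes only four lookups, regardless of the rectangle's size. A small sketch (my own toy example, not the paper's code):

```python
# Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1],
# with a padding row/column of zeros to avoid boundary checks.

def integral_image(img):
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x0, y0, x1, y1):
    """Sum of img[y0..y1-1][x0..x1-1] using four table lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(rect_sum(ii, 0, 0, 3, 3))   # 45: the whole image
print(rect_sum(ii, 1, 1, 3, 3))   # 28: the bottom-right 2x2 block
```

A two-rectangle Haar feature is then just `rect_sum(...) - rect_sum(...)` on two adjacent rectangles — constant time per feature, which is what makes exhaustive sliding-window evaluation feasible.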

These contributions combine into a very powerful system. In fact, many face detection modules in digital cameras are based on these techniques.