Saturday, June 7, 2008

[Reading] Support vector learning for ordinal regression

This paper extends the SVM framework to ordinal regression (classification of ordered classes). The algorithm is easy to understand once the problem is formulated. The key contribution of the paper is showing that the ordinal regression problem can be transformed into a standard SVM classification problem by using differences of feature vectors as the new feature vectors. An error bound for the algorithm is also presented.
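
To make the reduction concrete, here is a minimal sketch of how I understand it: form difference vectors from pairs of examples with different ranks, then train an ordinary binary SVM on them. The toy data, helper names, and use of scikit-learn are my own illustration, not from the paper.

```python
# Sketch of the pairwise-difference reduction, assuming a linear SVM.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC

def make_pairs(X, y):
    """Turn ranked examples into difference vectors with +1/-1 labels."""
    diffs, labels = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # equal-rank pairs carry no ordering information
        d = X[i] - X[j]
        s = 1 if y[i] > y[j] else -1
        diffs.extend([d, -d])   # add both orientations for symmetry
        labels.extend([s, -s])
    return np.array(diffs), np.array(labels)

# Toy data: 1-D features whose rank grows with the feature value.
X = np.array([[0.1], [0.5], [0.9], [1.4], [2.0], [2.6]])
y = np.array([0, 0, 1, 1, 2, 2])  # ordinal labels

Xp, yp = make_pairs(X, y)
# Difference vectors are sign-symmetric, so no bias term is needed.
clf = LinearSVC(C=1.0, fit_intercept=False).fit(Xp, yp)
print(clf.coef_)  # this direction scores items; thresholds on it give the ranks
```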

I did not know much about SVMs, and I hope I can find some useful online materials to teach myself in the future...

Friday, May 30, 2008

[Reading] The pagerank citation ranking: Bringing order to the web

This is a very famous paper. Almost everyone who searches for an answer to "how Google works" will reach this paper. The basic idea is unbelievably simple, just an application of random walk theory. Given an initial distribution of many walkers over the web, the final (stationary) distribution of these walkers gives the importance of the web pages. Thanks to the AMMAI course, I can easily see that this is a linear algebra problem.

Because the number of webpages is very large and we only want one eigenvector, an iterative (power) method is used in the paper. An interesting thing is that the iteration step is not the gradient of the cost function, but the authors claim that convergence is faster. Another issue is that the importance of webpages is l1-normalized; however, I don't really know why this is necessary.
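
For reference, here is a minimal power-iteration sketch; the damping value and the tiny link matrix are my own toy example, not from the paper. The l1 normalization at each step at least keeps the vector a probability distribution over pages.

```python
import numpy as np

def pagerank(M, d=0.85, tol=1e-9, max_iter=100):
    """M[i, j] = probability of walking from page j to page i (columns sum to 1)."""
    n = M.shape[0]
    r = np.full(n, 1.0 / n)              # start walkers uniformly
    for _ in range(max_iter):
        r_new = d * M @ r + (1 - d) / n  # random walk plus teleportation
        r_new /= r_new.sum()             # keep r an l1-normalized distribution
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

# Toy web: page 0 -> 1, page 1 -> 2, page 2 -> 0 and 1.
M = np.array([[0.0, 0.0, 0.5],
              [1.0, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
print(pagerank(M))
```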

It's interesting to think about how PageRank works in today's Web 2.0 environment (of course Google must handle this already...). Webpages are now very dynamic. The link structure changes very fast, and it's almost impossible to build the full linear system (there are billions of webpages now, and each one can easily link to hundreds of others). Actually, maybe Google does not work so well now... at least recently I feel depressed when searching on Google :P

Sunday, May 25, 2008

[Reading] Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

This paper presents a new method for mining frequent patterns. The main technique is a new data structure, the FP-tree, for compactly representing the database. An FP-tree can compactly represent all frequent patterns (no information loss) and enables fast queries (by traversing the tree and following node links).

There are detailed examples throughout the descriptions of the data structure and the mining methods, usually appearing before the proofs of the lemmas. This makes the paper very easy to understand. All pseudo code is also provided, so it should be simple to implement the algorithm. Even though the FP-tree usually compresses the database greatly, the authors still provide a projection method to reduce the required memory.
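
As a sanity check on my understanding, here is a rough sketch of FP-tree construction; the class names and the simplified node-link bookkeeping are mine, not the paper's pseudo code.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Pass 1: count item frequencies and keep only the frequent items.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_support}

    # Pass 2: insert each transaction with items sorted by descending
    # frequency, so common prefixes share branches -- this sharing is
    # where the compression comes from.
    root = Node(None, None)
    header = defaultdict(list)  # item -> its nodes, standing in for node links
    for t in transactions:
        items = sorted((i for i in t if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header

transactions = [list("fcamp"), list("fcabm"), list("fb"),
                list("cbp"), list("fcamp")]
root, header = build_fp_tree(transactions, min_support=3)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```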

Because this method makes no approximation to the problem, the results are identical to the ground truth, and they are obtained much faster than with previous methods. In summary, this is a very exciting work!

[Reading] Mining Association Rules between Sets of Items in Large Databases

This paper presents a system for finding association rules in a large number of transaction records. At the time of publication, both the problems and the solutions in this field were probably still in their infancy, so this should be regarded as a very pioneering work.

After the problem is properly formulated, it can be decomposed into two sub-problems (finding the large itemsets, then generating rules from them), and straightforward solutions become obvious. However, the straightforward algorithm is too expensive, so the authors provide many acceleration techniques. I personally like the memory analysis part.
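
For my own reference, the two basic quantities behind the rules can be written down in a few lines; the toy transactions below are my own example, not from the paper.

```python
# Support and confidence, the two measures that define an association rule.
def support(transactions, itemset):
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    # confidence(X => Y) = support(X union Y) / support(X)
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

transactions = [{"milk", "bread"}, {"milk", "bread", "butter"},
                {"bread"}, {"milk", "butter"}]
print(confidence(transactions, {"milk"}, {"bread"}))  # 2/3
```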

The main difficulty in reading this paper is that the writing style is very old (or at least very different from the papers in my field). Also, the results are obtained from a dataset without ground truth.

Saturday, May 17, 2008

[Reading] Normalized Cuts and Image Segmentation

I read this paper a long time ago... and I skipped the main derivation at that time :P. The key contribution of this paper is that the authors use a specific energy function to measure the quality of a segmentation, and then show that this energy minimization problem can be solved via a generalized eigensystem, which links it to spectral graph theory. In my opinion, the forward direction (normalized cut --> eigensystem) is much harder to derive than the backward direction. Thinking backward, we may say this is another way to interpret spectral clustering.
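
A minimal sketch of the "backward way": solve the generalized eigensystem (D - W) y = lambda D y and split on the second-smallest eigenvector. The toy affinity matrix is my own example.

```python
import numpy as np
from scipy.linalg import eigh

# Affinity matrix W for six points forming two obvious groups.
W = np.array([[0, 9, 9, 1, 0, 0],
              [9, 0, 9, 0, 0, 0],
              [9, 9, 0, 0, 1, 0],
              [1, 0, 0, 0, 9, 9],
              [0, 0, 1, 9, 0, 9],
              [0, 0, 0, 9, 9, 0]], dtype=float)
D = np.diag(W.sum(axis=1))
L = D - W  # graph Laplacian

# eigh solves the symmetric generalized problem L y = lambda D y;
# eigenvalues come back in ascending order.
vals, vecs = eigh(L, D)
y = vecs[:, 1]   # second-smallest eigenvector
print(y > 0)     # thresholding (here at zero) splits the two segments
```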

I hope the presenters next week can give us a more insightful view of spectral graph theory... I was never good at linear algebra.

Tuesday, May 13, 2008

[Reading] On Spectral Clustering: Analysis and An Algorithm

This paper presents a simple clustering algorithm based on spectral theory. Compared with other spectral clustering methods (such as normalized cut), this algorithm is much easier to understand and implement. The key idea is that, unlike previous methods, multiple eigenvectors are used instead of only one.
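
Here is a compact sketch of the algorithm as I understand it; the Gaussian affinity width sigma and the toy data are my own choices, not from the paper.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # 1. Gaussian affinity matrix with zero diagonal.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # 2. Normalized affinity L = D^{-1/2} A D^{-1/2}.
    d = A.sum(axis=1)
    L = A / np.sqrt(np.outer(d, d))
    # 3. Stack the top-k eigenvectors as columns -- the "multiple
    #    eigenvectors instead of one" idea.
    _, vecs = eigh(L)      # eigenvalues ascending, so take the last k
    Y = vecs[:, -k:]
    # 4. Normalize the rows to unit length, then run k-means on the rows.
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)

X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 6.0])
print(spectral_clustering(X, k=2))
```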

For the ideal case, it's easy to see that the eigenspace perfectly represents the clustering result. However, for the general case, the analysis is difficult to understand (at least for me). The authors make four assumptions and show that those assumptions are reasonable in general; under these assumptions, the performance of the algorithm is guaranteed.

In summary, I seldom like papers from NIPS... too hard for me...

Tuesday, April 29, 2008

[Reading] An Introduction to Graphical Models

This is a very good introductory paper on graphical models. It elegantly explains why and when we should use graphical models, what operations are usually performed on them (so you may map your problem onto one of them), and what tools are available to run these operations efficiently.

One main reason to use graphical models is efficient computation. By explicitly expressing the dependencies between variables, the number of summations in inference and decision making can be greatly reduced. The approximate inference algorithms can also be better understood: message passing and graph cut methods all operate directly on the nodes and edges of the graphical model.
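
A toy illustration of this saving on a chain-structured model (the potentials and sizes are my own example): brute-force marginalization sums over all 2^n configurations, while passing messages along the chain needs only n small matrix-vector products.

```python
import itertools
import numpy as np

n = 10
psi = np.array([[2.0, 1.0], [1.0, 3.0]])  # one shared pairwise potential

# Brute force: enumerate all 2^n configurations of the chain.
brute = np.zeros(2)
for x in itertools.product([0, 1], repeat=n):
    w = 1.0
    for i in range(n - 1):
        w *= psi[x[i], x[i + 1]]
    brute[x[-1]] += w

# Message passing: fold in (sum out) one variable at a time.
msg = np.ones(2)
for _ in range(n - 1):
    msg = msg @ psi  # sum out the earlier variable

print(brute / brute.sum())
print(msg / msg.sum())  # identical marginal, exponentially less work
```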

The paper also mentions a topic that I had not encountered before: learning the structure of the graphical model. However, I personally think this is an aesthetic problem rather than a computational one, since the search space is almost infinite and the goal (e.g., being suitable for inference or decision making) is hard to formulate as an objective function. Even if a rough graphical model is estimated, manual effort is still required to beautify its structure.