Friday, May 30, 2008

[Reading] The pagerank citation ranking: Bringing order to the web

This is a very famous paper. Almost all people who searched for the answer of "how Google works" will reach this paper. The basic idea is unbelievably simple, just a simple application of the random walk theory. Given an initial positions of many walkers on the web, the final distribution of these walkers is the importance of the web pages. Thanks to the AMMAI course, I can easily think that this is a linear algebra problem.

Because the number of the webpages is very large and we only want one eigenvector, an iterative method is used in the paper. An interesting thing is that the iteration step is not the gradient of the cost function but the authors claim that the convergence is faster. Another issue is that the importance of webpages is l1-normalized. However, I don't really know why this is necessary.

It's interesting to think, how pagerank works in the environment of Web 2.0 now (of course Google must handle this already...). The webpages are now very dynamic. The link structures change very fast and it's almost impossible to build the full linear system (there are billions of webpages now, and each one can easily link to hundred of webpages). Actually maybe Google do not work so well now...at least recently I feel depressed at searching on Google :P

No comments: