Tuesday, February 26, 2008

[List] Research on Feature Points

This list is organized by topic (the grouping may not be correct), from old to new. Note that the list is far from complete; I will try to update it from time to time. A good overview of the field of feature detection can be found at Wikipedia (link).

[Parameter and Description Improvement and Optimization]
  1. Y. Ke and R. Sukthankar. PCA-SIFT: A More Distinctive Representation for Local Image Descriptors. CVPR 2004. (link)
  2. H. Bay, T. Tuytelaars, and L. Van Gool. SURF: Speeded Up Robust Features. ECCV 2006. (link)
  3. S. Winder and M. Brown. Learning Local Image Descriptors. CVPR 2007. (link)
  4. G. Hua, M. Brown, and S. Winder. Discriminant Embedding for Local Image Descriptors. ICCV 2007. (link)
[Efficient Matching]
  1. K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. ICCV 2005. (link)
  2. K. Grauman and T. Darrell. Pyramid Match Hashing: Sub-Linear Time Indexing Over Partial Correspondences. CVPR 2007. (link)
[Visual Word]
  1. J. Sivic and A. Zisserman. Video Google: A Text Retrieval Approach to Object Matching in Videos. ICCV 2003. (link)
  2. D. Nistér and H. Stewénius. Scalable Recognition with a Vocabulary Tree. CVPR 2006. (link)
  3. J. Philbin, O. Chum, M. Isard, J. Sivic and A. Zisserman. Object Retrieval with Large Vocabularies and Fast Spatial Matching. CVPR 2007. (link)
  4. G. Schindler, M. Brown, and R. Szeliski. City-Scale Location Recognition. CVPR 2007. (link)
  5. O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval. ICCV 2007. (link)
  6. P. Quelhas, F. Monay, J.-M. Odobez, and D. Gatica-Perez. A Thousand Words in a Scene. PAMI 2007. (link)
[Scene Reconstruction]
  1. M. Brown and D. G. Lowe. Unsupervised 3D Object Recognition and Reconstruction in Unordered Datasets. 3DIM 2005. (link)
  2. N. Snavely, S. M. Seitz, and R. Szeliski. Photo Tourism: Exploring Photo Collections in 3D. SIGGRAPH 2006. (link)
  3. M. Goesele, N. Snavely, B. Curless, H. Hoppe, and S. M. Seitz. Multi-View Stereo for Community Photo Collections. ICCV 2007. (link)
  4. I. Simon, N. Snavely, and S. M. Seitz. Scene Summarization for Online Image Collections. ICCV 2007. (link)
[Others]
  1. L. Kennedy, M. Naaman, S. Ahern, R. Nair, and T. Rattenbury. How Flickr Helps Us Make Sense of the World: Context and Content in Community-Contributed Media Collections. ACM MM 2007. (link)
--
Last update: 02/27/2008

[Reading] Distinctive Image Features from Scale-Invariant Keypoints

I read and implemented this paper a long time ago (2006) for a project in the course 'Digital Visual Effects'. I used SIFT to register multiple images into a panorama:


(click the image to go to the project page, which includes more results and my SIFT code)

The paper presents a keypoint localization and description method, SIFT. Though these two parts can be considered independently, the author does not discuss this issue much. The keypoint localization is done by detecting local extrema in the scale-space representation of the image, and the description is a processed local gradient histogram around the keypoint.
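
To make the localization stage concrete, here is a minimal sketch in Python. This is my own illustration, not the author's code: it assumes numpy/scipy, handles a single octave only, and omits the sub-pixel refinement and the contrast/edge tests.

    import numpy as np
    from scipy import ndimage

    def dog_pyramid(img, scales_per_octave=3, sigma=1.6):
        # Difference-of-Gaussians for one octave: subtract adjacent Gaussian blurs.
        k = 2.0 ** (1.0 / scales_per_octave)
        blurred = [ndimage.gaussian_filter(img, sigma * k ** i)
                   for i in range(scales_per_octave + 3)]
        return np.stack([b - a for a, b in zip(blurred, blurred[1:])])

    def keypoint_candidates(dog):
        # A pixel is a candidate if it is an extremum among its 26 neighbors
        # in the 3x3x3 scale-space neighborhood.
        is_max = dog == ndimage.maximum_filter(dog, size=(3, 3, 3))
        is_min = dog == ndimage.minimum_filter(dog, size=(3, 3, 3))
        return np.argwhere(is_max | is_min)   # rows of (scale, y, x)

The real detector then interpolates each candidate to sub-pixel accuracy and rejects low-contrast and edge-like responses before computing the descriptor.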

The concepts behind these two steps are not hard to understand, but the implementation details are insanely complex. There are many parameters in the system, and the author found possibly optimal values by experiment (these experiments have been extended recently, and the related papers are listed here). A large portion of the paper is filled with figures from tuning these parameters. Even so, many implementation details are skipped, and it is very difficult to write a SIFT program identical to the author's version.
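
For reference, here are the tuned values reported in the paper, collected into a plain Python dict (the key names are my own labels, not the paper's):

    # Values reported in the paper (Lowe, IJCV 2004); the key names are mine.
    SIFT_PARAMS = {
        "base_sigma": 1.6,            # blur of the base scale in each octave
        "scales_per_octave": 3,       # DoG scales sampled per octave
        "contrast_threshold": 0.03,   # reject extrema with |D(x)| below this (pixel values in [0, 1])
        "edge_ratio": 10,             # reject edge-like points via the Hessian eigenvalue ratio
        "orientation_bins": 36,       # bins in the keypoint orientation histogram
        "descriptor_grid": (4, 4),    # spatial subregions around the keypoint
        "descriptor_orientations": 8, # orientation bins per subregion: 4 * 4 * 8 = 128 dims
        "descriptor_clamp": 0.2,      # clamp normalized descriptor entries, then renormalize
    }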

Nevertheless, SIFT is a huge success and has enabled many new applications (also here). It is designed only to be scale- and illumination-invariant, but it is nonetheless affine-invariant to some degree and is much faster than truly affine-invariant keypoint detection methods. Although numerous new keypoint localization and description methods appear every year, I still prefer SIFT for its efficiency and the conciseness of its theory. Most important of all, the many small drawbacks in SIFT open up many interesting research topics.

Monday, February 25, 2008

[Reading] Image Retrieval: Ideas, Influences, and Trends of the New Age

This is a very long paper (66 pages, 12 of which are references). The authors try to cover all the new topics in the CBIR field after the millennium. This restriction, however, sometimes hinders readability, since all work before 2000 is ignored and readers have to check the older survey papers.

The paper covers four main topics: the real-world demand for CBIR, the core techniques, the offshoots, and evaluation-related issues. For the first topic, the authors use two 3D cubes to describe the possible behaviors of the user and of the system. Although the space of possibilities is large, I personally think some cases should be ruled out in research. For example, designing an efficient system for a user with the intent 'browser' is meaningless. All the considerations about the system mentioned in the paper are critical: each design choice can largely affect performance and cost. It's interesting that the authors do agree that simple thumbnails may be the best way to present query results.

For the second topic, the core techniques, the authors first list the novel signature (feature) extraction and similarity (distance) measurement methods. I learned something new in this section, such as the Earth Mover's Distance (I don't know why I skipped Prof. Tomasi's paper the first time :P) and some shape descriptors. One pity is that a new avenue, visual words, is less addressed in the paper, perhaps because it was submitted before the emergence of this field. The visual words in an image, i.e., quantized interest point descriptors (e.g., SIFT), can be treated as a bag-of-words document and used to perform text-like queries. They also enable spatial matching for object identification (like matching a sentence in a document), novel histogram matching methods (like measuring the number of co-occurring words in two documents), and new applications. Since interest points are the topic of the upcoming course, I will try to arrange a semi-complete list on this topic in the future.
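
As a rough sketch of the bag-of-visual-words idea (my own illustration, assuming scikit-learn and numpy; descriptors_per_image is a hypothetical list of per-image SIFT descriptor arrays):

    import numpy as np
    from sklearn.cluster import KMeans

    def build_vocabulary(descriptors_per_image, num_words=1000):
        # Cluster all local descriptors; each cluster center is a "visual word".
        all_desc = np.vstack(descriptors_per_image)
        return KMeans(n_clusters=num_words).fit(all_desc)

    def bow_histogram(vocab, descriptors):
        # One image becomes a normalized word-count histogram, i.e., a "document".
        words = vocab.predict(descriptors)
        hist = np.bincount(words, minlength=vocab.n_clusters).astype(float)
        return hist / hist.sum()

A real retrieval system would additionally weight the histograms with tf-idf and match them through an inverted file, as in the Video Google paper listed above.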

The authors also discuss some methods for clustering, classification, and relevance feedback. One thing I see throughout these topics is that learning (or statistical) methods are becoming more important. The same trend is happening in the computer vision field, due to the invention of new theories and fast solvers. Building a graphical model for a specific problem is a very active line of research now. Finally, the authors briefly mention the importance of manifolds. It's interesting that manifolds are used more in computer graphics (CG) than in computer vision or CBIR, because many signals in CG, such as human motions and spatially- and temporally-varying BRDFs, are of such high dimension that direct distance measurement can be expensive and meaningless. The authors also did not mention two important papers from 2000, locally linear embedding and Isomap, which have inspired numerous researchers (maybe because they are not after 2000...).
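
A tiny sketch of the manifold idea (again my own, assuming scikit-learn): Isomap embeds high-dimensional samples into a few coordinates in which Euclidean distance is meaningful.

    import numpy as np
    from sklearn.manifold import Isomap

    X = np.random.rand(500, 100)   # stand-in for 500 high-dimensional signals (e.g., motion frames)
    embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
    print(embedding.shape)         # (500, 2): low-dimensional coordinates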

The next topic is the offshoots of CBIR. However, I think most of them are not very meaningful. They can be challenging, but it is hard to say whether these fields can grow into mature and independent ones in the future. In some subsections, most of the cited papers are by the authors themselves. Giving more details about a single topic might have convinced me that they are of real importance.

The final topic, evaluation strategies, is very important. The authors list many existing evaluation methods and datasets with ground truth. From these we can easily see that the number of datasets is far from satisfactory. One main problem is that, unlike problems in computer vision, video compression, and computer graphics, it is hard to assign ground truth to problems in CBIR. An open online interactive system could be the best way to label a big dataset, but designing one could consume a huge amount of time for small research teams.

This well-organized paper presents ample material about CBIR. Many state-of-the-art techniques are mentioned here, along with many possible research directions. Finally, one small drawback is that the reference list is alphabetical. Many review papers order their references by topic for better readability.

Saturday, February 23, 2008

[Reading] How to Give a Good Research Talk

The key point we should learn from this paper is 'If you bore your audience in the first few minutes, you may never get them back.' This has happened to me many times, both as a speaker and as an audience member. Now I often try to put some results as a teaser at the beginning of the paper and the slides.

Another point is 'never hide your weaknesses.' This is true not only for presenting but also for writing a paper. I usually give a negative comment to a paper when I find the authors hiding their drawbacks. However, where and how to present these drawbacks is an aesthetic problem; putting them at the end is not always the best solution.

This paper is a little dated, so some points no longer hold. Visual aids have become far more important than they were when the paper was published (1993). Slide-authoring programs are very friendly and efficient now, so people no longer have to draw their slides by hand.

Finally, people tend to put their slides on the Internet now. I have organized a short researcher list here.

[Reading] How to Read a Paper

The author presents his 'three-pass method' in this paper. To my surprise, it is very similar to the method I am using. I guess after reading enough papers, people tend to converge to this way of reading :P

Below I describe each of his steps, followed by my own version of each.

1 [Keshav]. Read the title, the abstract, the introduction, the conclusion, and the reference list.
1 [Mine]. Read the title, abstract, reference list, prior-work section, and then the introduction. I skip the conclusion because it serves a very useful function for me in the later steps (see below). Also, if there are slides or videos associated with the paper, I check them first; these materials are very common in my field.

2 [Keshav]. Check the main body of the paper, the figures, and the results. Mark the important cited papers to be surveyed and skip the proofs.
2 [Mine]. Check the main body of the paper, the figures, and the results, much as Keshav suggests.

3 [Keshav]. Try to re-implement the work presented in the paper. Identify its true contributions, its weaknesses, and the possible future work.
3 [Mine]. Try to re-implement the work presented in the paper. Specifically, I try to convert the whole paper into short pseudo-code and write it down. In this way I can fully convince myself that the method presented in the paper is feasible, and more or less understand the time complexity of the proposed method (the timings shown in papers are not always reliable).

The little trick I have is 'using the conclusion section to relax myself.' When performing step 2 or 3, you may get tired for various reasons: bad notation, chaotic descriptions, etc. However, the content of the conclusion is somewhat predictable, so you can turn to that section to take a break.

The author also presents a method for surveying a new research topic. It is relatively trivial, so I skip it here.