Monday, February 25, 2008

[Reading] Image Retrieval: Ideas, Influences, and Trends of the New Age

This is a very long paper (66 pages, 12 of which are references). The authors try to cover all the new topics in the CBIR field since the millennium. This restriction, however, sometimes hurts readability, since all work before 2000 is ignored and readers have to consult the older survey papers.

The paper covers four main topics: the real-world demand for CBIR, the core techniques, the offshoots, and the evaluation-related issues. For the first topic, the authors use two 3D cubes to describe the possible behaviors of the user and of the system. Although the possible space is large, I personally think some cases should be ruled out in research. For example, designing an efficient system for a user with the intent "browser" is meaningless. All the system considerations mentioned in the paper are critical: each design choice can greatly affect performance and cost. It's interesting that the authors agree that simple thumbnails may be the best way to present the query results.

For the second topic, the core techniques, the authors first list the novel signature (feature) extraction and similarity (distance) measurement methods. I learned something new in this section, such as the Earth mover's distance (I don't know why I skipped Prof. Tomasi's paper the first time :P) and some shape descriptors. One pity is that one new avenue, visual words, gets little attention in the paper, maybe because the paper was submitted before this field emerged. The visual words in an image, i.e. quantized interest point descriptors (e.g., SIFT), can be treated as a bag-of-words document and used to perform text-like queries. This also enables spatial matching for object identification (like matching a sentence in a document), novel histogram matching methods (like counting the words that co-occur in two documents), and new applications. Since interest points are the topic of the following course, I will try to put together a semi-complete list on this topic in the future.
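To make the bag-of-words analogy a bit more concrete, here is a minimal Python sketch. It is my own toy illustration, not something taken from the paper: the function names, the three-word vocabulary, and the 2-D "descriptors" are all made up (real SIFT descriptors are 128-dimensional, and a real vocabulary would usually come from clustering a large set of descriptors). Each descriptor is quantized to its nearest visual word, the image becomes a word-count histogram, and two images are compared by histogram intersection, which roughly counts the words they share.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Return the index of the vocabulary entry closest to the descriptor."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bag_of_words(descriptors, vocabulary):
    """Quantize every descriptor and count occurrences of each visual word."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(vocabulary))]


def histogram_intersection(h1, h2):
    """Sum of per-word minimum counts: the word occurrences two images share."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical toy data: a 3-word vocabulary and 2-D descriptors.
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
image_a = [(0.1, 0.0), (0.9, 0.1), (1.0, -0.1), (0.1, 0.9)]
image_b = [(0.0, 0.1), (0.0, 1.1), (0.1, 0.8), (0.2, 1.0)]

hist_a = bag_of_words(image_a, vocabulary)  # [1, 2, 1]
hist_b = bag_of_words(image_b, vocabulary)  # [1, 0, 3]
print(histogram_intersection(hist_a, hist_b))  # prints 2
```

The brute-force nearest-word search is fine for a toy example; with a large vocabulary one would probably want an approximate nearest-neighbor structure and an inverted index, as in text retrieval.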

The authors also discuss methods for clustering, classification, and relevance feedback. One thing I notice across these topics is that learning (or statistical) methods are becoming more important. The same trend is happening in computer vision, thanks to new theories and fast solvers. Building a graphical model for a specific problem is now a very important line of research. Finally, the authors briefly mention the importance of manifolds. It's interesting that manifolds are used more in computer graphics (CG) than in computer vision or CBIR, because many signals in CG, such as human motions and spatially and temporally varying BRDFs, are high-dimensional, so direct distance measurement can be expensive and meaningless. The authors also do not mention two important papers from 2000, locally linear embedding and Isomap, which have inspired numerous researchers (maybe because they are not after 2000...).

The next topic is the offshoots of CBIR. However, I think most of them are not so meaningful. They can be challenging, but it is hard to say whether these fields can grow into mature and independent ones in the future. In some sub-sections, most of the cited papers are by the authors themselves. Maybe more details on a single topic would convince me that they are of real importance.

The final topic, evaluation strategies, is very important. The authors list many existing evaluation methods and datasets with ground truth. From these we can easily see that the number of datasets is far from satisfactory. One main problem is that, unlike problems in computer vision, video compression, and computer graphics, it is hard to assign ground truth for problems in CBIR. An open online interactive system could be the best way to label a big dataset, but building one could take a huge amount of time for a small research team.

This well-organized paper presents ample material about CBIR. It mentions many state-of-the-art techniques as well as many possible research directions. Finally, one small drawback is that the reference list is alphabetical. Many review papers group the references by topic for better readability.
