Sunday, March 30, 2008

[Reading] Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

This paper presents a new concept to object recognition: model it as a machine translation problem.

Given a training dataset, an EM-alike algorithm is used to learn the correspondence between the "words" between the image (an English document) and the annotation (an French document). The paper assume the training data is perfect: there should be no noise within. However, some thresholding does improve the performance.

In EM, Lagrangian is used to make sure the variables are sum-to-one. However, maybe a simple re-projection is enough. And in fact, re-projection is indeed used in the algorithm (Fig. 4)....can't see the Lagrangian multipliers in the code @@.

In summary, the idea is very nice, but this is a paper more like introducing a concept instead of building a practical system. There are many interesting things to do.

No comments: