Thursday, March 27, 2008

[Reading] Algorithms for Fast Vector Quantization

This paper presents three different variant of the k-d trees to improved the search performance. The first one introduces a technique called incremental distance to reduce the calculation. In traversing the tree, the distance between the query point and the childs can be easily updated from the distance the query point and the current node (O(1) instead of O(D), where D is the dimension). I found this is very useful. It does not degrade the performance but greatly reduces the computation.

The second method uses a priority queue to store the distances to the all traversed nodes. After a leaf is checked, the algorithm always picks the node with smallest distance from the queue to traverse. The interesting thing is that, this idea is used in the SIFT system but the author did not method this reference.

The third method, although gives good performance, it requires a O(N^2) preprocessing to build the local neighborhood graph. Therefore it may not be suitable for large systems.

Both the first two methods are very exciting, and I'm considering to port the first one into my program. Also in the experiment section, a well-known early termination technique is used. In calculating the distance, we may quit at the i-th dimension if the accumulated distance is larger than the currently optimal one. However, early termination may not be the best choice in the current CPU pipeline since there is a branch is inside the for loop. If we want to optimize it, manual unfolding may be necessary. For example, calculate the first K dimensions at once, do the check, and then move to the next K dimensions.

No comments: