Sunday, March 2, 2008

[Paper Review] Distinctive Image Features from Scale-Invariant Keypoints

This paper describes the celebrated SIFT feature in great detail and has received nearly 2000 citations on Google Scholar. SIFT owes its success to its general applicability: it performs well enough under almost every kind of image distortion. As a result, when people need features to match, they often choose SIFT.

The name "SIFT" actually covers two parts: the feature detector and the descriptor. The SIFT detector looks for local extrema over scale space in the DoG (difference-of-Gaussians) pyramid, and rejects points with low contrast or a strong edge response. The locations of the remaining points are then refined by fitting a parabola to the neighbouring samples. The SIFT descriptor then computes the principal gradient orientation over the patch around each point to obtain a local, rotation-invariant coordinate system. Finally, a grid of gradient orientation histograms (4x4 cells with 8 orientation bins each) is overlaid on the patch to describe each feature point as a 128-dim vector.
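To make a few of these steps concrete, here is a minimal Python sketch. It is not Lowe's code, and the function names (`refine_extremum_1d`, `passes_edge_test`, `grid_descriptor`) are made up for illustration. It makes several simplifying assumptions: the refinement is shown along a single axis (the paper fits a quadratic jointly over x, y and scale), the edge test uses the curvature ratio r = 10 suggested in the paper, and the descriptor assumes a 16x16 patch that has already been rotated to its principal orientation, with no Gaussian weighting, interpolation between bins, or clipping.

```python
import math


def refine_extremum_1d(y_prev, y_curr, y_next):
    """Fit a parabola through samples at x = -1, 0, +1.

    Returns (offset, value): the sub-sample position of the vertex
    relative to the centre sample, and the interpolated value there.
    """
    curvature = y_prev - 2.0 * y_curr + y_next
    if curvature == 0.0:  # flat or linear: nothing to refine
        return 0.0, y_curr
    offset = 0.5 * (y_prev - y_next) / curvature
    value = y_curr - 0.25 * (y_prev - y_next) * offset
    return offset, value


def passes_edge_test(dxx, dyy, dxy, r=10.0):
    """Keep a point only if its principal curvatures are not too unequal.

    dxx, dyy, dxy are second derivatives of the DoG image at the point.
    """
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    if det <= 0.0:  # curvatures of opposite sign: not a proper extremum
        return False
    return trace * trace / det < (r + 1.0) ** 2 / r


def grid_descriptor(patch, cells=4, bins=8):
    """Simplified grid of gradient orientation histograms.

    patch is a square list of lists of floats (assumed 16x16 here),
    already aligned to the principal orientation.
    Returns a unit-length vector of cells * cells * bins values.
    """
    size = len(patch)
    cell = size // cells
    hist = [0.0] * (cells * cells * bins)
    # Central differences, so the one-pixel border is skipped.
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            mag = math.hypot(dx, dy)
            if mag == 0.0:
                continue
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            b = int(angle / (2.0 * math.pi) * bins) % bins
            idx = ((y // cell) * cells + (x // cell)) * bins + b
            hist[idx] += mag  # vote weighted by gradient magnitude
    norm = math.sqrt(sum(v * v for v in hist))
    if norm > 0.0:
        hist = [v / norm for v in hist]
    return hist


if __name__ == "__main__":
    # A horizontal intensity ramp: every gradient points along +x.
    ramp = [[float(x) for x in range(16)] for _ in range(16)]
    print(len(grid_descriptor(ramp)))
    print(refine_extremum_1d(-0.5625, 0.9375, 0.4375))
```

Even in this stripped-down form, the pieces that are easy to get wrong are visible: the sign convention of the offset, the handling of a non-positive determinant, and the wrap-around of the orientation bins.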

Although the concept is not much harder than that of comparable methods, I have to say that a reliable implementation of SIFT is still complicated work. I wrote my own implementation of SIFT a year ago for my own needs, yet only recently I still found a bug in it. Here is a snapshot of SIFT feature matching; you can see that SIFT is really robust to a variety of distortions:



SIFT is not perfect, though. In theory, DoG extrema are not invariant to affine transformations, so we can't expect SIFT to work under large viewpoint changes. Nevertheless, it has been shown that in most cases SIFT continues to outperform other (nearly) affine-invariant feature detectors. I suspect there are two reasons: uniformly scaled patches are more distinctive than stretched ones, at the cost of some robustness to affine transformation, and the design of the SIFT descriptor tolerates some mis-registration of features. This needs further study.

The SIFT descriptor has also been challenged by some recent research. It is now known that GLOH (Gradient Location and Orientation Histogram) and steerable filter responses are better descriptors when matching against large databases. Automatic learning of feature descriptors will certainly be the next mainstream direction in feature matching.

1 comment:

Valerio said...

I have been doing my Final Year Project which has an element of SIFT.