2008年3月10日 星期一

[Paper Review] Nonlinear Dimensionality Reduction by Locally Linear Embedding

As the article is published on popular science readings, obviously we can't anticipate to find implementation details in it. Unfortunately, I didn't find enough time to read the extended readings on the list, so I would briefly talk about my opinions on this algorithm.

The concept of LLE is to capture the global underlying relationship(aka, manifold) of the data by inspecting the local neighborhood of each observation. The algorithm first tries to explain each point by a weighting sum of its neighbors, and secondly finds a low dimensional coordinate system that best preserves this weighting relationship. At the first glance, this might seem to be strange: how can one get a global sense by only looking at local neighbors? The fact is that in most cases the distances between points too far away are not very relevant to internal mechanism of data generation. We just have to know they are distant from each other. This is well illustrated by the swiss-roll example in the paper. Linear methods like PCA would incorrectly project the data to a plane because they consider the distance between every two points "equally important".

This idea may not always work. If the distribution of data points is not well-behaved(e.g. densely clustered) or every distance measurements are important, LLE would fail. I have read one work in texture synthesis just falling into this category. The thought here is to perform dimension reduction on the space of texture patches while preserving their appearance difference, so that one can boost the performance by considering only the low dimensional counterpart in synthesis. Because patches of one type of texture look quite similar, LLE wrongly mixes up the data and traditional methods like PCA win by maintaining the total scatter. The applicability of manifold methods certainly depends on the data type.

沒有留言: