Monday, March 24, 2008

[Paper Review] Algorithms and Applications for Approximate Nonnegative Matrix Factorization

This paper is a general survey of recent advances in the NMF area. Given a non-negative matrix A, NMF seeks a pair of likewise non-negative matrices W and H such that the Frobenius norm of A-WH is minimized. Non-negativity has the advantage of properly reflecting physical constraints under which only non-subtractive combinations of features are meaningful (for example, topics of textual data). In this way, NMF differs from the classical SVD decomposition, where no such non-negativity restriction is imposed.
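As a concrete illustration of the objective (my own sketch, not code from the paper), the quantity NMF minimizes takes a few lines of NumPy; the matrix sizes and rank below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: A is 6x4, factored at rank r = 2 (both arbitrary)
A = rng.random((6, 4))   # non-negative data matrix
W = rng.random((6, 2))   # candidate non-negative factors
H = rng.random((2, 4))

# The NMF objective: the Frobenius norm of the residual A - WH
error = np.linalg.norm(A - W @ H, "fro")
```

NMF searches over all non-negative W and H for the pair making `error` smallest; SVD would minimize the same norm but without the sign constraint.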


The benefits of non-negativity do not come without a price. In contrast to SVD, NMF is nonconvex in both W and H and requires numerical methods for nonlinear optimization. According to the paper, this seems to be the core problem that researchers in this area work on. The original multiplicative update algorithm, although simple, can converge to saddle points and is very slow. General methods such as gradient descent apply to NMF but require a good mechanism for choosing the step size. The alternating least squares approach currently seems to be the fastest method and enjoys the flexibility of "escaping" from zero entries due to the projection step. Nevertheless, it may still converge to a saddle point. Thus, up to the present, no excellent algorithm for NMF exists in general. Users seeking local minima should initialize the optimization from several different starting points and check the results for optimality.
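To make the two families of methods concrete, here is a minimal NumPy sketch of both the Lee-Seung multiplicative updates and projected alternating least squares. This is my own illustration under generic assumptions (random data, arbitrary rank and iteration counts), not code from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf_mult(A, r, iters=200, eps=1e-9):
    # Lee-Seung multiplicative updates for min ||A - WH||_F:
    # monotone but slow, and zero entries can never become positive again
    m, n = A.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        # eps in the denominators guards against division by zero
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def nmf_als(A, r, iters=100):
    # Alternating least squares: solve each factor unconstrained, then
    # project onto the non-negative orthant; the projection lets entries
    # set to zero in one sweep become positive again later
    m, n = A.shape
    W = rng.random((m, r))
    H = np.zeros((r, n))
    for _ in range(iters):
        H = np.maximum(np.linalg.lstsq(W, A, rcond=None)[0], 0)
        W = np.maximum(np.linalg.lstsq(H.T, A.T, rcond=None)[0].T, 0)
    return W, H

A = rng.random((8, 5))
W1, H1 = nmf_mult(A, 3)
W2, H2 = nmf_als(A, 3)
```

Neither variant certifies a global minimum, which is why restarting from several random initializations, as noted above, is the practical workaround.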


In general, fundamental mathematical formulations often do not suit a specific problem, and prior knowledge of the problem should be brought in to obtain a good solution. This is even more critical in NMF because the optimal solution is not unique. The paper describes several common extensions to the cost function of NMF and validates the value of the most basic one (regularization on the magnitudes of H or W) through experiments. In my personal experience, the regularization (smoothness) terms are often important factors that affect solution quality and require special care.
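A minimal sketch of how such a magnitude penalty changes the updates, using projected ALS on the ridge-regularized objective ||A - WH||_F^2 + lam*(||W||_F^2 + ||H||_F^2); the penalty weight, sizes, and rank are illustrative choices of mine, not values from the paper:

```python
import numpy as np

def nmf_ridge_als(A, r, lam=0.1, iters=100, seed=0):
    # Each subproblem becomes a ridge regression with closed-form solution;
    # lam > 0 also makes the normal-equation matrices safely invertible.
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = np.zeros((r, n))
    I = lam * np.eye(r)
    for _ in range(iters):
        # ridge solve for H, then project to keep entries non-negative
        H = np.maximum(np.linalg.solve(W.T @ W + I, W.T @ A), 0)
        # symmetric ridge solve for W
        W = np.maximum(np.linalg.solve(H @ H.T + I, H @ A.T).T, 0)
    return W, H

A = np.random.default_rng(1).random((8, 5))
W, H = nmf_ridge_als(A, 2)
```

The penalty discourages one factor from growing large while the other shrinks, one source of the non-uniqueness mentioned above; tuning `lam` is exactly the kind of "special care" these terms demand.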


The two experiments in the paper are text mining and spectral data analysis. The results seem quite good, but the authors did not include comparisons with other methods, so I cannot properly judge the power of the NMF approach. Indeed, the result on text mining is visually similar to the one in the pLSA paper. It is shown in [1] that NMF with the Frobenius norm replaced by KL divergence is essentially the same as pLSA, and that both kinds of problems are instances of the more general multinomial PCA problem. The connection between those methods and a performance comparison on different problems would make a good research topic.
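For reference, the KL-divergence variant that [1] relates to pLSA also has simple multiplicative updates; the sketch below is my own illustration of those standard updates on hypothetical data, not code from either paper:

```python
import numpy as np

def nmf_kl(A, r, iters=200, eps=1e-9, seed=0):
    # Multiplicative updates for the KL-divergence NMF objective,
    # the variant whose solutions correspond to pLSA topic models
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        R = A / (W @ H + eps)                          # elementwise ratio
        H *= (W.T @ R) / (W.sum(axis=0)[:, None] + eps)
        R = A / (W @ H + eps)
        W *= (R @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# e.g., A as a small hypothetical term-document count matrix
A = np.random.default_rng(2).integers(0, 5, size=(10, 6)).astype(float)
W, H = nmf_kl(A, r=3)
```

After normalizing the columns of W and rows of H into probabilities, W plays the role of pLSA's word-given-topic distributions and H the topic-given-document weights.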
