$$p(w|d) = \sum_z p(w|z)\, p(z|d),$$

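where, for a given document $d_i$, $p(w|d_i)$ is a $|W|$-dimensional vector ($A_i$), each $p(w|z)$ is a $|W|$-dimensional vector, and each $p(z|d_i)$ is a scalar. We can collect the right-hand-side factors for all $z$ into two matrices: $H$, of size $|W|\times|Z|$, whose columns are the vectors $p(w|z)$, and $V_i$, of size $|Z|\times 1$, whose entries are the values $p(z|d_i)$, so that $A_i = HV_i$. If we take the vectors $p(w|d)$ of all documents as the columns of a $|W|\times|D|$ matrix $A$, and likewise stack the $V_i$ as the columns of a $|Z|\times|D|$ matrix $V$, then we have

$$A = HV,$$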
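To make the matrix form concrete, here is a small NumPy sketch. The toy sizes, the random factors, and the helper name `random_stochastic` are my own assumptions for illustration, not anything from the pLSA paper. It builds a column-stochastic $H$ and $V$, forms $A = HV$, and checks that column $i$ of $A$ equals $\sum_z p(w|z)\,p(z|d_i)$ and still sums to 1.

```python
import numpy as np

def random_stochastic(rows, cols, rng):
    """Hypothetical helper: a non-negative rows x cols matrix whose columns each sum to 1."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=0, keepdims=True)

rng = np.random.default_rng(0)
n_words, n_topics, n_docs = 6, 2, 4   # |W|, |Z|, |D| (arbitrary toy sizes)

H = random_stochastic(n_words, n_topics, rng)   # column z is p(w|z)
V = random_stochastic(n_topics, n_docs, rng)    # column i is p(z|d_i)
A = H @ V                                       # column i is p(w|d_i)

# Column i of A is the mixture sum_z p(w|z) p(z|d_i), i.e. A_i = H V_i.
i = 0
A_i = sum(H[:, z] * V[z, i] for z in range(n_topics))
print(np.allclose(A[:, i], A_i))        # True
print(np.allclose(A.sum(axis=0), 1.0))  # True: every column of A is a valid distribution
```

The column sums of $A$ come out to 1 automatically: summing column $i$ over $w$ gives $\sum_z p(z|d_i)\sum_w p(w|z) = \sum_z p(z|d_i) = 1$.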

where the entries of $H$ and $V$ are all non-negative. Since $A$ is given and we want to find $H$ and $V$, this is in fact an NMF problem, plus a sum-to-one constraint on the columns of $H$ and $V$ so that each column is a valid probability distribution. The main difference, however, is that the pLSA paper examines the factorization in a probabilistic framework and fits it with EM, whereas NMF papers usually cast it as an optimization problem.
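The EM fit can be written directly in this matrix notation. What follows is a minimal NumPy sketch under my own assumptions: random initialization, a fixed iteration count, and the hypothetical helper names `normalize_columns` and `plsa_em`. It treats the entries of $A$ as observation weights $n(w,d)$ (using the $p(w|d)$ columns themselves amounts to weighting every document equally) and maximizes $\sum_{w,d} A_{wd}\log (HV)_{wd}$ while keeping the columns of $H$ and $V$ summing to 1.

```python
import numpy as np

def normalize_columns(M):
    """Scale each column of a non-negative matrix so that it sums to 1."""
    return M / M.sum(axis=0, keepdims=True)

def plsa_em(A, n_topics, n_iter=200, seed=0, eps=1e-12):
    """Sketch: fit A ~= H V with column-stochastic H (|W|x|Z|) and V (|Z|x|D|) by EM.

    Assumes every column of A has positive total weight.
    Returns H, V and the weighted log-likelihood after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = A.shape
    H = normalize_columns(rng.random((n_words, n_topics)))
    V = normalize_columns(rng.random((n_topics, n_docs)))
    history = []
    for _ in range(n_iter):
        # E-step folded into a ratio: p(z|w,d) = H[w,z] V[z,d] / (HV)[w,d],
        # so the weighted sums over d (or w) only need R = A / (HV).
        R = A / np.maximum(H @ V, eps)
        # M-step: both updates use the posteriors computed from the old H and V.
        H_new = normalize_columns(H * (R @ V.T))  # p(w|z) proportional to sum_d n(w,d) p(z|w,d)
        V_new = normalize_columns(V * (H.T @ R))  # p(z|d) proportional to sum_w n(w,d) p(z|w,d)
        H, V = H_new, V_new
        history.append(float(np.sum(A * np.log(np.maximum(H @ V, eps)))))
    return H, V, history

# Usage on synthetic data generated from known (random) factors.
rng = np.random.default_rng(1)
H_true = normalize_columns(rng.random((8, 3)))
V_true = normalize_columns(rng.random((3, 10)))
A = H_true @ V_true
H, V, ll = plsa_em(A, n_topics=3)
```

Each EM iteration should not decrease the log-likelihood, which `ll` lets you check. The multiplicative shape of these updates closely resembles the Lee-Seung multiplicative updates for NMF under the KL divergence; the visible differences here are the explicit column normalization and the fact that both factors are updated from the same E-step.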

I hadn't read the original pLSA paper before; I went straight to LDA. I now think it is necessary and worthwhile to read pLSA first, since it discusses at length how and why the model is designed. It also gives many good application examples.
