Online NMF
Online NMF is a recently developed method for real-time data analysis in an online context. Non-negative matrix factorization has historically been used for static data analysis and pattern recognition, such as facial recognition and spectral data analysis. However, due to the time- and memory-expensive nature of NMF algorithms [2], PCA, SVD, and Pearson correlation based methods have often been used instead. Still, the fact that data can be recreated as a linear combination of a set of resolved "basis" data is advantageous in some lines of study. One such use is collaborative filtering in recommendation systems, where it is advantageous to know not only how much two individuals are alike, which can be derived from the Pearson correlation, but also in what ways they are alike. The purpose of the online NMF algorithm is to perform NMF analysis rapidly enough that recommendations can be produced in real time.
Mathematical Framework
We begin with the initial factorization at timestamp t:
V = W*H + E
For simplicity we claim V ≈ W*H.
We then add the additional data U (the new rows) to matrix V, resulting in V':
V' = [ V ; U ]   (V stacked on top of U)
V' = W'*H'
Partitioning W' the same way, W' = [ W1' ; W2' ], so V' = [ W1' ; W2' ]*H'
Also, for previously processed data:
W' = [ W*Λ^-1*W1' ; W2' ]
H' = W1'^-1*Λ*H
where Λ is a diagonal matrix whose entry Λ_ii equals the weight factor h_i.
For unprocessed data we need to solve the optimization problem of minimizing J. We have:
J = (1/2)*||V - W*H||^2 + (a/2)*tr(H^T*R*H)
where a is a positive weighting constant and R is a symmetric non-negative matrix. This can be done using the following iterative algorithm:
w_ij <-- w_ij * (V*H^T)_ij / (W*H*H^T)_ij
h_ij <-- h_ij * (W^T*V)_ij / (W^T*W*H + a*R*H)_ij
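The multiplicative updates above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name and defaults are made up here, and the small eps term is added only to avoid division by zero:

```python
import numpy as np

def nmf_multiplicative(V, k, alpha=0.0, R=None, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative V (m x n) into W (m x k) and H (k x n)
    using the multiplicative updates given above. R is the symmetric
    non-negative (k x k) regularization matrix, weighted by alpha."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    if R is None:
        R = np.zeros((k, k))
    for _ in range(n_iter):
        # h_ij <-- h_ij * (W^T V)_ij / (W^T W H + a R H)_ij
        H *= (W.T @ V) / (W.T @ W @ H + alpha * (R @ H) + eps)
        # w_ij <-- w_ij * (V H^T)_ij / (W H H^T)_ij
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because each factor is multiplied by a non-negative ratio, W and H stay non-negative throughout, which is the point of this update scheme.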
Online NMF Algorithm
1) timestamp 0: Initialize using the NMF algorithm for new data
2) timestamp t:
a) calculate W' and H' using the NMF algorithm for new data
(for this step you do not need to use all of the data in previous W and H)
b) create W and H using W' and H' from the previous step and the NMF algorithm for previously processed data.
3) timestamp T: Output final W and H
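One step of the loop above can be sketched as a warm start: when new rows U arrive, W is extended with a fresh block for the new users and the previous factors seed the multiplicative updates, so far fewer iterations are needed than a cold restart. This is a simplified sketch, assuming the old data is summarized by W*H; the weighted correction step with Λ from the framework above is deliberately omitted:

```python
import numpy as np

def online_nmf_step(W_prev, H_prev, U, n_iter=50, eps=1e-9, seed=0):
    """One online timestamp: fold the new rows U into the existing
    factorization (W_prev, H_prev) by warm-starting plain
    multiplicative updates on the stacked matrix [W_prev*H_prev ; U]."""
    rng = np.random.default_rng(seed)
    k = H_prev.shape[0]
    # Old data is represented by its current approximation W*H.
    V = np.vstack([W_prev @ H_prev, U])
    # Extend W with a random non-negative block for the new rows.
    W = np.vstack([W_prev, rng.random((U.shape[0], k))])
    H = H_prev.copy()
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The key saving is that the previous W and H start the iteration close to a solution, so each timestamp costs only a short refinement rather than a full factorization from scratch.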
Recommendation
Let V be a matrix of data where the rows represent internet users, the columns represent items, and each cell is the rating that user gave that item. First we decide how many "types" of user we expect there to be in our network, such as types of Star Wars characters (Jedi, rebel soldier, imperial soldier, Ewok) or types of high-schooler (jock, emo, nerd, goth, etc.). This number is the length of W (the weights matrix) and the height of H (the basis matrix). We can determine this number using PCA.
Also, W has a height equal to the number of people in the network we are looking at, and H has a length equal to the number of items. After running an NMF algorithm, such as gradient descent or alternating least squares, you will have resolved W and H. W and H serve as an invaluable description of your network as captured in V. H represents each "type" of user, and each cell holds what someone of that type would rate the selected item (column). In the high-schooler example, the first row would be jock, the second emo, and so on. For the items football and silently crying to yourself (both of which would be columns), the jock would rate football high and crying low, while the emo would do the reverse.
The W matrix represents the weights; it is what combines the types to form the final result V. Remember, each row in V represents an actual person with lots of complexity. For example, the first person in V might equally like crying and football, in which case he would be half jock and half emo, and the first row of the weights matrix would be 1/2 1/2. However, sometimes a person cannot be perfectly represented as a linear combination of the types in the basis matrix H; the difference is absorbed by the error matrix E.
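The jock/emo picture above can be made concrete with a tiny hand-built example. Every number here is hypothetical, chosen only to illustrate how a row of W combines the rows of H into predicted ratings:

```python
import numpy as np

# Hypothetical two-type basis H: rows are the "jock" and "emo" types,
# columns are the items [football, crying, gym, poetry],
# entries are made-up 0-5 ratings.
items = ["football", "crying", "gym", "poetry"]
H = np.array([[5.0, 0.0, 4.0, 1.0],    # jock
              [0.0, 5.0, 1.0, 4.0]])   # emo

# One row of W: a user who is 80% jock and 20% emo.
w_user = np.array([0.8, 0.2])

# Their predicted ratings are the linear combination w_user * H.
predicted = w_user @ H                  # [4.0, 1.0, 3.4, 1.6]

# Recommend the best-scoring item the user has not rated yet.
rated = {0, 1}                          # already rated football and crying
best = max((j for j in range(len(items)) if j not in rated),
           key=lambda j: predicted[j])
print(items[best])                      # gym
```

The recommendation falls out directly: the mostly-jock user's unrated items are scored by the basis types, and gym (4 for a jock, 1 for an emo) beats poetry.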
By performing NMF on a set of people and their likes and dislikes we have essentially created a recommendation system. Based on what someone liked in the past, we can conjecture what collection of personality traits they have, and from this form recommendations based on those traits. Separating these traits can be highly beneficial. For example, if a YouTube user likes videos of kittens but also likes WWF wrestling videos, it is unlikely that anyone else in the network likes both, so their Pearson correlation with every other user will be weak. Matrix factorization, however, captures this individual's dual personality because both types exist in the basis (kitten lover and wrestling fan), so we can make recommendations based on what the basis "people" (who are not real) like, and we end up properly recommending both kitten and wrestling videos.
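The Pearson failure mode is easy to demonstrate with made-up numbers. Here a user who likes both kittens and wrestling correlates with neither of the single-interest users, so neighbour-based filtering has no good match for them (all ratings below are hypothetical):

```python
import numpy as np

# Hypothetical ratings over items [kitten1, kitten2, wwf1, wwf2].
dual_fan   = np.array([5.0, 4.0, 5.0, 4.0])   # likes both kinds
kitten_fan = np.array([5.0, 4.0, 0.0, 1.0])
wwf_fan    = np.array([0.0, 1.0, 5.0, 4.0])

# The dual fan's Pearson correlation with each single-interest user
# comes out as zero: half their ratings agree, half disagree.
r_kitten = np.corrcoef(dual_fan, kitten_fan)[0, 1]
r_wwf = np.corrcoef(dual_fan, wwf_fan)[0, 1]
print(r_kitten, r_wwf)   # 0.0 0.0
```

A factorization with a "kitten lover" row and a "wrestling fan" row in H would instead describe this user as a mixture of both, which is exactly the information correlation alone throws away.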
One reason this is not regularly deployed is that it is hard to continually re-run NMF algorithms on constantly changing, huge data sets. This is the problem that Online NMF hopes to solve: it reuses the results from timestamp t and folds in the additional users and items as they arrive at timestamp t+1, without having to perform the entire algorithm over again.
Search Results
Hand in hand with recommendation systems comes the back-end of search engines. Using this method it is possible to get better search results by guessing what a person is looking for based on the type of person they are.
General Signal Analysis
This type of analysis can also be applied to general signals that update themselves regularly.
Links and References
- 1) http://www.siam.org/proceedings/datamining/2006/dm06_059zhangs2.pdf
- 2) http://www.ijcai.org/papers07/Papers/IJCAI07-432.pdf
- 3) http://portal.acm.org/citation.cfm?id=1339264.1339709