The PageRank algorithm is a great way of using collective intelligence to determine the importance of a webpage. There’s a big problem, though, which is that PageRank is difficult to apply to the web as a whole, simply because the web contains so many webpages. While just a few lines of code can be used to implement PageRank on collections of a few thousand webpages, it’s trickier to compute PageRank for larger sets of pages. The underlying problem is that the most direct way to compute the PageRank of n webpages involves inverting an n × n matrix. Even when n is just a few thousand, this means inverting a matrix containing millions or tens of millions of floating point numbers. This is possible on a typical personal computer, but it’s hard to go much further. In this post, I describe how to compute PageRank for collections containing millions of webpages. My little laptop easily coped with two million pages, using about 650 megabytes of RAM and a few hours of computation.
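The key to scaling past the matrix-inversion bottleneck is iteration over the sparse link structure rather than working with a dense n × n matrix. The following is a minimal Python sketch of that power-iteration idea (not the post's own code); the damping parameter s, the tolerance, and the tiny example graph are illustrative assumptions.

    # Power-iteration sketch of PageRank: memory scales with the number of links,
    # not with n*n, so no matrix inversion is needed.
    def pagerank(out_links, s=0.85, tol=1e-8, max_iter=100):
        """out_links: dict mapping each page to the list of pages it links to
        (all link targets are assumed to be pages in the collection)."""
        pages = list(out_links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(max_iter):
            # Teleportation term: every page receives (1 - s) / n.
            new_rank = {p: (1.0 - s) / n for p in pages}
            # Pages with no out-links spread their rank uniformly.
            dangling = sum(rank[p] for p in pages if not out_links[p])
            for p in pages:
                new_rank[p] += s * dangling / n
            # Each page passes a share of its rank along its out-links.
            for p, targets in out_links.items():
                if targets:
                    share = s * rank[p] / len(targets)
                    for q in targets:
                        new_rank[q] += share
            converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
            rank = new_rank
            if converged:
                break
        return rank

    # Tiny hypothetical link graph, just to show the call:
    links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
    print(pagerank(links))

Because each iteration only touches existing links, the same loop structure works whether the collection holds a few thousand pages or a few million; the main change at larger scales is storing the link structure compactly on disk or in sparse arrays.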
The Matlab Toolbox for Dimensionality Reduction contains Matlab implementations of a large number of techniques for dimensionality reduction. Many of the implementations were developed from scratch, whereas others are improved versions of existing code.
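The toolbox itself is Matlab code; as a rough illustration of what one of the simplest techniques it covers looks like when implemented from scratch, here is a short Python/NumPy sketch of PCA (this is not the toolbox's code, and the function name and data are assumptions for the example).

    import numpy as np

    def pca(X, no_dims=2):
        """Project the rows of X onto its top `no_dims` principal components."""
        X_centered = X - X.mean(axis=0)              # centre the data
        cov = np.cov(X_centered, rowvar=False)       # covariance of the features
        eigvals, eigvecs = np.linalg.eigh(cov)       # eigh: covariance is symmetric
        order = np.argsort(eigvals)[::-1][:no_dims]  # largest eigenvalues first
        return X_centered @ eigvecs[:, order]

    X = np.random.rand(100, 5)
    print(pca(X, no_dims=2).shape)  # (100, 2)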
You've built a vibrant community of Family Guy enthusiasts. The SVD recommendation algorithm took your site to the next level by allowing you to leverage the implicit knowledge of your community. But now you're ready for the next iteration - you are about
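The SVD step the excerpt refers to can be sketched with a truncated singular value decomposition of the user-by-item rating matrix: keep only a few latent factors, rebuild a dense matrix of predicted scores, and recommend the unrated items with the highest predictions. The toy matrix and numbers below are made up for illustration, not taken from the original post.

    import numpy as np

    # Toy user-by-item rating matrix (0 = not yet rated); values are invented.
    ratings = np.array([
        [5.0, 4.0, 0.0, 1.0],
        [4.0, 0.0, 0.0, 1.0],
        [1.0, 1.0, 0.0, 5.0],
        [0.0, 1.0, 5.0, 4.0],
    ])

    # Truncated SVD: keep k latent factors and rebuild a dense "predicted" matrix.
    k = 2
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    predicted = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Recommend, for user 0, the unrated item with the highest predicted score.
    user = 0
    unrated = np.where(ratings[user] == 0)[0]
    best = unrated[np.argmax(predicted[user, unrated])]
    print(f"Recommend item {best} to user {user} (score {predicted[user, best]:.2f})")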
G. Hamerly and C. Elkan. In CIKM '02: Proceedings of the Eleventh International Conference on Information and Knowledge Management, pages 600–607, New York, NY, USA. ACM, 2002.
M. Banko and E. Brill. In ACL '01: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pages 26–33, Morristown, NJ, USA. Association for Computational Linguistics, 2001.