This page provides two large hyperlink graphs for public download. The graphs have been extracted from the 2012 and 2014 versions of the Common Crawl web corpora. The 2012 graph covers 3.5 billion web pages and 128 billion hyperlinks between these pages. To the best of our knowledge, it is the largest hyperlink graph available to the public outside companies such as Google, Yahoo, and Microsoft. The 2014 graph covers 1.7 billion web pages connected by 64 billion hyperlinks. Below we provide instructions on how to download the graphs as well as basic statistics about their topology.
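From the figures above, the average number of links per page is roughly 128 / 3.5 ≈ 37 for the 2012 graph and 64 / 1.7 ≈ 38 for the 2014 graph. The sketch below shows how such basic topology statistics (node and edge counts, average degree, maximum in- and out-degree) could be computed from an edge list. It assumes a hypothetical plain-text format with one tab-separated `source<TAB>target` pair of integer page ids per line; the actual download format may differ. The helper names `read_edges` and `degree_stats` are illustrative, and the in-memory approach is only suitable for small samples, not the full multi-billion-edge graphs.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def read_edges(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Parse hypothetical 'source<TAB>target' lines into integer id pairs."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        src, dst = line.split("\t")
        yield int(src), int(dst)


def degree_stats(edges: Iterable[Tuple[int, int]]) -> dict:
    """Compute simple topology statistics for a directed graph held in memory."""
    out_deg: Counter = Counter()
    in_deg: Counter = Counter()
    nodes = set()
    n_edges = 0
    for src, dst in edges:
        out_deg[src] += 1  # link leaves src
        in_deg[dst] += 1   # link points to dst
        nodes.add(src)
        nodes.add(dst)
        n_edges += 1
    n_nodes = len(nodes)
    return {
        "nodes": n_nodes,
        "edges": n_edges,
        # average out-degree (equals average in-degree) = edges / nodes
        "avg_degree": n_edges / n_nodes if n_nodes else 0.0,
        "max_in_degree": max(in_deg.values(), default=0),
        "max_out_degree": max(out_deg.values(), default=0),
    }


if __name__ == "__main__":
    sample = ["0\t1", "0\t2", "1\t2", "2\t0"]
    print(degree_stats(read_edges(sample)))
```

For the four-edge sample, this reports 3 nodes, 4 edges, an average degree of 4/3, and a maximum in- and out-degree of 2.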
We apply Trideux factorial correspondence analysis and Calliope keyword co-occurrence analysis to the database of keywords characterizing each research article and ongoing research report published in the Bulletin of Sociological Methodology (BSM) from December 1993 to October 2003. We present the results of these analyses, followed by the complete tables of contents, the author index, and the article-title index for the articles and reports analyzed.
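The keyword co-occurrence step can be illustrated with a small sketch. We do not claim this reproduces Calliope's internal computations; it assumes a measure commonly used in co-word analysis, the equivalence index E_ij = c_ij^2 / (c_i * c_j), where c_i and c_j count the documents tagged with keywords i and j, and c_ij counts the documents tagged with both. The function names and sample keywords are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def cooccurrence(docs: Iterable[List[str]]) -> Tuple[Counter, Counter]:
    """Count keyword document frequencies and pairwise co-occurrences."""
    freq: Counter = Counter()
    pairs: Counter = Counter()
    for keywords in docs:
        unique = sorted(set(keywords))  # each keyword counted once per document
        freq.update(unique)
        # sorted input gives canonical pairs (a, b) with a < b
        pairs.update(combinations(unique, 2))
    return freq, pairs


def equivalence_index(freq: Counter, pairs: Counter) -> Dict[Pair, float]:
    """E_ij = c_ij^2 / (c_i * c_j); 1.0 means two keywords always appear together."""
    return {(a, b): c * c / (freq[a] * freq[b]) for (a, b), c in pairs.items()}


if __name__ == "__main__":
    docs = [
        ["survey", "sampling"],
        ["survey", "sampling", "network"],
        ["network", "survey"],
    ]
    freq, pairs = cooccurrence(docs)
    for pair, e in sorted(equivalence_index(freq, pairs).items()):
        print(pair, round(e, 3))
```

On the three sample records, "sampling" and "survey" co-occur twice (E = 4 / (2 * 3) ≈ 0.667), while "network" and "sampling" co-occur once (E = 1 / (2 * 2) = 0.25). The resulting pairwise strengths are what co-word methods typically cluster into thematic groups.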
R. Nallapati, A. Ahmed, E. Xing, and W. Cohen. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 542-550. New York, NY, USA, ACM, (2008)
J. Mackinlay, R. Rao, and S. Card. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 67-73. New York, NY, USA, ACM Press/Addison-Wesley Publishing Co., (1995)
Y. Chung, M. Toyoda, and M. Kitsuregawa. Proceedings of the 5th International Workshop on Adversarial Information Retrieval on the Web, pages 9-16. New York, NY, USA, ACM, (2009)