Abstract. To help web applications understand the content of HTML pages, an increasing number of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. The annotations are used by Google, Yahoo!, Yandex, Bing, and Facebook to enrich search results and to display entity descriptions within their applications. In this paper, we present a series of publicly accessible Microdata, RDFa, and Microformats datasets that we have extracted from three large web corpora dating from 2010, 2012, and 2013.
More and more websites embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as Microdata, JSON-LD, RDFa, and Microformats. The Web Data Commons project extracts this data from several billion web pages. So far, the project has provided 11 data set releases extracted from the Common Crawl corpora from 2010 to 2022. The project offers the extracted data for download and publishes statistics about the deployment of the different formats.
A. Harth. Web Semantics: Science, Services and Agents on the World Wide Web, 8 (4): 348--354 (2010). Semantic Web Challenge 2009; User Interaction in Semantic Web research.
S. Staab, J. Lehmann, and R. Verborgh. Companion Proceedings of The Web Conference 2018, pp. 885--886. Republic and Canton of Geneva, Switzerland, International World Wide Web Conferences Steering Committee, (2018)
H. Zhang, A. Santos, and J. Freire. Proceedings of the 30th ACM International Conference on Information & Knowledge Management, ACM, (October 2021)
M. Paris and R. Jäschke. Proceedings of the 14th International Conference on Knowledge Science, Engineering and Management, vol. 12816 of Lecture Notes in Artificial Intelligence, pp. 1--14. Springer, (2021)
R. Yu, B. Fetahu, U. Gadiraju, and S. Dietze. Proceedings of the ISWC 2016 Posters & Demonstrations Track co-located with 15th International Semantic Web Conference (ISWC 2016), Kobe, Japan, October 19, 2016. (2016)
R. Yu, U. Gadiraju, X. Zhu, B. Fetahu, and S. Dietze. The Semantic Web - ESWC 2016 Satellite Events, Heraklion, Crete, Greece, May 29 - June 2, 2016, Revised Selected Papers, pp. 69--73. (2016)
B. Berendt, A. Hotho, and G. Stumme. Web Semantics: Science, Services and Agents on the World Wide Web, 8 (2-3): 95--96 (2010). Bridging the Gap--Data Mining and Social Network Analysis for Integrating Semantic Web and Web 2.0; The Future of Knowledge Dissemination: The Elsevier Grand Challenge for the Life Sciences.
R. Jäschke and S. Rudolph. Contributions to the 11th International Conference on Formal Concept Analysis, pp. 19--34. Technische Universität Dresden, (May 2013)
A. Gondhali, R. Chandra, A. Shinde, and S. Pimple. International Journal on Recent and Innovation Trends in Computing and Communication, 3 (4): 1841--1844 (April 2015)
U. Gadiraju, R. Kawase, and S. Dietze. Proceedings of the ISWC 2014 Posters & Demonstrations Track, a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014, pp. 461--464. (2014)
S. Moosavi, {. Seyyedi, and N. Moghadam. Information Technology: New Generations, 2009. ITNG '09. Sixth International Conference on, pp. 290--295. (April 2009)
R. Farrell, S. Liburd, and J. Thomas. Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters, pp. 162--169. New York, NY, USA, ACM, (2004)
F. Dau. Proceedings of the 19th International Conference on Conceptual Structures (ICCS 2011), vol. 6828 of Lecture Notes in Computer Science, pp. 1--18. Springer, (2011)