Abstract. To help web applications understand the content of HTML pages, an increasing number of websites have started to annotate structured data within their pages using markup formats such as Microdata, RDFa, and Microformats. The annotations are used by Google, Yahoo!, Yandex, Bing, and Facebook to enrich search results and to display entity descriptions within their applications. In this paper, we present a series of publicly accessible Microdata, RDFa, and Microformats datasets that we have extracted from three large web corpora dating from 2010, 2012, and 2013.
More and more websites have started to embed structured data describing products, people, organizations, places, and events into their HTML pages using markup standards such as Microdata, JSON-LD, RDFa, and Microformats. The Web Data Commons project extracts this data from several billion web pages. So far the project provides 11 different data set releases extracted from the Common Crawls 2010 to 2022. The project provides the extracted data for download and publishes statistics about the deployment of the different formats.
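To make the idea of embedded structured data concrete, here is a minimal sketch of how JSON-LD blocks could be pulled out of a single HTML page using only the Python standard library. It is an illustration under stated assumptions, not the Web Data Commons extraction pipeline: the names `JsonLdExtractor` and `extract_jsonld` are hypothetical, and the sketch handles only JSON-LD, not Microdata, RDFa, or Microformats.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags.

    Hypothetical helper for illustration; not part of any existing project.
    """

    def __init__(self):
        super().__init__()
        self._in_jsonld = False   # True while inside a JSON-LD script tag
        self._buffer = []         # text chunks of the current script body
        self.items = []           # successfully parsed JSON-LD documents

    def handle_starttag(self, tag, attrs):
        # An attribute without a value is reported as None, hence the "or".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                # Malformed markup is skipped rather than aborting the page.
                pass


def extract_jsonld(html):
    """Return a list of JSON-LD documents embedded in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "Product", "name": "Example"}'
        "</script></head><body></body></html>"
    )
    for item in extract_jsonld(page):
        print(item["@type"], item["name"])
```

Skipping malformed blocks instead of raising is a design choice that suits large crawls, where a single broken page should not stop processing.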
One of the most useful features of the Dataverse repository software is the large number of metadata fields it provides for describing research data. This guide is intended to support both novice and experienced users in creating metadata for datasets in a Dataverse repository. It provides official definitions of metadata fields with clarifications and tips, distinguishes between required, recommended, and optional fields, and illustrates the use of fields with examples. This version of the guide has been updated to cover all available metadata fields: citation, geospatial, social science and humanities, astronomy and astrophysics, life sciences, and journal metadata. The guide was created with permission from Harvard for the use of definitions and from the Texas Digital Library for the basic design. This guide is also available in French.
The Science Data Management Branch (SDM) of the U.S. Geological Survey (USGS) provides data management expertise and leadership and develops guidance and tools to support the USGS in providing the nation with reliable scientific information on the basis of which to describe the Earth. The SDM suite of tools supports the USGS Data Management Lifecycle by facilitating quality assurance, description, curation, and publishing of the Bureau's scientific data. The SDM suite of tools includes the USGS Data Management Website, USGS Science Data Catalog, Digital Object Identifier Tool, ScienceBase, ScienceBase Data Release Tool, Metadata Wizard, and Online Metadata Editor....
This is a really good example of how a repository uses EML data. Morpho was probably used to create an XML file, which the KNB repository then used to build a very detailed picture of the dataset.
DSPL is the Dataset Publishing Language, a representation language for the data and metadata of datasets. Datasets described in this format can be processed by Google and visualized in the Google Public Data Explorer.
R. Amorim, J. Castro, J. da Silva, and C. Ribeiro. New Contributions in Information Systems and Technologies, page 101--111. Springer International Publishing, (2015)
S. Doerfel, R. Jäschke, A. Hotho, and G. Stumme. Proceedings of the 4th ACM RecSys workshop on Recommender systems and the social web, page 9--16. New York, NY, USA, ACM, (2012)
S. Doerfel, R. Jäschke, A. Hotho, and G. Stumme. Proceedings of the 4th ACM RecSys workshop on Recommender systems and the social web, page 9--16. New York, NY, USA, ACM, (September 2012)