Our main goal is to provide you with data because you know what you want to do with it. Still, we give some information regarding typical MIR tasks below. We hope to provide snippets of code and benchmark results to help you get started. If you want to provide additional information / link to your code / new results / new tasks, please send us an email! We also try to maintain an informal list of publications that use the dataset.
The Open Geospatial Consortium, Inc.® (OGC) is a non-profit, international, voluntary consensus standards organization that is leading the development of standards for geospatial and location-based services.
The California Digital Library (CDL), Portico, and Stanford University have received funding from the Library of Congress, under its National Digital Information Infrastructure Preservation Program (NDIIPP) initiative, to collaborate on a two-year project to develop a next-generation JHOVE2 architecture for format-aware characterization.
MetaTab is a program for creating and editing meta tags. The generated meta tags can be inserted into the HTML page with a mouse click. - supports Dublin Core - plus user-defined fields
A joint project of several national libraries, implemented and hosted by OCLC. The project's goal is to lower the cost and increase the utility of library authority files by matching and linking the authority files of national libraries, and then making that information available on the Web.
ALTO (Analyzed Layout and Text Object) is an XML Schema that details technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the Metadata Encoding and Transmission Schema (METS) administrative metadata section. However, ALTO instances can also exist as standalone documents used independently of METS.
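To make the "layout and content" description concrete, here is a minimal sketch of reading the text lines out of a standalone ALTO instance. The sample document is a hand-written, simplified fragment (not schema-validated), and the helper names `local_name` and `extract_text_lines` are hypothetical. It assumes only the common ALTO pattern of `TextLine` elements containing `String` elements with a `CONTENT` attribute, and it matches on local element names because the namespace URI differs between ALTO versions.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO fragment for illustration only.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Historical" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
            <SP/>
            <String CONTENT="newspapers" HPOS="420" VPOS="200" WIDTH="320" HEIGHT="40"/>
          </TextLine>
          <TextLine ID="TL2">
            <String CONTENT="are" HPOS="100" VPOS="260" WIDTH="90" HEIGHT="40"/>
            <SP/>
            <String CONTENT="digitized" HPOS="210" VPOS="260" WIDTH="260" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_text_lines(alto_xml):
    """Return one string per TextLine, joining its String CONTENT values."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [
                child.get("CONTENT", "")
                for child in element
                if local_name(child.tag) == "String"
            ]
            lines.append(" ".join(words))
    return lines


text_lines = extract_text_lines(ALTO_SAMPLE)
for line in text_lines:
    print(line)
```

The positional attributes (`HPOS`, `VPOS`, `WIDTH`, `HEIGHT`) are ignored here; a layout-aware consumer would read them from the same `String` elements.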
Apple's metadata filesystem - stores file metadata in a separate database, import for common and customized file metadata, sophisticated API and command-line query
Large quantities of historical newspapers are being digitized and OCRd. We describe a framework for processing the OCRd text to identify articles and extract metadata for them. We describe the article schema and provide examples of features that facilitate automatic indexing of them. For this processing, we employ lexical semantics, structural models, and community content. Furthermore, we describe visualization and summarization techniques that can be used to present the extracted events.
We follow the MODS schema and therefore a web-based MODS editor was built to create MODS XML records. These records are deposited into the eXist native XML database where they can be accessed via a REST API.
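As a rough illustration of what such an editor produces, the sketch below builds a very small MODS record with Python's standard library. It is not the project's own code: the function name `build_mods_record` and the sample values are hypothetical, and only a handful of MODS elements (`titleInfo/title`, `name/namePart`, `originInfo/dateIssued`) are used. Depositing the result into eXist over its REST interface is left out, since the endpoint and credentials depend on the installation.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"  # MODS version 3 namespace


def build_mods_record(title, author, year):
    """Return a minimal MODS record as an XML string (illustrative only)."""
    ET.register_namespace("mods", MODS_NS)

    def tag(name):
        # ElementTree expects namespaced tags in '{uri}local' form.
        return f"{{{MODS_NS}}}{name}"

    mods = ET.Element(tag("mods"))

    title_info = ET.SubElement(mods, tag("titleInfo"))
    ET.SubElement(title_info, tag("title")).text = title

    name = ET.SubElement(mods, tag("name"), {"type": "personal"})
    ET.SubElement(name, tag("namePart")).text = author

    origin_info = ET.SubElement(mods, tag("originInfo"))
    ET.SubElement(origin_info, tag("dateIssued")).text = str(year)

    return ET.tostring(mods, encoding="unicode")


# Hypothetical sample values, not taken from any real record.
record_xml = build_mods_record("Sample Title", "Doe, Jane", 2009)
print(record_xml)
```

A real record would normally carry more fields and be validated against the MODS schema before it is stored.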
This project will demonstrate middleware that enables easier deposit of research papers through batch upload of extant bibliographic metadata. It can be employed to assist deposit into the Depot as well as offer a facility for repositories more generally, with potential to enhance metadata deposit through transfers and re-directs to institutional repositories (IRs). Using a web service approach and m2m interfaces such as Deposit API / SWORD, this middleware facility will show proof of concept at an early stage by connecting two existing services: the Depot, a UK repository for researchers who do not have other provision, and PublicationsList.org, a web site for researchers to build a web page listing their publications. The latter has existing functionality for batch import of bibliographic metadata for a (personal) publications list from a variety of online sources, such as PubMed and Web of Science, and from personal databases, such as EndNote, Reference Manager, BibTeX etc.
Very interesting approach!
"Apache Empire-db is an Open Source relational data persistence component which allows database vendor independent dynamic query definition as well as safe and simple data retrieval and updating. Compared to most other solutions like e.g. Hibernate, TopLink, iBATIS or JPA implementations, Empire-db takes a considerably different approach, with a special focus on compile-time safety, reduced redundancies and improved developer productivity."
PEER (Publishing and the Ecology of European Research), supported by the EC eContentplus programme, will investigate the effects of the large-scale, systematic depositing of authors’ peer-reviewed manuscripts (so called Green Open Access or stage-2 research output) on reader access, author visibility, and journal viability, as well as on the broader ecology of European research.
BMF creates the prerequisites for exchanging television production metadata in a uniform and consistent form. It also forms the basis for reducing costs and risks when integrating interfaces.
* Google has access to WorldCat metadata
* Google says bad metadata comes from external providers
* No restrictions on which WorldCat metadata fields can be used
textMD is an XML Schema maintained by the Library of Congress that details technical metadata for text-based digital objects. It allows for detailing properties such as encoding information (quality, platform, software, agent), character information (character set and size, byte order and size, line terminators), languages, fonts, markup information, processing and textual notes, technical requirements for printing and viewing, and page ordering and sequencing.
SIMILE is focused on developing robust, open source tools that empower users to access, manage, visualize and reuse digital assets. Learn more about the SIMILE project.
The Calais web service automatically attaches rich semantic metadata to the content you submit. Using natural language processing, machine learning and other methods, Calais categorizes and links your document with entities (people, places, organizations,
Visual Web Spider is software for collecting relevant Web sites on the Internet. It is a fully automated, multithreaded Web site crawler.
Semantic Interoperability of Metadata and Information in unLike Environments.
Wasabi is an effort to create a unified searching and metadata storage specification for the free desktop. This document contains draft specs meant for further discussion.