Wednesday, August 08, 2007
Using Wikipedia to disambiguate names
Silviu Cucerzan at Microsoft Research recently published a paper, "Large-Scale Named Entity Disambiguation Based on Wikipedia Data."
Libraries do something they call "name authority control". For most people in IT, this would be called "assigning unique identifiers to names." Identifying authors is considered one of the essential aspects of library cataloging, and it isn't done in any other bibliographic environment, as far as I know.
VIAF: The Virtual International Authority File
VIAF is a joint project of the Library of Congress (LC), the Deutsche Nationalbibliothek (DNB), the Bibliothèque nationale de France (BnF), and OCLC. The project's goal is to match and link the library authority files.
The search box at the top of this page searches a prototype of VIAF, which is derived from the personal name authority and related bibliographic data of LC, DNB, and BnF. Matches are shown as links between records (in MARC-21 998 fields).
More information can be found at the OCLC Research VIAF Project page.
IV. Division of Bibliographic Control
Working Group on Functional Requirements and Numbering of Authority Records (FRANAR)
Scope
The Working Group on Functional Requirements and Numbering of Authority Records (FRANAR) was established in April 1999 by the IFLA Division of Bibliographic Control and the IFLA Universal Bibliographic Control and International MARC Programme (UBCIM). Following the end of the UBCIM Programme in 2003, the IFLA-CDNL Alliance for Bibliographic Standards (ICABS) took over joint responsibility for the FRANAR Working Group with the British Library as the responsible body.
The Working Group is charged by the IFLA Division IV:
* To define functional requirements of authority records
* To study the feasibility of an International Standard Authority Data Number
* To serve as the official IFLA liaison to and work with other interested groups concerning authority files.
Author Authority Files for 2006-12-01 (February 22, 2007)
This is a collection of XML files describing the authoritative aliases for author names (and perhaps book subjects?).
The International Standard Name Identifier (ISNI) is a draft ISO Standard (ISO 27729) whose scope is the identification of Public Identities of parties: that is, the identities used publicly by parties involved throughout the media content industries in the creation, production, management, and content distribution chains.
The ISNI system uniquely identifies Public Identities across multiple fields of creative activity. The ISNI provides a tool for disambiguating Public Identities that might otherwise be confused.
ISNI is not intended to provide direct access to comprehensive information about a Public Identity but can provide links to other systems where such information is held.
What does it look like?
An ISNI is made up of 16 decimal digits, the last one being a check character.
Example:
ISNI 1422 4586 3573 0476
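The excerpt above does not say how the check character is computed. As a minimal sketch, assuming the ISO 7064 MOD 11-2 scheme that the published ISO 27729 standard specifies (under which the check character can also be "X", standing for the value 10), a validator might look like the following. The function names are illustrative and are not part of any ISNI tooling.

```python
def isni_check_character(first15):
    """Compute the check character for the first 15 digits of an ISNI.

    Assumes the ISO 7064 MOD 11-2 algorithm; the draft text quoted
    above does not itself name the algorithm.
    """
    total = 0
    for ch in first15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A computed value of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_isni(text):
    """Return True if text is a 16-character ISNI with a correct check character."""
    # Drop the optional "ISNI" label and any grouping spaces or hyphens.
    compact = text.upper().replace("ISNI", "").replace(" ", "").replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_isni("ISNI 1422 4586 3573 0476"))
```

Under this assumption the example identifier above passes the check, while changing any single digit makes it fail.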
Who can apply for an ISNI?
An ISNI can be allocated to any entity that is or was either a natural person, a legal person, a fictional character, or a group of such entities, whether or not incorporated.
Further, ISNIs are assigned to the Public Identities of Parties that participate in the creation, production, management or distribution of cultural goods in the digital environment.
The Identification of Authors in the Mathematical Reviews Database
Bert TePaske-King
Manager, Bibliographic Services Department
Mathematical Reviews
Norman Richert
Administrative Editor
Mathematical Reviews
I. Technical Impasse: Despite useful software, NameBase is labor-intensive.
Professional librarians call it the "name authority problem." Assume that you could do perfect OCR scans of book indexes. After a few thousand names are collected from a couple dozen books, you might end up with JOHN SMITH, JACK SMITH, JOHN A. SMITH, JOHN ARTHUR SMITH, J.A. SMITH, JOHN SMITH, JR., J. ARTHUR SMITH. Which of these refer to the same person, and which are namesakes? To answer this question, all the names must be considered in context. Sometimes, various editions of Who's Who are needed for further research, or telephone listings on CD-ROM might be used to determine correct spelling.
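To make the difficulty concrete, here is a rough sketch of the usual first step, grouping name strings into candidate sets by a crude key (surname plus first initial). This is an illustration only, not the method NameBase uses; the key, the suffix list, and the function names are assumptions chosen for the example.

```python
import re
from collections import defaultdict

# Generational suffixes to ignore when building the key (illustrative list).
SUFFIXES = {"JR", "SR", "II", "III"}


def blocking_key(name):
    """Return (surname, first initial) for a name in 'FIRST [MIDDLE] LAST' order."""
    tokens = re.sub(r"[.,]", " ", name.upper()).split()
    tokens = [t for t in tokens if t not in SUFFIXES]
    if not tokens:
        return ("", "")
    surname = tokens[-1]
    first_initial = tokens[0][0] if len(tokens) > 1 else ""
    return (surname, first_initial)


def group_candidates(names):
    """Group names that share a blocking key; each group still needs human review."""
    groups = defaultdict(list)
    for name in names:
        groups[blocking_key(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = [
        "JOHN SMITH", "JACK SMITH", "JOHN A. SMITH", "JOHN ARTHUR SMITH",
        "J.A. SMITH", "JOHN SMITH, JR.", "J. ARTHUR SMITH",
    ]
    for key, members in group_candidates(names).items():
        print(key, members)
```

All seven variants from the paragraph above fall into the same group, which shows why software alone does not settle the question: the key can only propose candidates, and deciding which of them are the same person still depends on context.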
Y. Tan, M. Kan, and D. Lee. JCDL '06: Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 314-315. New York, NY, USA, ACM, 2006.