Open Information Extraction from the Web

M. Banko, M. Cafarella, S. Soderland, M. Broadhead, and O. Etzioni. Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2670--2676. San Francisco, CA, USA, Morgan Kaufmann Publishers Inc., (2007)

Abstract

Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.

Links and resources

BibTeX key: banko2007
entry type: inproceedings
address: San Francisco, CA, USA
booktitle: Proceedings of the 20th International Joint Conference on Artificial Intelligence
year: 2007
pages: 2670--2676
publisher: Morgan Kaufmann Publishers Inc.
series: IJCAI'07
location: Hyderabad, India
acmid: 1625705
numpages: 7
Document: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9909B5C03DA1A3CCFFF4263898B69100?doi=10.1.1.74.5174&rep=rep1&type=pdf

Cite this publication

%0 Conference Paper
%1 banko2007
%A Banko, Michele
%A Cafarella, Michael J.
%A Soderland, Stephen
%A Broadhead, Matt
%A Etzioni, Oren
%B Proceedings of the 20th International Joint Conference on Artificial Intelligence
%C San Francisco, CA, USA
%D 2007
%I Morgan Kaufmann Publishers Inc.
%K 2007 banko domain extraction ie information open relation textrunner unsupervised
%P 2670--2676
%T Open Information Extraction from the Web
%U http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9909B5C03DA1A3CCFFF4263898B69100?doi=10.1.1.74.5174&rep=rep1&type=pdf
%X Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.

@inproceedings{banko2007,
  abstract = {Traditionally, Information Extraction (IE) has focused on satisfying precise, narrow, pre-specified requests from small homogeneous corpora (e.g., extract the location and time of seminars from a set of announcements). Shifting to a new domain requires the user to name the target relations and to manually create new extraction rules or hand-tag new training examples. This manual labor scales linearly with the number of target relations. This paper introduces Open IE (OIE), a new extraction paradigm where the system makes a single data-driven pass over its corpus and extracts a large set of relational tuples without requiring any human input. The paper also introduces TEXTRUNNER, a fully implemented, highly scalable OIE system where the tuples are assigned a probability and indexed to support efficient extraction and exploration via user queries. We report on experiments over a 9,000,000 Web page corpus that compare TEXTRUNNER with KNOWITALL, a state-of-the-art Web IE system. TEXTRUNNER achieves an error reduction of 33% on a comparable set of extractions. Furthermore, in the amount of time it takes KNOWITALL to perform extraction for a handful of pre-specified relations, TEXTRUNNER extracts a far broader set of facts reflecting orders of magnitude more relations, discovered on the fly. We report statistics on TEXTRUNNER's 11,000,000 highest probability tuples, and show that they contain over 1,000,000 concrete facts and over 6,500,000 more abstract assertions.},
  acmid = {1625705},
  added-at = {2014-01-05T16:36:30.000+0100},
  address = {San Francisco, CA, USA},
  author = {Banko, Michele and Cafarella, Michael J. and Soderland, Stephen and Broadhead, Matt and Etzioni, Oren},
  biburl = {https://www.bibsonomy.org/bibtex/2950d3175227caaadff1fb51c5487c27f/jil},
  booktitle = {Proceedings of the 20th International Joint Conference on Artificial Intelligence},
  interhash = {b390d827609f4458b03aa13c5edc10c4},
  intrahash = {950d3175227caaadff1fb51c5487c27f},
  keywords = {2007 banko domain extraction ie information open relation textrunner unsupervised},
  location = {Hyderabad, India},
  numpages = {7},
  pages = {2670--2676},
  publisher = {Morgan Kaufmann Publishers Inc.},
  series = {IJCAI'07},
  timestamp = {2014-01-05T16:36:30.000+0100},
  title = {Open Information Extraction from the Web},
  url = {http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=9909B5C03DA1A3CCFFF4263898B69100?doi=10.1.1.74.5174&rep=rep1&type=pdf},
  year = 2007
}
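
As a minimal sketch of how the BibTeX entry above might be used, assume it has been saved to a file named references.bib next to the LaTeX source (the file name and the example sentence are assumptions made for illustration, not part of the record). The entry can then be cited through its key banko2007:

```latex
% Minimal sketch: references.bib is an assumed file name that holds the banko2007 entry.
\documentclass{article}
\begin{document}
Open IE was introduced by Banko et al.~\cite{banko2007}.

% "plain" is one of the standard BibTeX styles; any other style could be substituted.
\bibliographystyle{plain}
\bibliography{references}
\end{document}
```

Standard styles such as plain skip fields they do not recognize (abstract, acmid, biburl, and so on), so the entry can usually be pasted without trimming.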

Metadata

Last modified 11 years ago
Created 11 years ago