A. Jain and P. Pantel. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 501--509. (2010)
Abstract
Fact collections are mostly built using semi-supervised relation extraction techniques and wisdom of the crowds methods, rendering them inherently noisy. In this paper, we propose to validate the resulting facts by leveraging global constraints inherent in large fact collections, observing that correct facts will tend to match their arguments with other facts more often than with incorrect ones. We model this intuition as a graph-ranking problem over a fact graph and explore novel random walk algorithms. We present an empirical study, over a large set of facts extracted from a 500 million document webcrawl, validating the model and showing that it improves fact quality over state-of-the-art methods.
Description
CiteSeerX — Factrank: Random walks on a web of facts
%0 Conference Paper
%1 Jain10factrank:random
%A Jain, Alpa
%A Pantel, Patrick
%B Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
%D 2010
%K extraction fact filtering pagerank ranking relation web
%P 501--509
%T Factrank: Random walks on a web of facts
%U http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.230.8852
%X Fact collections are mostly built using semi-supervised relation extraction techniques and wisdom of the crowds methods, rendering them inherently noisy. In this paper, we propose to validate the resulting facts by leveraging global constraints inherent in large fact collections, observing that correct facts will tend to match their arguments with other facts more often than with incorrect ones. We model this intuition as a graph-ranking problem over a fact graph and explore novel random walk algorithms. We present an empirical study, over a large set of facts extracted from a 500 million document webcrawl, validating the model and showing that it improves fact quality over state-of-the-art methods.
@inproceedings{Jain10factrank:random,
abstract = {Fact collections are mostly built using semi-supervised relation extraction techniques and wisdom of the crowds methods, rendering them inherently noisy. In this paper, we propose to validate the resulting facts by leveraging global constraints inherent in large fact collections, observing that correct facts will tend to match their arguments with other facts more often than with incorrect ones. We model this intuition as a graph-ranking problem over a fact graph and explore novel random walk algorithms. We present an empirical study, over a large set of facts extracted from a 500 million document webcrawl, validating the model and showing that it improves fact quality over state-of-the-art methods.},
added-at = {2012-10-11T15:00:46.000+0200},
author = {Jain, Alpa and Pantel, Patrick},
biburl = {https://www.bibsonomy.org/bibtex/2bbfca0cdc01b5a2e7012a76edafe453d/jil},
booktitle = {Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)},
description = {CiteSeerX — Factrank: Random walks on a web of facts},
interhash = {4fdd3fe107d90df8e15b298e0a0fe499},
intrahash = {bbfca0cdc01b5a2e7012a76edafe453d},
keywords = {extraction fact filtering pagerank ranking relation web},
pages = {501--509},
timestamp = {2013-11-23T20:11:51.000+0100},
title = {Factrank: Random walks on a web of facts},
url = {http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.230.8852},
year = 2010
}