Simple unsupervised grammar induction from raw text with cascaded finite state models
E. Ponvert, J. Baldridge, and K. Erk. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1077–1086. Stroudsburg, PA, USA, Association for Computational Linguistics (2011)
Abstract
We consider a new subproblem of unsupervised parsing from raw text: unsupervised partial parsing, the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
%0 Conference Paper
%1 ponvert2011
%A Ponvert, Elias
%A Baldridge, Jason
%A Erk, Katrin
%B Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1
%C Stroudsburg, PA, USA
%D 2011
%I Association for Computational Linguistics
%K automaton cascade parsing unsupervised
%P 1077--1086
%T Simple unsupervised grammar induction from raw text with cascaded finite state models
%U http://dl.acm.org/citation.cfm?id=2002472.2002608
%X We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing---the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.
%@ 978-1-932432-87-9
@inproceedings{ponvert2011,
abstract = {We consider a new subproblem of unsupervised parsing from raw text, unsupervised partial parsing---the unsupervised version of text chunking. We show that addressing this task directly, using probabilistic finite-state methods, produces better results than relying on the local predictions of a current best unsupervised parser, Seginer's (2007) CCL. These finite-state models are combined in a cascade to produce more general (full-sentence) constituent structures; doing so outperforms CCL by a wide margin in unlabeled PARSEVAL scores for English, German and Chinese. Finally, we address the use of phrasal punctuation as a heuristic indicator of phrasal boundaries, both in our system and in CCL.},
acmid = {2002608},
added-at = {2012-11-08T17:59:53.000+0100},
address = {Stroudsburg, PA, USA},
author = {Ponvert, Elias and Baldridge, Jason and Erk, Katrin},
biburl = {https://www.bibsonomy.org/bibtex/2ea0db6f32cded6cc5a59b1413f2be252/jil},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1},
description = {Simple unsupervised grammar induction from raw text with cascaded finite state models},
interhash = {0d8f375024d4c2f8f4e908127f98457f},
intrahash = {ea0db6f32cded6cc5a59b1413f2be252},
isbn = {978-1-932432-87-9},
keywords = {automaton cascade parsing unsupervised},
location = {Portland, Oregon},
numpages = {10},
pages = {1077--1086},
publisher = {Association for Computational Linguistics},
series = {HLT '11},
timestamp = {2013-11-23T20:11:51.000+0100},
title = {Simple unsupervised grammar induction from raw text with cascaded finite state models},
url = {http://dl.acm.org/citation.cfm?id=2002472.2002608},
year = {2011}
}