Article,

Predicting the impact of scientific concepts using full-text features

K. McKeown, H. Daume, S. Chaturvedi, J. Paparrizos, K. Thadani, P. Barrio, O. Biran, S. Bothe, M. Collins, K. Fleischmann, L. Gravano, R. Jha, B. King, K. McInerney, T. Moon, A. Neelakantan, D. O'Seaghdha, D. Radev, C. Templeton, and S. Teufel.
Journal of the Association for Information Science and Technology (2016)
DOI: 10.1002/asi.23612

Abstract

New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.

BibTeX key: ASI:ASI23612
entry type: article
year: 2016
journal: Journal of the Association for Information Science and Technology
pages: n/a--n/a
issn: 2330-1643
DOI: 10.1002/asi.23612
url: http://dx.doi.org/10.1002/asi.23612

Cite this publication

%0 Journal Article
%1 ASI:ASI23612
%A McKeown, Kathy
%A Daume, Hal
%A Chaturvedi, Snigdha
%A Paparrizos, John
%A Thadani, Kapil
%A Barrio, Pablo
%A Biran, Or
%A Bothe, Suvarna
%A Collins, Michael
%A Fleischmann, Kenneth R.
%A Gravano, Luis
%A Jha, Rahul
%A King, Ben
%A McInerney, Kevin
%A Moon, Taesun
%A Neelakantan, Arvind
%A O'Seaghdha, Diarmuid
%A Radev, Dragomir
%A Templeton, Clay
%A Teufel, Simone
%D 2016
%J Journal of the Association for Information Science and Technology
%K concepts full-text machine_learning natural_language_processing prediction scientific_concepts scientometrics time-series_analysis
%P n/a--n/a
%R 10.1002/asi.23612
%T Predicting the impact of scientific concepts using full-text features
%U http://dx.doi.org/10.1002/asi.23612
%X New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.

@article{ASI:ASI23612,
  abstract = {New scientific concepts, interpreted broadly, are continuously introduced in the literature, but relatively few concepts have a long-term impact on society. The identification of such concepts is a challenging prediction task that would help multiple parties—including researchers and the general public—focus their attention within the vast scientific literature. In this paper we present a system that predicts the future impact of a scientific concept, represented as a technical term, based on the information available from recently published research articles. We analyze the usefulness of rich features derived from the full text of the articles through a variety of approaches, including rhetorical sentence analysis, information extraction, and time-series analysis. The results from two large-scale experiments with 3.8 million full-text articles and 48 million metadata records support the conclusion that full-text features are significantly more useful for prediction than metadata-only features and that the most accurate predictions result from combining the metadata and full-text features. Surprisingly, these results hold even when the metadata features are available for a much larger number of documents than are available for the full-text features.},
  added-at = {2016-01-18T01:52:14.000+0100},
  author = {McKeown, Kathy and Daume, Hal and Chaturvedi, Snigdha and Paparrizos, John and Thadani, Kapil and Barrio, Pablo and Biran, Or and Bothe, Suvarna and Collins, Michael and Fleischmann, Kenneth R. and Gravano, Luis and Jha, Rahul and King, Ben and McInerney, Kevin and Moon, Taesun and Neelakantan, Arvind and O'Seaghdha, Diarmuid and Radev, Dragomir and Templeton, Clay and Teufel, Simone},
  biburl = {https://www.bibsonomy.org/bibtex/2ddf6d0d98507ae113cf67e950e370c2d/hangdong},
  doi = {10.1002/asi.23612},
  interhash = {7724613ac70c4aa1ae0dec4be1a832d8},
  intrahash = {ddf6d0d98507ae113cf67e950e370c2d},
  issn = {2330-1643},
  journal = {Journal of the Association for Information Science and Technology},
  keywords = {concepts full-text machine_learning natural_language_processing prediction scientific_concepts scientometrics time-series_analysis},
  pages = {n/a--n/a},
  timestamp = {2016-01-18T01:52:14.000+0100},
  title = {Predicting the impact of scientific concepts using full-text features},
  url = {http://dx.doi.org/10.1002/asi.23612},
  year = 2016
}
