J. Wu, K. Williams, H. Chen, M. Khabsa, C. Caragea, A. Ororbia, D. Jordan, and C. Giles. CiteSeerX: AI in a Digital Library Search Engine. Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, Québec, Canada, pages 2930--2937. Association for the Advancement of Artificial Intelligence (July 2014).
Abstract
CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g. to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
booktitle
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, Québec, Canada
year
2014
month
July
organization
Association for the Advancement of Artificial Intelligence
pages
2930--2937
file
AAAI Digital Library:2014/WuWilliamsEtAl14IAAI.pdf:PDF
%0 Conference Paper
%1 WuWilliamsEtAl14IAAI
%A Wu, Jian
%A Williams, Kyle
%A Chen, Hung-Hsuan
%A Khabsa, Madian
%A Caragea, Cornelia
%A Ororbia, Alexander
%A Jordan, Douglas
%A Giles, C. Lee
%B Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, Québec, Canada
%D 2014
%K 01801 aaai paper ai web application publications information retrieval search
%P 2930--2937
%T CiteSeerX: AI in a Digital Library Search Engine
%U http://www.aaai.org/ocs/index.php/IAAI/IAAI14/paper/view/8607
%X CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g. to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.
@inproceedings{WuWilliamsEtAl14IAAI,
abstract = {CiteSeerX is a digital library search engine that provides access to more than 4 million academic documents with nearly a million users and millions of hits per day. Artificial intelligence (AI) technologies are used in many components of CiteSeerX, e.g. to accurately extract metadata, intelligently crawl the web, and ingest documents. We present key AI technologies used in the following components: document classification and deduplication, document and citation clustering, automatic metadata extraction and indexing, and author disambiguation. These AI technologies have been developed by CiteSeerX group members over the past 5-6 years. We also show the usage status, payoff, development challenges, main design concepts, and deployment and maintenance requirements. While it is challenging to rebuild a system like CiteSeerX from scratch, many of these AI technologies are transferable to other digital libraries and/or search engines.},
added-at = {2018-04-18T16:50:47.000+0200},
author = {Wu, Jian and Williams, Kyle and Chen, Hung-Hsuan and Khabsa, Madian and Caragea, Cornelia and Ororbia, Alexander and Jordan, Douglas and Giles, C. Lee},
biburl = {https://www.bibsonomy.org/bibtex/2ac3777190b56c50b479b2dcda6857520/flint63},
booktitle = {Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence and the Twenty-Sixth Innovative Applications of Artificial Intelligence Conference, Qu{\'e}bec, Canada},
file = {AAAI Digital Library:2014/WuWilliamsEtAl14IAAI.pdf:PDF},
groups = {public},
interhash = {0da5b6413e9e441610015c251a2f2074},
intrahash = {ac3777190b56c50b479b2dcda6857520},
keywords = {01801 aaai paper ai web application publications information retrieval search},
month = jul,
organization = {Association for the Advancement of Artificial Intelligence},
pages = {2930--2937},
timestamp = {2018-04-18T16:50:47.000+0200},
title = {CiteSeerX: {AI} in a Digital Library Search Engine},
url = {http://www.aaai.org/ocs/index.php/IAAI/IAAI14/paper/view/8607},
username = {flint63},
year = 2014
}