Article

Should We Use the Sample? Analyzing Datasets Sampled from Twitter's Stream API

Y. Wang, J. Callan, and B. Zheng.
ACM Trans. Web, 9 (3): 13:1--13:23 (June 2015)
DOI: 10.1145/2746366

Abstract

Researchers have begun studying content obtained from microblogging services such as Twitter to address a variety of technological, social, and commercial research questions. The large number of Twitter users and even larger volume of tweets often make it impractical to collect and maintain a complete record of activity; therefore, most research and some commercial software applications rely on samples, often relatively small samples, of Twitter data. For the most part, sample sizes have been based on availability and practical considerations. Relatively little attention has been paid to how well these samples represent the underlying stream of Twitter data. To fill this gap, this article performs a comparative analysis on samples obtained from two of Twitter’s streaming APIs with a more complete Twitter dataset to gain an in-depth understanding of the nature of Twitter data samples and their potential for use in various data mining tasks.

BibTeX key: Wang:2015:WUS:2788341.2746366
entry type: article
address: New York, NY, USA
year: 2015
month: jun
journal: ACM Trans. Web
number: 3
pages: 13:1--13:23
publisher: ACM
volume: 9
acmid: 2746366
issn: 1559-1131
issue_date: June 2015
numpages: 23
articleno: 13
DOI: 10.1145/2746366
url: http://doi.acm.org/10.1145/2746366
