Tweets2011
As part of the TREC 2011 microblog track, Twitter provided identifiers for approximately 16 million tweets sampled between January 23rd and February 8th, 2011. The corpus is designed to be a reusable, representative sample of the twittersphere; that is, it includes both important and spam tweets.
A social database about things you know and love, spanning millions of topics in thousands of categories. Explore Freebase, add to it, or build applications with it.
X. Wang, Z. Wang, X. Han, W. Jiang, R. Han, Z. Liu, J. Li, P. Li, Y. Lin, and J. Zhou. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1652-1671. Online, Association for Computational Linguistics (November 2020)
O. Kashefi and R. Hwa. Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020), pages 200-208. Online, Association for Computational Linguistics (November 2020)
R. Bommasani and C. Cardie. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8075-8096. Online, Association for Computational Linguistics (November 2020)