Tweets2011
As part of the TREC 2011 microblog track, Twitter provided identifiers for approximately 16 million tweets sampled between January 23rd and February 8th, 2011. The corpus is designed to be a reusable, representative sample of the twittersphere - i.e. both important and spam tweets are included.
A number of resources have been compiled within the context of the MuchMore project. These include: a bilingual, parallel medical corpus; corresponding queries and relevance assessments; evaluation sets of disambiguated terms for GermaNet and UMLS; an evaluation list for morphological decomposition of medical terms.
A. Halevy, and J. Madhavan. IJCAI-03, Proceedings of the Eighteenth International Joint Conference
on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003, page 1567-1572. Morgan Kaufmann, (2003)