The data here is useful for testing classification and clustering, and the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing.
My diploma thesis about "WikiWord", a system to automatically build a multilingual thesaurus from Wikipedia, is finally done. I handed it in yesterday. My research will hopefully help to make Wikipedia more accessible for automatic processing.
English translation of selected chapters of the WikiWord thesis "Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia" by Daniel Kinzler. Translation by the author.
A. Akyol, Y. Yaslan, and O. Erol. Proceedings of the 9th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty, ECSQARU, volume 4724 of Lecture Notes in Computer Science, pp. 878--888. Hammamet, Tunisia, Springer, (October 2007)
S. Dornbush, A. Joshi, Z. Segall, and T. Oates. Proceeding of the 2007 conference on Advances in Ambient Intelligence, pp. 107--122. Amsterdam, The Netherlands, IOS Press, (2007)
C. Rose, A. Roque, D. Bhembe, and K. VanLehn. Proceedings of the HLT-NAACL 03 workshop on Building educational applications using natural language processing - Volume 2, pp. 68--75. Stroudsburg, PA, USA, Association for Computational Linguistics, (2003)