The data here is useful for testing classification and clustering methods and for measuring the accuracy of indexing techniques. However, the datasets are too small to support claims about the efficiency of indexing.
My diploma thesis, "WikiWord", about a system that automatically builds a multilingual thesaurus from Wikipedia, is finally done. I handed it in yesterday. My research will hopefully help make Wikipedia more accessible to automatic processing.
English translation of selected chapters of the WikiWord thesis "Automatischer Aufbau eines multilingualen Thesaurus durch Extraktion semantischer und lexikalischer Relationen aus der Wikipedia" by Daniel Kinzler. Translation by the author.
G. Krempl, D. Bodnar, and A. Hrubos. Advances in Intelligent Data Analysis XIV - 14th Int. Symposium, IDA 2015, St. Etienne, France, volume 9385 of Lecture Notes in Computer Science, pp. XXII-XXIII. Springer, (2015)
S. Wu, J. Hofman, W. Mason, and D. Watts. Proceedings of the 20th International Conference on World Wide Web, pp. 705-714. New York, NY, USA, ACM, (2011)
J. Hopcroft, T. Lou, and J. Tang. Proceedings of the 20th ACM International Conference on Information and Knowledge Management, pp. 1137-1146. New York, NY, USA, ACM, (2011)
C. Hoede and L. Zhang. Proceedings of the 9th International Conference on Conceptual Structures (ICCS 2001), volume 2120 of Lecture Notes in Computer Science, pp. 15-28. Springer, (2001)