There are currently few datasets appropriate for training and evaluating models for non-goal-oriented dialogue systems (chatbots). Equally problematic, there is no standard procedure for evaluating such models beyond the classic Turing test.
The aim of our competition is therefore to establish a concrete scenario for testing chatbots that aim to engage humans, and to become a standard evaluation tool that makes such systems directly comparable.
The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks.
J. Yamagishi, T. Nose, H. Zen, T. Toda, and K. Tokuda. Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3957-3960. Las Vegas, NV, USA, (March 2008)
R. Jäschke, A. Hotho, F. Mitzlaff, and G. Stumme. Recommender Systems for the Social Web, volume 32 of Intelligent Systems Reference Library, Springer, Berlin/Heidelberg, (2012)