Abstract
This paper explores Natural Language Understanding (NLU) through the task of
duplicate question detection on the Quora dataset. We conducted extensive
exploration of the dataset and evaluated various machine learning models, including
linear and tree-based models. Our main finding was that a simple Continuous
Bag of Words neural network model performed best, outperforming more
complex recurrent and attention-based models. We also conducted an error
analysis and found some subjectivity in the labeling of the dataset.