We propose a set of criteria for evaluating the quality of studies that have used natural language processing (NLP), focusing on the methods of sample selection, coding, the gold standard, algorithm training, algorithm testing, and measures of accuracy (such as recall and precision).
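To illustrate how the accuracy measures named above relate to the gold standard, the sketch below compares an algorithm's output with gold-standard labels. It computes precision as TP / (TP + FP) and recall as TP / (TP + FN). It assumes binary, document-level labels (each document is either positive or negative for the concept of interest). The function name `precision_recall` is hypothetical and does not come from any particular study or library.

```python
from typing import Sequence


def precision_recall(gold: Sequence[bool], predicted: Sequence[bool]) -> tuple[float, float]:
    """Compare algorithm output with gold-standard labels (hypothetical helper).

    Assumption: each position is one document, and True means the document
    was judged positive for the concept of interest.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # True positives: positive in both the gold standard and the algorithm output.
    tp = sum(g and p for g, p in zip(gold, predicted))
    # False positives: flagged by the algorithm but negative in the gold standard.
    fp = sum(p and not g for g, p in zip(gold, predicted))
    # False negatives: positive in the gold standard but missed by the algorithm.
    fn = sum(g and not p for g, p in zip(gold, predicted))
    # Report 0.0 when a denominator is empty, rather than dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


gold = [True, True, True, True, False, False]
predicted = [True, True, False, False, True, False]
print(precision_recall(gold, predicted))  # precision 2/3, recall 0.5
```

In this toy example the algorithm finds 2 of the 4 gold-standard positives (recall 0.5), and 2 of its 3 positive calls are correct (precision about 0.67). Reporting both measures makes this trade-off visible, whereas a single figure would hide it.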