In this post, I want to show how I use NLTK for preprocessing and tokenization, and then apply machine learning techniques (e.g. building a linear SVM trained with stochastic gradient descent) using Scikit-Learn.
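To make that division of labour concrete, here is a minimal sketch, not the exact code from the post. It assumes NLTK's `wordpunct_tokenize` and `PorterStemmer` for preprocessing (neither needs a corpus download), and scikit-learn's `SGDClassifier` with hinge loss, which amounts to a linear SVM trained by stochastic gradient descent. The helper `nltk_tokenize` and the toy documents and labels are hypothetical, made up purely for illustration.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def nltk_tokenize(text):
    """Hypothetical helper: lowercase, split with NLTK, keep alphabetic tokens, stem."""
    tokens = wordpunct_tokenize(text.lower())
    return [stemmer.stem(tok) for tok in tokens if tok.isalpha()]


# Toy data, invented for this sketch only.
docs = [
    "The team won the football match",
    "A great goal in the final game",
    "The striker scored twice this season",
    "The new processor runs much faster",
    "Install the software update on your laptop",
    "The compiler reported a memory error",
]
labels = ["sport", "sport", "sport", "tech", "tech", "tech"]

# NLTK handles tokenization; scikit-learn handles vectorization and learning.
model = make_pipeline(
    TfidfVectorizer(tokenizer=nltk_tokenize, lowercase=False, token_pattern=None),
    SGDClassifier(loss="hinge", random_state=0),  # hinge loss = linear SVM objective
)
model.fit(docs, labels)
predictions = model.predict(docs)
print(list(zip(labels, predictions)))
```

Passing the NLTK function as the vectorizer's `tokenizer` keeps the whole thing in one pipeline, so the same preprocessing is applied at training and prediction time.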
In this post you will see five recipes for supervised classification algorithms, each applied to a small standard dataset that ships with the scikit-learn library.
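As a rough idea of what one such recipe might look like, the sketch below fits a classifier to a dataset bundled with scikit-learn and summarizes the fit. The choice of logistic regression and the iris dataset is an assumption made for illustration; the text above does not say which five algorithms or datasets are used.

```python
from sklearn import metrics
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load a small standard dataset that is provided with scikit-learn.
dataset = load_iris()

# Fit the model (algorithm chosen here only as an example).
model = LogisticRegression(max_iter=1000)
model.fit(dataset.data, dataset.target)

# Predict on the same data and summarize the fit.
expected = dataset.target
predicted = model.predict(dataset.data)
accuracy = metrics.accuracy_score(expected, predicted)

print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))
```

Note that this evaluates on the training data, which only shows that the model fits; a held-out split (for example via `train_test_split`) would be needed to estimate how well it generalizes.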