N-Gram-Based Text Categorization

Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161--175. Las Vegas, US (1994)

Abstract

Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We...
