Abstract

Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We...
