Relaxed online SVMs for spam filtering

Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 415--422. New York, NY, USA, ACM (2007)
DOI: 10.1145/1277741.1277813

Abstract

Spam is a key problem in electronic communication, including large-scale email systems and the growing number of blogs. Content-based filtering is one reliable method of combating this threat in its various forms, but some academic researchers and industrial practitioners disagree on how best to filter spam. The former have advocated the use of Support Vector Machines (SVMs) for content-based filtering, as this machine learning methodology gives state-of-the-art performance for text classification. However, similar performance gains have yet to be demonstrated for online spam filtering. Additionally, practitioners cite the high cost of SVMs as reason to prefer faster (if less statistically robust) Bayesian methods. In this paper, we offer a resolution to this controversy. First, we show that online SVMs indeed give state-of-the-art classification performance on online spam filtering on large benchmark data sets. Second, we show that nearly equivalent performance may be achieved by a Relaxed Online SVM (ROSVM) at greatly reduced computational cost. Our results are experimentally verified on email spam, blog spam, and splog detection tasks.
