Abstract
We introduce a new general framework for the recognition of complex
visual scenes, which is motivated by biology: We
describe a hierarchical system that closely follows the organization
of visual cortex and builds an increasingly complex and invariant
feature representation by alternating between a template matching
and a maximum pooling operation. We demonstrate the strength of
the approach on a range of recognition tasks: From invariant single
object recognition in clutter to multiclass categorization problems
and complex scene understanding tasks that rely on the recognition
of both shape-based and texture-based objects. Given the
biological constraints that the system had to satisfy, the approach
performs surprisingly well: It can learn from only a few training
examples and competes with state-of-the-art systems.
We also discuss the existence of a universal, redundant dictionary
of features that could handle the recognition of most object categories.
In addition to its relevance for computer vision, the success of
this approach suggests a plausibility proof for a class of feedforward
models of object recognition in cortex.
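
To make the alternation between template matching and maximum pooling concrete, the following is a minimal sketch of one such pair of stages, not the system described above. The function names, the Gaussian similarity measure, the `sigma` parameter, and the toy 5x5 images are all assumptions chosen for illustration.

```python
import math


def template_match(image, template, sigma=1.0):
    """Slide `template` over `image` and return a map of similarity scores.

    Assumption: similarity is a Gaussian of the squared Euclidean distance
    between each image patch and the template, so a perfect match scores 1.0.
    """
    th, tw = len(template), len(template[0])
    rows = len(image) - th + 1
    cols = len(image[0]) - tw + 1
    response = []
    for r in range(rows):
        row = []
        for c in range(cols):
            dist2 = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            row.append(math.exp(-dist2 / (2 * sigma ** 2)))
        response.append(row)
    return response


def max_pool(response, size):
    """Keep the maximum response in each non-overlapping size x size window."""
    rows, cols = len(response), len(response[0])
    return [
        [
            max(
                response[i][j]
                for i in range(r, min(r + size, rows))
                for j in range(c, min(c + size, cols))
            )
            for c in range(0, cols, size)
        ]
        for r in range(0, rows, size)
    ]


def place(template, top, left, height=5, width=5):
    """Build a blank toy image with `template` pasted at (top, left)."""
    image = [[0] * width for _ in range(height)]
    for i, template_row in enumerate(template):
        for j, value in enumerate(template_row):
            image[top + i][left + j] = value
    return image


template = [[1, 0], [0, 1]]
image = place(template, 1, 1)
shifted = place(template, 1, 2)  # same pattern, moved one pixel to the right

# The matching stage is selective: its response maps differ for the two images.
# The pooling stage is tolerant: the pooled outputs are identical.
pooled_a = max_pool(template_match(image, template), 4)
pooled_b = max_pool(template_match(shifted, template), 4)
```

In this sketch the matching stage provides selectivity for a particular pattern, while pooling over positions discards where the pattern occurred. Stacking several such pairs is one way a representation could become both more complex and more invariant.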