Abstract
Representation learning promises to unlock deep learning for the long tail of
vision tasks without expensive labelled datasets. Yet, the absence of a unified
yardstick to evaluate general visual representations hinders progress. Many
sub-fields promise representations, but each has different evaluation protocols
that are either too constrained (linear classification), limited in scope
(ImageNet, CIFAR, Pascal-VOC), or only loosely related to representation
quality (generation). We present the Visual Task Adaptation Benchmark (VTAB): a
diverse, realistic, and challenging benchmark to evaluate representations. VTAB
embodies one principle: good representations adapt to unseen tasks with few
examples. We run a large VTAB study of popular algorithms, answering questions
like: How effective are ImageNet representations on non-standard datasets? Are
generative models competitive? Is self-supervision useful if one already has
labels?