Abstract
Semi-supervised learning (SSL) provides a powerful framework for leveraging
unlabeled data when labels are limited or expensive to obtain. SSL algorithms
based on deep neural networks have recently proven successful on standard
benchmark tasks. However, we argue that these benchmarks fail to address many
issues that these algorithms would face in real-world applications. After
creating a unified reimplementation of various widely used SSL techniques, we
test them in a suite of experiments designed to address these issues. We find
that the performance of simple baselines which do not use unlabeled data is
often underreported, that SSL methods differ in sensitivity to the amount of
labeled and unlabeled data, and that performance can degrade substantially when
the unlabeled dataset contains out-of-class examples. To help guide SSL
research towards real-world applicability, we make our unified reimplementation
and evaluation platform publicly available.