@article{liu2020earlylearning,
abstract = {We propose a novel framework to perform classification via deep learning in
the presence of noisy annotations. When trained on noisy labels, deep neural
networks have been observed to first fit the training data with clean labels
during an "early learning" phase, before eventually memorizing the examples
with false labels. We prove that early learning and memorization are
fundamental phenomena in high-dimensional classification tasks, even in simple
linear models, and give a theoretical explanation in this setting. Motivated by
these findings, we develop a new technique for noisy classification tasks,
which exploits the progress of the early learning phase. In contrast with
existing approaches, which use the model output during early learning to detect
the examples with clean labels, and either ignore or attempt to correct the
false labels, we take a different route and instead capitalize on early
learning via regularization. There are two key elements to our approach. First,
we leverage semi-supervised learning techniques to produce target probabilities
based on the model outputs. Second, we design a regularization term that steers
the model towards these targets, implicitly preventing memorization of the
false labels. The resulting framework is shown to provide robustness to noisy
annotations on several standard benchmarks and real-world datasets, where it
achieves results comparable to the state of the art.},
added-at = {2020-07-16T12:06:11.000+0200},
author = {Liu, Sheng and Niles-Weed, Jonathan and Razavian, Narges and Fernandez-Granda, Carlos},
biburl = {https://www.bibsonomy.org/bibtex/20fd53249e490fa7a9925bd565cadf35f/kirk86},
description = {[2007.00151] Early-Learning Regularization Prevents Memorization of Noisy Labels},
interhash = {79513672290e25bee26f4aa04326ec39},
intrahash = {0fd53249e490fa7a9925bd565cadf35f},
keywords = {learning noise readings regularisation},
note = {cite arxiv:2007.00151},
timestamp = {2020-07-16T12:06:11.000+0200},
title = {Early-Learning Regularization Prevents Memorization of Noisy Labels},
url = {http://arxiv.org/abs/2007.00151},
year = 2020
}
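
% The abstract above describes two components: target probabilities formed from
% the model's own outputs, and a regularization term steering predictions toward
% those targets. A minimal NumPy sketch of one such scheme is below; the EMA
% target update, the log(1 - <p, t>) penalty, and the hyperparameter values are
% illustrative assumptions, not the paper's exact recipe — see the arXiv paper
% for the precise formulation.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the class axis.
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def update_targets(targets, probs, momentum=0.9):
    # Exponential moving average of model outputs, standing in for the
    # semi-supervised target estimation mentioned in the abstract.
    return momentum * targets + (1.0 - momentum) * probs

def elr_loss(logits, labels, targets, lam=3.0):
    # Cross-entropy on the (possibly noisy) labels plus a regularizer.
    # log(1 - <p, t>) tends to -inf as p aligns with t, so minimizing it
    # pulls predictions toward the early-learning targets and discourages
    # memorization of false labels.
    probs = softmax(logits)
    n = logits.shape[0]
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    reg = np.log(1.0 - (probs * targets).sum(axis=1) + 1e-12).mean()
    return ce + lam * reg
```

% Usage: initialize `targets` (e.g., to the first-epoch softmax outputs),
% then refresh it with `update_targets` each step before computing the loss.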