X. Chen and K. He. (2020). Exploring Simple Siamese Representation Learning. arXiv:2011.10566. Technical report, 10 pages.
Abstract
Siamese networks have become a common structure in various recent models for
unsupervised visual representation learning. These models maximize the
similarity between two augmentations of one image, subject to certain
conditions for avoiding collapsing solutions. In this paper, we report
surprising empirical results that simple Siamese networks can learn meaningful
representations even using none of the following: (i) negative sample pairs,
(ii) large batches, (iii) momentum encoders. Our experiments show that
collapsing solutions do exist for the loss and structure, but a stop-gradient
operation plays an essential role in preventing collapsing. We provide a
hypothesis on the implication of stop-gradient, and further show
proof-of-concept experiments verifying it. Our "SimSiam" method achieves
competitive results on ImageNet and downstream tasks. We hope this simple
baseline will motivate people to rethink the roles of Siamese architectures for
unsupervised representation learning. Code will be made available.
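
The mechanism the abstract describes is small enough to state in code. Below is a minimal PyTorch-style sketch of the symmetrized negative-cosine loss with stop-gradient, written as an illustration under assumptions rather than the paper's released implementation; the names encoder (backbone plus projection MLP) and predictor (prediction MLP) are hypothetical placeholders.

import torch.nn.functional as F

def neg_cosine(p, z):
    # Stop-gradient: detach() treats z as a constant, so no gradient
    # flows back through this branch of the Siamese network.
    z = z.detach()
    return -F.cosine_similarity(p, z, dim=1).mean()

def simsiam_loss(encoder, predictor, x1, x2):
    # x1, x2: two random augmentations of the same image batch.
    z1, z2 = encoder(x1), encoder(x2)      # projections
    p1, p2 = predictor(z1), predictor(z2)  # predictions
    # Symmetrized loss: each prediction is matched against the
    # detached projection of the other view.
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)

Without the detach() call, this objective admits the collapsing solution the abstract mentions (both branches emitting a constant vector); the stop-gradient operation is what the authors identify as essential for preventing it.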
@misc{chen2020exploring,
author = {Chen, Xinlei and He, Kaiming},
keywords = {cs.CV cs.LG},
note = {arXiv:2011.10566. Technical report, 10 pages},
title = {Exploring Simple Siamese Representation Learning},
url = {http://arxiv.org/abs/2011.10566},
year = 2020
}