Abstract
Mathematically characterizing the implicit regularization induced by
gradient-based optimization is a longstanding pursuit in the theory of deep
learning. A widespread hope is that a characterization based on minimization of
norms may apply, and a standard test-bed for studying this prospect is matrix
factorization (matrix completion via linear neural networks). It is an open
question whether norms can explain the implicit regularization in matrix
factorization. The current paper resolves this open question in the negative,
by proving that there exist natural matrix factorization problems on which the
implicit regularization drives all norms (and quasi-norms) towards infinity.
Our results suggest that, rather than perceiving the implicit regularization
via norms, a potentially more useful interpretation is minimization of rank. We
demonstrate empirically that this interpretation extends to a certain class of
non-linear neural networks, and hypothesize that it may be key to explaining
generalization in deep learning.
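To make the test-bed concrete, below is a minimal sketch (not the paper's exact setting) of matrix completion via depth-2 matrix factorization: a product of two weight matrices, i.e. a linear neural network, is fit to a subset of observed entries with plain gradient descent from small initialization. The matrix size, observation mask, learning rate, and step count are illustrative assumptions.

```python
# Minimal sketch: matrix completion via a depth-2 matrix factorization
# (linear neural network), trained with gradient descent. All hyperparameters
# below are illustrative assumptions, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n = 5                                    # hypothetical matrix size
target = rng.standard_normal((n, n))     # hypothetical ground-truth matrix
mask = rng.random((n, n)) < 0.5          # hypothetical set of observed entries

# Small (near-zero) initialization, standard when studying implicit regularization.
W1 = 1e-3 * rng.standard_normal((n, n))
W2 = 1e-3 * rng.standard_normal((n, n))

lr = 0.01
for step in range(20000):
    W = W2 @ W1                          # product matrix = output of the linear network
    residual = mask * (W - target)       # loss is taken only over observed entries
    grad_W1 = W2.T @ residual            # gradient of 0.5*||mask*(W2@W1 - target)||_F^2 w.r.t. W1
    grad_W2 = residual @ W1.T            # ... and w.r.t. W2
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2

W = W2 @ W1
print("observed-entry loss:", 0.5 * np.sum((mask * (W - target)) ** 2))
# Inspecting the singular values of the recovered matrix gives a rough sense of
# its effective rank, the quantity the abstract suggests is being implicitly minimized.
print("singular values:", np.round(np.linalg.svd(W, compute_uv=False), 3))
```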