M. Brennan, and G. Bresler. (2020)cite arxiv:2005.08099Comment: 175 pages; subsumes preliminary draft arXiv:1908.06130; accepted for presentation at the Conference on Learning Theory (COLT) 2020.
A. Achille, and S. Soatto. (2017)cite arxiv:1706.01350Comment: Deep learning, neural network, representation, flat minima, information bottleneck, overfitting, generalization, sufficiency, minimality, sensitivity, information complexity, stochastic gradient descent, regularization, total correlation, PAC-Bayes.