The Implicit Bias of Gradient Descent on Separable Data

Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Suriya Gunasekar, and Nathan Srebro. (2017). cite arxiv:1710.10345

Comment: Final JMLR version, with improved discussions over v3. Main improvements in the journal version over the conference version (v2 appeared in ICLR): we proved the measure-zero case of the main theorem (with implications for the rates) and the multi-class case.
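The theorem referred to in the comment is the paper's central claim: on linearly separable data, gradient descent on the logistic loss diverges in norm while its direction converges to the L2 max-margin (hard-margin SVM) separator. The snippet below is a minimal numerical sketch of that behaviour, not code from the publication; the dataset, step size, and iteration counts are illustrative assumptions.

# Minimal sketch (illustrative assumptions, not the authors' code):
# gradient descent on the logistic loss over a separable 2D dataset.
# The weight norm keeps growing, while the direction w/||w|| stabilizes.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Two linearly separable Gaussian blobs, labels in {-1, +1}.
X = np.vstack([rng.normal(+2.0, 0.3, size=(50, 2)),
               rng.normal(-2.0, 0.3, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def grad_logistic(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    margins = y * (X @ w)
    return -(X * (y * expit(-margins))[:, None]).mean(axis=0)

w = np.zeros(2)
eta = 0.1  # illustrative step size
for t in range(1, 100_001):
    w -= eta * grad_logistic(w)
    if t in (1_000, 10_000, 100_000):
        print(f"t={t:>7}  ||w||={np.linalg.norm(w):6.2f}  "
              f"w/||w|| = {w / np.linalg.norm(w)}")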


Other publications of authors with the same name

Exponentially vanishing sub-optimal local minima in multilayer neural networks. ICLR (Workshop), OpenReview.net, 2018.
Regularization Guarantees Generalization in Bayesian Reinforcement Learning through Algorithmic Stability. AAAI, pages 8423-8431, AAAI Press, 2022.
Physics-Aware Downsampling with Deep Learning for Scalable Flood Modeling. NeurIPS, pages 1378-1389, 2021.
Accurate Post Training Quantization With Small Calibration Sets. ICML, volume 139 of Proceedings of Machine Learning Research, pages 4466-4475, PMLR, 2021.
Scaling FP8 training to trillion-token LLMs. CoRR, 2024.
The Implicit Bias of Gradient Descent on Separable Data. J. Mach. Learn. Res., 2018.
Bayesian Gradient Descent: Online Variational Bayes Learning with Increased Robustness to Catastrophic Forgetting and Weight Pruning. cite arxiv:1803.10123, 2018.
How do Minimum-Norm Shallow Denoisers Look in Function Space? CoRR, 2023.
A Mean Field Theory of Quantized Deep Networks: The Quantization-Depth Trade-Off. NeurIPS, pages 7036-7046, 2019.
Train longer, generalize better: closing the generalization gap in large batch training of neural networks. NIPS, pages 1731-1741, 2017.