
Please choose a person to relate this publication to.

To distinguish between persons with the same name, the academic degree and the title of an important publication will be displayed.


Other publications by persons with the same name

Minimax Bounds on Stochastic Batched Convex Optimization., , and . COLT, volume 75 of Proceedings of Machine Learning Research, pp. 3065-3162. PMLR, (2018)
Large Catapults in Momentum Gradient Descent with Warmup: An Empirical Study., , , , and . CoRR, (2023)
Are Transformers universal approximators of sequence-to-sequence functions?, , , , and . ICLR, OpenReview.net, (2020)
Does SGD really happen in tiny subspaces?, , and . CoRR, (2024)
Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory., and . CoRR, (2023)
Minimum Width for Universal Approximation., , , and . ICLR, OpenReview.net, (2021)
Small ReLU networks are powerful memorizers: a tight analysis of memorization capacity., , and . NeurIPS, pp. 15532-15543. (2019)
On the Training Instability of Shuffling SGD with Batch Normalization., , and . ICML, volume 202 of Proceedings of Machine Learning Research, pp. 37787-37845. PMLR, (2023)
Provable Memorization via Deep Neural Networks using Sub-linear Parameters., , , and . COLT, volume 134 of Proceedings of Machine Learning Research, pp. 3627-3661. PMLR, (2021)
Linear attention is (maybe) all you need (to understand Transformer optimization)., , , , , and . ICLR, OpenReview.net, (2024)