Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models.

, , , and . ICML, volume 202 of Proceedings of Machine Learning Research, pages 22188-22214. PMLR, (2023)

Other publications of authors with the same name

A practical framework for predicting residential indoor PM2.5 concentration using land-use regression and machine learning methods, , , , , , and . Chemosphere, (2021)
The role of over-parametrization in generalization of neural networks, , , , and . International Conference on Learning Representations, (2019)
Harnessing the Power of Infinitely Wide Deep Nets on Small-data Tasks., , , , , and . ICLR, OpenReview.net, (2020)
Multiple Pedestrian Tracking With Graph Attention Map on Urban Road Scene., , , , and . IEEE Trans. Intell. Transp. Syst., 24 (8): 8567-8579 (August 2023)
Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning., , and . CoRR, (2020)
A Novel Input Stage Based on DTMOS for Low-Voltage Low-Noise Operational Amplifier., , and . APCCAS, pages 1591-1594. IEEE, (2006)
Interprocedural Analysis Based on Guarded Array Regions., , and . Compiler Optimizations for Scalable Parallel Systems Languages, volume 1808 of Lecture Notes in Computer Science, pages 221-246. Springer, (2001)
Hyper-parameter Tuning of Federated Learning Based on Particle Swarm Optimization., , and . CCIS, pages 99-103. IEEE, (2021)
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)., , and . NeurIPS, pages 12712-12725. (2021)
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction., , and . NeurIPS, (2022)