
Other publications by people with the same name

What Language Model Architecture and Pretraining Objective Works Best for Zero-Shot Generalization? ICML, volume 162 of Proceedings of Machine Learning Research, pp. 22964-22984. PMLR, (2022)
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? EMNLP (Findings), pp. 12342-12364. Association for Computational Linguistics, (2023)
UniMax: Fairer and More Effective Language Sampling for Large-Scale Multilingual Pretraining. ICLR, OpenReview.net, (2023)
Language models are multilingual chain-of-thought reasoners. ICLR, OpenReview.net, (2023)
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization. CoRR, (2021)
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models. ICLR, OpenReview.net, (2024)
Charformer: Fast Character Transformers via Gradient-based Subword Tokenization. ICLR, OpenReview.net, (2022)
Scale Efficiently: Insights from Pretraining and Finetuning Transformers. ICLR, OpenReview.net, (2022)
Transcending Scaling Laws with 0.1% Extra Compute. EMNLP, pp. 1471-1486. Association for Computational Linguistics, (2023)
UL2: Unifying Language Learning Paradigms. ICLR, OpenReview.net, (2023)