MANSA: Learning Fast and Slow in Multi-Agent Systems.

D. Mguni, T. Jafferjee, H. Chen, J. Wang, L. Fei, X. Feng, S. McAleer, F. Tong, J. Wang, and Y. Yang. CoRR, (2023)

Other publications by persons with the same name

Llemma: An Open Language Model For Mathematics. Z. Azerbayev, H. Schoelkopf, K. Paster, M. Santos, S. McAleer, A. Jiang, J. Deng, S. Biderman, and S. Welleck. CoRR, (2023)
MANSA: Learning Fast and Slow in Multi-Agent Systems. D. Mguni, T. Jafferjee, H. Chen, J. Wang, L. Fei, X. Feng, S. McAleer, F. Tong, J. Wang, and Y. Yang. CoRR, (2023)
Solving the Rubik's Cube Without Human Knowledge. S. McAleer, F. Agostinelli, A. Shmakov, and P. Baldi. (2018) arXiv:1805.07470. Comment: First three authors contributed equally. Submitted to NIPS 2018.
Faster Game Solving via Hyperparameter Schedules. N. Zhang, S. McAleer, and T. Sandholm. CoRR, (2024)
Grasper: A Generalist Pursuer for Pursuit-Evasion Problems. P. Li, S. Li, X. Wang, J. Cerný, Y. Zhang, S. McAleer, H. Chan, and B. An. AAMAS, pp. 1147-1155. International Foundation for Autonomous Agents and Multiagent Systems / ACM, (2024)
Sequential Decision Making in Single-Agent and Multi-Agent Domains. S. McAleer. University of California, Irvine, USA, (2022)
Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning. J. Pérolat, B. Vylder, D. Hennes, E. Tarassov, F. Strub, V. de Boer, P. Muller, J. Connor, N. Burch, T. Anthony, and 24 other authors. CoRR, (2022)
Solving the Rubik's Cube with Approximate Policy Iteration. S. McAleer, F. Agostinelli, A. Shmakov, and P. Baldi. ICLR (Poster), OpenReview.net, (2019)
Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games. S. McAleer, J. Lanier, K. Wang, P. Baldi, R. Fox, and T. Sandholm. CoRR, (2022)
Confronting Reward Model Overoptimization with Constrained RLHF. T. Moskovitz, A. Singh, D. Strouse, T. Sandholm, R. Salakhutdinov, A. Dragan, and S. McAleer. CoRR, (2023)