Inproceedings

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

ACL (1), pages 15725–15788. Association for Computational Linguistics, 2024.
