Inproceedings

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research.

ACL (1), pages 15725–15788. Association for Computational Linguistics, 2024.
