InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding.

CoRR (2024)
