MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU

(arxiv.org)

273 points | by chrsw 16 hours ago

49 comments