Megatron-LLM enables pre-training and fine-tuning of large language models (LLMs) at scale. It supports architectures like Llama, Llama 2, Code Llama, Falcon, and Mistral. The library allows training of large models (up to 70B parameters) on commodity hardware using tensor, pipeline, and data parallelism. It provides features like grouped-query attention, rotary position embeddings, BF16/FP16 training, and integration with Hugging Face and WandB.
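To make the parallelism claim concrete, here is a minimal, illustrative Python sketch (not Megatron-LLM's actual API) of how the three parallelism degrees combine: in 3D parallelism, the total GPU count is the product of the tensor-, pipeline-, and data-parallel sizes. The function name and the example configuration are hypothetical.

```python
# Illustrative sketch only -- not Megatron-LLM's API.
# In 3D parallelism, the world size (total GPUs) is the product of the
# tensor-parallel, pipeline-parallel, and data-parallel degrees.

def required_gpus(tensor_parallel: int, pipeline_parallel: int, data_parallel: int) -> int:
    """Total GPUs needed for a given 3D-parallel configuration."""
    return tensor_parallel * pipeline_parallel * data_parallel

if __name__ == "__main__":
    # Hypothetical configuration for a large (e.g. 70B-parameter) model:
    # shard each layer across 8 GPUs (tensor parallelism), split the layer
    # stack into 4 pipeline stages, and replicate that arrangement twice
    # for data parallelism.
    tp, pp, dp = 8, 4, 2
    print(f"TP={tp} x PP={pp} x DP={dp} -> {required_gpus(tp, pp, dp)} GPUs")
```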