Megatron-LLM

Large language model training library

Megatron-LLM enables pre-training and fine-tuning of large language models (LLMs) at scale. It supports architectures like Llama, Llama 2, Code Llama, Falcon, and Mistral. The library allows training of large models (up to 70B parameters) on commodity hardware using tensor, pipeline, and data parallelism. It provides features like grouped-query attention, rotary position embeddings, BF16/FP16 training, and integration with Hugging Face and WandB.
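To make the parallelism claim concrete, the short sketch below shows how the three degrees of parallelism compose into a total GPU count; the function name and example figures are illustrative assumptions, not part of Megatron-LLM's API.

# Illustrative sketch (Python), not Megatron-LLM's API:
# tensor, pipeline, and data parallelism degrees multiply into the total world size.
def required_gpus(tensor_parallel: int, pipeline_parallel: int, data_parallel: int) -> int:
    """Total number of GPUs implied by the chosen parallelism split."""
    return tensor_parallel * pipeline_parallel * data_parallel

# Hypothetical 70B-parameter run: 8-way tensor parallelism within a node,
# 4 pipeline stages across nodes, and 2 data-parallel replicas.
print(required_gpus(tensor_parallel=8, pipeline_parallel=4, data_parallel=2))  # 64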

Machine Learning · Natural Language
Key facts

Maturity

Support
  C4DT: Inactive
  Lab: Active
    • Technical

Machine Learning and Optimization Laboratory

Prof. Martin Jaggi

The Machine Learning and Optimization Laboratory works on machine learning, optimization algorithms, and text understanding, as well as several application domains.
