Hetu-Galvatron


I am the project leader, designer, and main developer of Hetu-Galvatron. Hetu-Galvatron is a high-performance automatic distributed training system for Transformer models, including Large Language Models (LLMs). It is developed and open-sourced independently by the PKU-DAIR Lab. It leverages advanced automatic parallelism techniques to deliver high training efficiency, with the following key features:

  1. Enhanced Efficiency via Automatic Parallelism
    • Enlarged Parallelism Search Space
      Incorporates multiple popular parallelism dimensions of distributed training, including:
      • DP (Data Parallelism)
      • SDP (Sharded Data Parallelism, supporting ZeRO-1, ZeRO-2, and ZeRO-3)
      • PP (Pipeline Parallelism, supporting both GPipe and PipeDream-Flush / 1F1B)
      • TP (Tensor Parallelism)
      • CKPT (Activation Checkpointing) as a special parallelism dimension
    • Fine-grained Hybrid Parallelism
      For each Transformer layer, Galvatron supports flexible, fine-grained hybrid parallelism strategies, which contribute to enhanced training efficiency.
    • Efficient Automatic Parallelism Optimization
      For any given Transformer model, Galvatron automatically and efficiently searches for the optimal parallelism strategy, which provides optimal training efficiency (a toy illustration follows this list).
  2. Workload Versatility
    Suitable for a wide range of Transformer architectures, including language models, LLMs, vision models, multi-modal models, etc.
  3. User-Friendly Interface
    Hetu-Galvatron is easy to use, even for those new to distributed training.
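To make the idea of per-layer hybrid strategies and cost-based search concrete, below is a minimal, self-contained Python sketch. Everything in it (the LayerStrategy record, the toy cost numbers, and the search loop) is a hypothetical illustration, not Galvatron's actual API or cost model; the real system profiles the hardware and models computation, communication, and memory far more carefully.

# A minimal sketch of per-layer hybrid parallelism and cost-based strategy
# search. All names (LayerStrategy, estimate_layer_cost, search_layer_strategy)
# and all cost constants are hypothetical, NOT Galvatron's actual API.
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class LayerStrategy:
    dp: int          # data-parallel degree
    tp: int          # tensor-parallel degree
    pp: int          # pipeline-parallel degree
    ckpt: bool       # whether to recompute activations for this layer

def estimate_layer_cost(s: LayerStrategy, n_gpus: int, mem_budget_gb: float):
    """Toy cost model: returns (time, memory) or None if the strategy is infeasible."""
    if s.dp * s.tp * s.pp != n_gpus:
        return None
    compute = 1.0 / (s.dp * s.tp)                  # ideal compute scaling
    comm = 0.05 * (s.tp - 1) + 0.02 * (s.dp - 1)   # TP/DP communication penalty
    recompute = 0.3 * compute if s.ckpt else 0.0   # extra forward pass if checkpointed
    mem = 16.0 / (s.tp * s.pp) * (0.5 if s.ckpt else 1.0)  # rough per-GPU activation memory
    if mem > mem_budget_gb:
        return None
    return compute + comm + recompute, mem

def search_layer_strategy(n_gpus: int, mem_budget_gb: float) -> LayerStrategy:
    """Enumerate candidate (dp, tp, pp, ckpt) tuples and keep the cheapest feasible one."""
    degrees = [1, 2, 4, 8]
    best, best_time = None, float("inf")
    for dp, tp, pp, ckpt in product(degrees, degrees, degrees, [False, True]):
        cost = estimate_layer_cost(LayerStrategy(dp, tp, pp, ckpt), n_gpus, mem_budget_gb)
        if cost is not None and cost[0] < best_time:
            best, best_time = LayerStrategy(dp, tp, pp, ckpt), cost[0]
    return best

if __name__ == "__main__":
    # Pick a per-layer strategy for 8 GPUs under a 6 GB activation-memory budget.
    print(search_layer_strategy(n_gpus=8, mem_budget_gb=6.0))

The real search runs per layer over the full (DP, SDP, PP, TP, CKPT) space and combines layer-wise decisions under a global memory budget; this sketch only shows the shape of the problem.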

Hetu-Galvatron has been applied to many billion-parameter-scale DL training tasks and has been deployed by leading industrial companies, including Huawei and ZTE, to accelerate the training of LLMs with more than 100B parameters. We are also collaborating with more companies, such as ByteDance and Baidu, to expand its use. We believe Hetu-Galvatron is well suited to both research and industrial applications, and we welcome interested enterprises to reach out to us about potential cooperation.

We welcome everyone interested in efficient distributed training and parallelism optimization to use Hetu-Galvatron and to contribute code, create issues, or open pull requests. Read the README and the documentation of Hetu-Galvatron for detailed usage guidance.

Give it a star!

Hetu


I am a core developer of Hetu. Hetu is a high-performance distributed deep learning system targeting the training of DL models with trillions of parameters, developed and open-sourced by the DAIR Lab at Peking University. It balances high usability for industry with innovation from academia, and has a number of advanced characteristics:

  1. Applicability
    DL model definition with a standard dataflow graph; a rich set of basic CPU and GPU operators; efficient implementations of a wide range of DL models and more than 10 popular ML algorithms.
  2. Efficiency
    Achieves at least a 30% speedup over TensorFlow on DNN, CNN, and RNN benchmarks.
  3. Flexibility
    Supports various parallel training protocols and distributed communication architectures, such as data/model/pipeline parallelism and both parameter-server and AllReduce communication (a toy illustration follows this list).
  4. Scalability
    Deploys on more than 100 compute nodes; trains giant models with trillions of parameters on workloads such as Criteo Kaggle and the Open Graph Benchmark.
  5. Agility
    Automated ML pipeline: feature engineering, model selection, and hyperparameter search.
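To illustrate the difference between the two communication architectures named under Flexibility, here is a tiny plain-Python sketch (no real network communication, and not Hetu's actual API): with AllReduce every worker ends up holding the same averaged gradient, while with a parameter server the workers push gradients and pull back updated parameters.

# A conceptual sketch of the two gradient-synchronization architectures.
# Plain Python with simulated workers; nothing here is Hetu's actual API.
from typing import List

def allreduce_sync(worker_grads: List[List[float]]) -> List[List[float]]:
    """AllReduce: every worker ends up with the same averaged gradient."""
    n = len(worker_grads)
    avg = [sum(vals) / n for vals in zip(*worker_grads)]
    return [list(avg) for _ in range(n)]  # each worker holds the full averaged gradient

def parameter_server_sync(params: List[float],
                          worker_grads: List[List[float]],
                          lr: float = 0.1) -> List[float]:
    """Parameter server: workers push gradients, the server averages them and updates the parameters."""
    n = len(worker_grads)
    avg = [sum(vals) / n for vals in zip(*worker_grads)]
    return [p - lr * g for p, g in zip(params, avg)]  # workers then pull the new params

if __name__ == "__main__":
    grads = [[1.0, 2.0], [3.0, 4.0]]                 # gradients from two simulated workers
    print(allreduce_sync(grads))                     # [[2.0, 3.0], [2.0, 3.0]]
    print(parameter_server_sync([0.0, 0.0], grads))  # [-0.2, -0.3]

Roughly, AllReduce keeps all workers symmetric and is bandwidth-efficient for dense gradients, while a parameter server centralizes state and handles sparse or asynchronous updates more naturally, which is one reason a system may want to support both.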

Hetu was honored to be named one of the AI China 2021 Top-10 Open Source Events (AI中国机器之心2021年度开源事件Top-10).

We welcome everyone interested in machine learning or graph computing to contribute code and to create issues or pull requests. Please refer to the Hetu Contribution Guide for more details.

Give it a star!