I am the project leader, designer, and main developer of Hetu-Galvatron. Hetu-Galvatron is a high-performance automatic distributed training system for Transformer models, including Large Language Models (LLMs), developed independently and open-sourced by the PKU-DAIR Lab. It leverages advanced automatic parallelism techniques to deliver exceptional training efficiency.
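To make the idea of automatic parallelism a bit more concrete, below is a minimal, self-contained sketch of how a planner can enumerate hybrid-parallel layouts (data, tensor, and pipeline parallelism) under a per-GPU memory budget and pick the cheapest feasible one. This is not Galvatron's actual planner or API: the cluster size, cost model, and helper names (`estimate_memory`, `estimate_step_time`) are simplified assumptions used only for illustration.

```python
# Toy illustration of automatic parallelism planning (NOT Galvatron's API).
# All numbers and helper functions below are hypothetical placeholders.
from itertools import product

WORLD_SIZE = 16          # total number of GPUs (assumption)
GPU_MEMORY_GB = 80       # per-GPU memory budget (assumption)
MODEL_PARAMS_B = 13      # model size in billions of parameters (assumption)

def estimate_memory(dp, tp, pp):
    """Very rough per-GPU memory estimate (GB): parameter and optimizer
    state memory is sharded across tp * pp devices; a toy activation term
    shrinks as the data-parallel degree grows."""
    param_gb = MODEL_PARAMS_B * 18 / (tp * pp)   # ~18 bytes/param (toy figure)
    activation_gb = 8 / dp                       # toy activation term
    return param_gb + activation_gb

def estimate_step_time(dp, tp, pp):
    """Toy cost model: compute shrinks as more devices share one replica,
    while tensor parallelism adds communication, pipelining adds bubbles,
    and data parallelism adds gradient synchronization."""
    compute = 1.0 / (tp * pp)
    tp_comm = 0.05 * (tp - 1)
    pp_bubble = 0.02 * (pp - 1)
    dp_comm = 0.03 * (dp - 1) / dp
    return compute + tp_comm + pp_bubble + dp_comm

best = None
for dp, tp, pp in product([1, 2, 4, 8, 16], repeat=3):
    if dp * tp * pp != WORLD_SIZE:
        continue                                 # must exactly cover all GPUs
    if estimate_memory(dp, tp, pp) > GPU_MEMORY_GB:
        continue                                 # skip layouts that do not fit
    t = estimate_step_time(dp, tp, pp)
    if best is None or t < best[0]:
        best = (t, dp, tp, pp)

if best:
    print(f"chosen layout: dp={best[1]}, tp={best[2]}, pp={best[3]} "
          f"(estimated step time {best[0]:.3f})")
```

In Hetu-Galvatron, the search space and cost modeling are of course far more sophisticated than this toy example, but the overall goal is the same: automatically select an efficient parallelism strategy instead of tuning it by hand.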
Hetu-Galvatron has been applied to many billion-scale deep learning tasks and has been deployed by leading industrial companies, including Huawei and ZTE, to accelerate training of LLMs with more than 100B parameters. We are also collaborating with more companies, such as ByteDance and Baidu, to expand its use. We believe Hetu-Galvatron is well suited to both research and industrial applications, and we welcome interested enterprises to reach out to us for potential cooperation.
We welcome everyone interested in efficient distributed training techniques and parallelism optimization to use Hetu-Galvatron and to contribute code, create issues, or open pull requests. Please read the README and the documentation of Hetu-Galvatron for detailed usage guidance.
I am a core developer of Hetu. Hetu is a high-performance distributed deep learning system targeting the training of DL models with trillions of parameters, developed and open-sourced by the DAIR Lab at Peking University. It balances industrial demands for high availability with academic innovation, and offers a number of advanced characteristics.
Hetu was honored to be selected as one of the AI China 2021 Top-10 Open Source Events (AI中国机器之心2021年度开源事件Top-10).
We welcome everyone interested in machine learning or graph computing to contribute code and to create issues or pull requests. Please refer to the Hetu Contribution Guide for more details.