Release of “Fugaku-LLM” — a large language model trained on the supercomputer “Fugaku”

Enhanced Japanese language ability, for use in research and business

Today, Fujitsu announced the release of Fugaku-LLM, a large language model trained on RIKEN's flagship supercomputer Fugaku. Developed through a collaboration among the Tokyo Institute of Technology, Tohoku University, Fujitsu Limited, RIKEN, Nagoya University, CyberAgent Inc., and Kotoba Technologies Inc., the model represents a significant step forward for Japanese-language AI.

Model Highlights:

  • Enhanced Japanese Language Proficiency: An average score of 5.5 on the Japanese MT-Bench, with its strongest results in humanities and social sciences tasks (a score of 9.18).
  • Robust Training: A 13-billion-parameter model trained on some 380 billion tokens using 13,824 nodes of Fugaku, giving it particular strength in nuanced Japanese-language processing.
  • Innovative Technology: Trained entirely on Fugaku's CPUs using Megatron-DeepSpeed optimized for the system, achieving roughly a six-fold speedup in matrix multiplication.

Fugaku-LLM is available on GitHub and Hugging Face for both research and commercial applications, subject to the terms of its license. As of today, the Fujitsu Research Portal also hosts Fugaku-LLM, broadening access to the model.
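For readers who want to try the model from Hugging Face, the sketch below shows one way to query the instruction-tuned variant with the `transformers` library. The repository id (`Fugaku-LLM/Fugaku-LLM-13B-instruct`) and the Alpaca-style Japanese prompt template are assumptions based on the published model card; check the card itself before relying on them.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a user instruction in the Japanese Alpaca-style template
    assumed to match the Fugaku-LLM instruct model card."""
    system = "以下は、タスクを説明する指示です。要求を適切に満たす応答を書きなさい。"
    return f"{system}\n\n### 指示:\n{instruction}\n\n### 応答:\n"


def generate(instruction: str, max_new_tokens: int = 128) -> str:
    """Load the model (assumed repo id) and generate a response.
    Note: the 13B checkpoint is large; this needs substantial memory."""
    # Heavy dependencies are imported lazily so build_prompt stays usable
    # without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Fugaku-LLM/Fugaku-LLM-13B-instruct"  # assumed repo id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(generate("スーパーコンピュータ「富岳」とは何ですか。"))
```

The prompt helper is separated from generation so the template can be reused with other serving stacks (e.g. a local inference server) without pulling in the full model.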

The ongoing development and improvement of Fugaku-LLM promise to spur next-generation research and business applications, integrate AI with scientific simulation, and create robust virtual communities.

Connect with me on LinkedIn: Head of AI, Global Fujitsu Distinguished Engineer & Fujitsu Fellow

#AI #MachineLearning #Supercomputing #Fugaku #LanguageModel #Innovation #TechNews #FugakuLLM
