
DeepMind Improves AI Training With New Distributed Method
Streaming DiLoCo dramatically reduces bandwidth requirements, paving the way for more affordable and efficient models
Isabella V, 12 February 2025


DeepMind introduces a new methodology for distributed training of large-scale AI models, optimizing communication between separate clusters. Streaming DiLoCo could dramatically reduce costs and resources required, challenging the current paradigm of centralized supercomputers.

Key Points:

  • DeepMind’s new strategy for distributed training of AI models, reducing the dependency on centralized clusters.
  • Improved communication efficiency with Streaming DiLoCo, cutting bandwidth requirements by up to a factor of 400.
  • Large-scale application, with the potential to democratize access to advanced AI training.
  • Still open engineering challenges, with further studies needed to optimize the scalability and effectiveness of the method.

In the AI landscape, training large-scale language models has always been a highly expensive activity, requiring massive infrastructure and computing power that is difficult to access outside of large technology corporations. DeepMind, Google’s AI subsidiary, recently presented an alternative that could reshape the sector: a distributed training technique called Streaming DiLoCo. Based on the DiLoCo (Distributed Low-Communication Training) method, this innovation aims to optimize the training of AI models across clusters of non-centralized computers, drastically reducing the need for high-speed communication between nodes and improving the efficiency of the process.

The idea behind this approach arises from the need to overcome the critical issues related to the traditional architecture of LLMs (Large Language Models), which requires huge amounts of GPU accelerators, advanced network infrastructures and sophisticated cooling systems. The cost of maintaining such systems is prohibitive and their scalability is limited by the engineering difficulties related to synchronization between devices. DeepMind’s goal is therefore to free the training of AI models from the obligation of colocalization, allowing the computational load to be distributed across more distant and less interconnected clusters without impacting the quality of training.

Streaming DiLoCo introduces three key innovations: selective parameter synchronization, which avoids updating all model variables simultaneously; overlapping of computation and communication, which allows devices to continue working without waiting for data transmission; and quantization of the exchanged gradients, which shrinks the volume of information transferred by using a precision of four bits per parameter. Thanks to these optimizations, the researchers report results comparable to traditional methods while reducing network traffic by up to a factor of 400.
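To make the first and third mechanisms concrete, here is a minimal, illustrative sketch (not DeepMind's actual code) of how replicas might synchronize only one parameter "fragment" per outer step, with the correction toward the cross-replica mean quantized to four bits. The fragment partitioning, the uniform quantizer, and the function names are assumptions for illustration only:

```python
import numpy as np

def quantize_4bit(delta):
    """Uniformly quantize a float array to 16 levels (4 bits), then dequantize.

    This stands in for the paper's low-precision gradient exchange: only a
    coarse approximation of the correction needs to cross the network.
    """
    lo, hi = delta.min(), delta.max()
    if hi == lo:
        return delta.copy()
    scale = (hi - lo) / 15.0                 # 2**4 - 1 quantization steps
    codes = np.round((delta - lo) / scale)   # integer codes in [0, 15]
    return codes * scale + lo                # dequantized values

def streaming_sync(replicas, fragment, n_fragments):
    """Average only one fragment of the parameters across replicas.

    Instead of synchronizing the whole model at once, each outer step
    touches a single contiguous slice, spreading communication over time.
    """
    n = len(replicas)
    size = replicas[0].size // n_fragments
    sl = slice(fragment * size, (fragment + 1) * size)
    mean = sum(r[sl] for r in replicas) / n
    for r in replicas:
        # Each replica applies a 4-bit-quantized correction toward the mean.
        r[sl] += quantize_4bit(mean - r[sl])
    return replicas
```

Cycling `fragment` over `range(n_fragments)` across successive outer steps eventually touches every parameter, so the replicas drift back together while each individual step transmits only a fraction of the model, and that fraction is compressed.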

This research lands in a technological context in rapid flux. The growing need to train ever larger models has pushed companies such as Nvidia to develop technologies to connect separate data centers, creating virtual infrastructures on an even larger scale. DeepMind’s paradigm suggests an alternative: instead of increasing the complexity of AI supercomputers, it may be possible to reduce the need for ultra-fast connections and redistribute training more efficiently. This prospect has sparked strong interest in the scientific and industrial community, with experimental adoption by companies such as Prime Intellect, which used OpenDiLoCo (an open-source version of the method) to train its 10-billion-parameter model.

Despite the promise of this innovation, DeepMind researchers emphasize that this is only a first step. Integrating techniques from federated learning and optimizing the approach for different hardware architectures are areas of research yet to be explored. Furthermore, it remains to be understood how to scale the number of DiLoCo replicas efficiently with respect to a fixed computational budget.

The evolution of distributed training could redefine the future of AI, making the field less dependent on giant data centers and democratizing access to advanced models.