Google Cloud Powers Infrastructure with Ironwood TPU and AI Hypercomputer
Google Cloud introduces the 7th-generation Ironwood TPU and advancements in hardware, networking, and software to optimize performance and efficiency for AI workloads, offering scalable, integrated solutions for training and inference.
Isabella V | 10 April 2025

AI infrastructure evolution accelerates with Google Cloud’s AI Hypercomputer, an advanced system for optimizing the efficiency and scalability of AI workloads. With the introduction of the 7th generation Ironwood TPU and enhancements in networking, storage, and software, it redefines supercomputing for AI.

Key Points:

  • Ironwood TPU: higher inference performance with double the energy efficiency.
  • Hardware Expansion: new NVIDIA-based VMs and networking advancements.
  • Optimized Storage: Hyperdisk Exapools and Cloud Storage Anywhere Cache.
  • Software Innovations: Pathways on Cloud and AI inference optimizations.


Google Cloud’s AI ecosystem takes another step forward with AI Hypercomputer, a supercomputing platform designed to handle AI workloads seamlessly and affordably. It integrates hardware and software for high performance and a low total cost of ownership, making computational power available flexibly and efficiently. At the heart of this evolution is Ironwood, the seventh-generation TPU, which offers five times the peak compute of the previous-generation Trillium, with six times the HBM capacity and double the energy efficiency. Ironwood is available in configurations of up to 9,216 chips for a total of 42.5 exaFLOPS, a significant increase in performance for inference operations.
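
Those pod-level figures are internally consistent: dividing the stated 42.5 exaFLOPS by 9,216 chips recovers a per-chip rating of roughly 4,600 TFLOPS. Here is a minimal Python sketch of that arithmetic; only the two pod totals come from the announcement, and the per-chip number is simply derived from them:

```python
# Back-of-the-envelope check of the Ironwood pod figures cited above.
# Inputs are the publicly stated pod totals; the per-chip value is derived.

POD_CHIPS = 9_216      # chips in the largest Ironwood configuration
POD_EXAFLOPS = 42.5    # stated pod-level peak compute, in exaFLOPS

# 1 exaFLOPS = 1,000,000 teraFLOPS
pod_teraflops = POD_EXAFLOPS * 1_000_000
per_chip_teraflops = pod_teraflops / POD_CHIPS

print(f"~{per_chip_teraflops:,.0f} TFLOPS per chip")  # ~4,612 TFLOPS
```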

At the same time, Google Cloud is expanding its offering with A4 and A4X VMs, based on NVIDIA B200 and GB200 NVL72 GPUs respectively, increasing compute capacity for machine learning applications. The network infrastructure is enhanced with 400G Cloud Interconnect and Cross-Cloud Interconnect, which deliver four times the bandwidth of previous solutions and reduce latency between on-premises and cloud environments. Storage evolves with Hyperdisk Exapools, a block storage system with exabyte-scale capacity and multi-TB/s throughput, as well as Rapid Storage, which optimizes data access for AI accelerators, improving loading speeds up to 20x compared to traditional regional buckets.
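
For a sense of what provisioning such a VM looks like programmatically, here is a hedged sketch using the google-cloud-compute Python client. The project ID, zone, disk image, and the a4-highgpu-8g machine-type name are placeholders or assumptions to be checked against current documentation and regional availability:

```python
# Illustrative sketch: requesting an A4 VM (NVIDIA B200) with the
# google-cloud-compute Python client. Project, zone, and machine-type
# name are assumptions; adapt them to your environment.
from google.cloud import compute_v1

PROJECT = "my-project"   # placeholder project ID
ZONE = "us-central1-b"   # placeholder zone; verify A4 availability

boot_disk = compute_v1.AttachedDisk(
    boot=True,
    auto_delete=True,
    initialize_params=compute_v1.AttachedDiskInitializeParams(
        source_image="projects/debian-cloud/global/images/family/debian-12",
        disk_size_gb=200,
    ),
)

instance = compute_v1.Instance(
    name="a4-training-node",
    machine_type=f"zones/{ZONE}/machineTypes/a4-highgpu-8g",  # assumed name
    disks=[boot_disk],
    network_interfaces=[
        compute_v1.NetworkInterface(network="global/networks/default")
    ],
    # GPU VMs cannot live-migrate, so host maintenance must terminate them.
    scheduling=compute_v1.Scheduling(on_host_maintenance="TERMINATE"),
)

client = compute_v1.InstancesClient()
operation = client.insert(project=PROJECT, zone=ZONE, instance_resource=instance)
operation.result()  # block until provisioning completes
print("A4 VM created")
```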

The AI Hypercomputer software layer plays a critical role in resource integration and management, with Pathways on Cloud making the distributed runtime developed by DeepMind available on Google Cloud. This system provides scalable inference with reduced latency and dynamic management of the computational load. The platform also introduces Cluster Director for GKE and Slurm, streamlining accelerator management with targeted workload placement and advanced monitoring systems to ensure stability and business continuity. New observability and failure-prevention capabilities ensure smooth execution even in the presence of degraded nodes.
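
Pathways on Cloud is consumed through JAX, so the programming model resembles ordinary JAX sharding. The sketch below is generic JAX rather than any Pathways-specific API; it simply shows a computation being split across whatever devices are visible, which is the execution model the Pathways runtime scales up:

```python
# Minimal JAX sketch of sharding a computation across available
# accelerators. Generic JAX, not a Pathways-specific API; it runs on
# however many devices jax.devices() reports (TPU, GPU, or CPU).
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("data",))
sharding = NamedSharding(mesh, P("data"))  # split leading axis over devices

# A batch whose leading dimension is divisible by the device count.
batch = jnp.arange(len(devices) * 8, dtype=jnp.float32).reshape(-1, 1)
batch = jax.device_put(batch, sharding)    # place one shard on each device

@jax.jit
def step(x):
    # Each device computes on its own shard of the batch.
    return jnp.tanh(x) * 2.0

out = step(batch)
print(out.sharding)  # confirms the distributed layout of the result
```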

AI inference benefits from GKE Inference Gateway and GKE Inference Quickstart, tools that simplify deployment and workload balancing, reducing serving costs by up to 30% and increasing throughput by up to 40%. At the same time, vLLM, a high-performance inference library, becomes natively compatible with TPU, enabling seamless integration without structural changes to existing frameworks. The Dynamic Workload Scheduler (DWS) expands accelerator support to include TPU v5e, Trillium, and the NVIDIA GPU-based A3 Ultra and A4 VMs, enabling more flexible and dynamic management of computational resources.
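
vLLM exposes the same Python entry point regardless of backend, which is what makes the TPU support transparent to existing code. A minimal offline-inference sketch follows; the model name is a placeholder, and TPU execution assumes the TPU build of vLLM is installed:

```python
# Minimal offline-inference sketch with vLLM's standard Python API.
# The same entry point is used whether the backend is GPU or TPU;
# the model name below is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "Explain what a TPU pod is in one sentence.",
    "Name one benefit of disaggregated inference.",
]
params = SamplingParams(temperature=0.7, max_tokens=64)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text.strip())
```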

Google Cloud’s AI Hypercomputer infrastructure marks a milestone in the evolution of AI at scale, with integrated solutions that optimize efficiency and scalability for the future of advanced computing.