Apple Uses Google’s TPU Chips for AI Training
Highlights:
- Apple uses Google’s TPU chips for AI training instead of NVIDIA GPUs.
- The server-side Apple Foundation Model (AFM) was trained on 8192 TPUv4 chips; the on-device model on 2048 TPUv5p chips.
- The server-side AFM shows a far lower harmful-content violation rate than OpenAI’s GPT-4.
- The on-device model achieves high user satisfaction rates for email, message, and notification summarization.
Apple trains its AI models on Google’s TPU chips rather than NVIDIA GPUs, a deliberate strategic choice and a technical approach that may surprise industry observers.
Apple has taken a different path for training its AI models, opting for Google’s Tensor Processing Unit (TPU) chips rather than GPUs from NVIDIA, the market leader. The decision is detailed in a recent research paper describing how Apple trained the models behind this year’s AI features for the iPhone and other products. The primary model, the Apple Foundation Model (AFM), contains 2.73 billion parameters and was trained on Google’s TPUv4 and TPUv5p clusters.
The paper reveals that the AFM behind the cloud features, which Apple calls Apple Cloud Compute, was trained on 8192 TPUv4 chips organized into pods of 4096 units each, part of Google’s cloud infrastructure known for its high-performance computing. For on-device models, such as those used for writing assistance and image selection, Apple trained a 6.4-billion-parameter model from scratch with a setup similar to the server-side AFM’s, this time on 2048 TPUv5p chips, a more advanced generation than the TPUv4.
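Apple has not published its training code, and the paper supplies only the hardware figures above. Still, TPU pods are typically programmed through JAX, so a minimal, hypothetical sketch of the data-parallel pattern such hardware is built for might look like the following; every model, shape, and hyperparameter below is a placeholder, not a detail from Apple’s paper.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# Lay out every available accelerator core along a single "data" axis.
# On a TPU pod slice this spans all cores; on CPU it is a single device.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

# Replicate parameters on every core; shard each batch along "data".
replicated = NamedSharding(mesh, P())
sharded = NamedSharding(mesh, P("data"))

@jax.jit
def train_step(params, batch):
    def loss_fn(p):
        # Toy linear model standing in for a real transformer forward pass.
        preds = batch["x"] @ p["w"]
        return jnp.mean((preds - batch["y"]) ** 2)
    loss, grads = jax.value_and_grad(loss_fn)(params)
    # Plain SGD update; the compiler inserts the cross-core gradient
    # reduction implied by taking the mean over the global batch.
    new_params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g,
                                        params, grads)
    return new_params, loss

params = jax.device_put({"w": jnp.zeros((512, 1))}, replicated)
# The global batch size (1024 here) must divide evenly across the cores.
batch = jax.device_put(
    {"x": jnp.ones((1024, 512)), "y": jnp.ones((1024, 1))},
    sharded,
)
params, loss = train_step(params, batch)
print(loss)  # scalar loss aggregated over the full sharded batch
```

At pod scale the same pattern extends to thousands of cores by widening the mesh and adding model-parallel axes; this is the class of workload for which clusters like the ones described in the paper are designed.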
The paper also details how Apple evaluated the models for harmful responses, factual accuracy, and user satisfaction. The results are striking: the server-side AFM recorded a harmful-content violation rate of 6.3%, significantly lower than the 28.8% of OpenAI’s GPT-4. Similarly, the on-device model showed a violation rate of 7.5%, against 21.8% for Meta’s Llama-3-8B.
In email, message, and notification summarization, the on-device AFM achieved user satisfaction rates of 71.3%, 63%, and 74.9%, respectively, surpassing models such as Llama, Gemma, and Phi-3 and confirming the model’s effectiveness in these applications.
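Both headline figures are simple proportions over human-graded outputs. As a rough illustration of what the percentages measure (the grader labels, counts, and protocol below are invented, not taken from the paper):

```python
from typing import Sequence

def violation_rate(flagged: Sequence[bool]) -> float:
    """Share of model responses that graders flagged as policy-violating."""
    return sum(flagged) / len(flagged)

def satisfaction_rate(ratings: Sequence[str]) -> float:
    """Share of summaries that graders rated satisfactory ('good')."""
    return sum(r == "good" for r in ratings) / len(ratings)

# Hypothetical counts: 63 flags out of 1000 adversarial prompts ~ 6.3%;
# 713 "good" ratings out of 1000 email summaries ~ 71.3%.
print(violation_rate([True] * 63 + [False] * 937))         # 0.063
print(satisfaction_rate(["good"] * 713 + ["poor"] * 287))  # 0.713
```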
This strategic move underlines Apple’s preference for the specific computational capabilities of TPUs, which are known for processing large amounts of data with high energy efficiency, over GPUs in a market NVIDIA traditionally dominates. It likely reflects a desire to diversify technology suppliers as well as to optimize cost and performance for the specific needs of Apple’s AI products and services.