Etched Revolutionizes AI: Sohu, the Specialized Chip for Transformers | Turtles AI

Etched Revolutionizes AI: Sohu, the Specialized Chip for Transformers
An ASIC chip dedicated to transformers promises unprecedented performance and reduced costs.
DukeRem

A company is poised to revolutionize AI hardware with a specialized chip for transformers, promising unprecedented performance and reduced costs.

Highlights:

  • Sohu is the first ASIC chip dedicated exclusively to transformers.
  • Offers superior performance and reduced costs compared to traditional GPUs.
  • Simplified software optimization due to specialization.
  • Ready for launch with strategic partnerships and significant pre-orders already placed.


Etched has placed a bold bet, investing in the development of Sohu, the world’s first ASIC (Application-Specific Integrated Circuit) chip dedicated exclusively to transformer models, such as those used by ChatGPT. This approach could radically change the AI landscape, offering unmatched performance and significantly lower costs compared to traditional GPUs.

In 2022, Etched began developing Sohu, a chip designed to fully exploit the transformer architecture. While traditional GPU chips, like those from NVIDIA, are designed to be flexible and support a wide range of AI models, Sohu is specifically optimized for transformers. This specialization allows Sohu to offer superior performance, with a processing capacity of over 500,000 tokens per second for the Llama 70B model, making it significantly faster and cheaper than next-generation GPUs.
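As a rough sanity check on that throughput claim, a common rule of thumb puts dense-decoder inference at about 2 FLOPs per parameter per generated token (this approximation is ours, not a figure from Etched):

```python
# Back-of-envelope compute implied by the claimed Sohu server throughput.
# Rule of thumb (an approximation): ~2 FLOPs per parameter per token.
PARAMS = 70e9             # Llama 70B parameter count
TOKENS_PER_SEC = 500_000  # Etched's claimed per-server throughput

flops_per_token = 2 * PARAMS
sustained_flops = flops_per_token * TOKENS_PER_SEC  # FLOP/s the server must sustain

print(f"~{sustained_flops / 1e15:.0f} PFLOP/s sustained")  # → ~70 PFLOP/s
```

That is, the claim amounts to sustaining on the order of 70 PFLOP/s of useful work per server, which is the kind of number the utilization argument later in the article is meant to explain.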

AI models based on transformers, such as ChatGPT, Sora, and Gemini, are currently the most advanced in the field of artificial intelligence. These models require enormous computational resources for training and inference, and Etched’s approach aims to address this issue. Sohu cannot run other families of AI models, such as the DLRMs (deep learning recommendation models) behind Instagram ads or protein-folding models like AlphaFold 2, but it offers unbeatable performance for transformers.

Etched’s strategy is based on the principle that scale is crucial for AI. Over the past five years, AI models have become smarter thanks to the use of increasingly larger amounts of computing power. Meta, for example, used 50,000 times more computing power to train Llama 400B than OpenAI did for GPT-2. This increase in scale has led to significant improvements in AI model performance.

However, expanding the infrastructure needed to support these models is extremely costly. Future data centers will require investments exceeding the GDP of some nations. Currently, the issue is not the availability of data but the computational capacity needed to handle it effectively. GPUs are reaching their limits, with performance improvements mainly achieved by increasing chip sizes rather than enhancing efficiency.

Etched has recognized this limitation and focused on specialization. Specialized chips are inevitable in a landscape where AI model architecture tends to converge toward transformers. With training costs exceeding a billion dollars and inference costs reaching ten billion, a 1% improvement justifies the development of custom chips like Sohu.
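Putting the article’s own budget figures into numbers shows why even a marginal gain justifies custom silicon:

```python
# The article's scale argument in numbers (both figures are the article's,
# taken at their stated lower bounds).
TRAINING_COST = 1e9     # training costs "exceeding a billion dollars"
INFERENCE_COST = 10e9   # inference costs "reaching ten billion"

savings_from_1pct = 0.01 * (TRAINING_COST + INFERENCE_COST)
print(f"A 1% improvement is worth ~${savings_from_1pct / 1e6:.0f}M")  # → ~$110M
```

On those figures, a single percentage point of efficiency is worth on the order of a hundred million dollars, far more than a custom chip program costs to run.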

A comparison of Sohu’s performance with traditional GPUs clearly shows the advantages of the specialized chip. A server with eight Sohu chips can replace 160 H100 GPUs, offering superior throughput with more efficient resource utilization. The technical details of the Sohu chip highlight how it is possible to achieve a high density of mathematical operations by removing unnecessary control logic, reaching FLOPS utilization rates over 90%, compared to around 30% for GPUs.
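The article’s two claims can be combined into a quick implied-throughput calculation (the arithmetic is ours; the inputs are the article’s):

```python
# Implied per-GPU throughput from the article's claims: one 8-chip Sohu
# server (>500,000 tok/s on Llama 70B) replacing 160 H100 GPUs.
SOHU_SERVER_TOKS = 500_000
H100_REPLACED = 160

implied_h100_toks = SOHU_SERVER_TOKS / H100_REPLACED
print(f"Implied H100 throughput: ~{implied_h100_toks:,.0f} tokens/s each")  # ~3,125

# Utilization figures quoted in the article: >90% for Sohu vs ~30% for GPUs,
# a ~3x advantage per peak FLOP before any other architectural gains.
util_advantage = 0.90 / 0.30
print(f"Per-FLOP utilization advantage: ~{util_advantage:.1f}x")
```

In other words, the replacement claim implies each H100 delivers only a few thousand tokens per second on this workload, and the quoted utilization gap alone accounts for roughly a third of the difference.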

Memory and bandwidth are often considered bottlenecks for AI model inference. However, for modern models like Llama-3, inference is limited by computation rather than memory bandwidth. With techniques such as continuous batching, it is possible to maximize computational efficiency, allowing Sohu to sustain enormous throughput without being limited by memory bandwidth.
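Continuous batching keeps an accelerator saturated by admitting new requests into the running batch the moment earlier ones finish, instead of waiting for an entire static batch to drain. A toy scheduler sketching the idea (this is our illustration, not Etched’s or any library’s actual implementation):

```python
from collections import deque

def continuous_batching(requests, max_batch):
    """Toy simulation: each request is its remaining token count.
    New requests join the batch as soon as a slot frees up."""
    pending = deque(requests)
    running = {}   # request id -> tokens left to generate
    next_id = 0
    steps = 0
    while pending or running:
        # Fill free slots immediately (the "continuous" part).
        while pending and len(running) < max_batch:
            running[next_id] = pending.popleft()
            next_id += 1
        # One decode step: every running request emits one token.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                del running[rid]  # slot frees up for the next step
        steps += 1
    return steps

# One long and three short requests, batch size 2: the short requests
# slot in behind one another instead of waiting for the long one.
print(continuous_batching([10, 2, 2, 2], max_batch=2))  # → 10 steps
```

With static batching the same workload would take 12 steps ([10, 2] then [2, 2]); continuous batching finishes in 10, bounded only by the longest request, which is how the hardware stays compute-bound rather than idle.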

From a software perspective, Sohu benefits from its specialization. While GPUs and TPUs must handle complex and flexible code, Sohu focuses exclusively on transformers, significantly simplifying software development. Transformer-specific inference libraries, like TensorRT-LLM and Hugging Face’s TGI (Text Generation Inference), can be easily optimized for Sohu, providing further performance improvements.

Etched is confident that Sohu represents the future of AI hardware. With strategic partnerships with TSMC for the 4nm process and assured supplies of HBM and servers, the company is poised for one of the fastest chip launches in history. Top AI researchers and hardware engineers have left major AI chip projects to join Etched, and early customers have already reserved tens of millions of dollars worth of Sohu hardware.

The potential impact of Sohu on the AI field is immense. Faster and cheaper AI models could revolutionize video processing, AI chat services, and many other sectors, leading to drastic improvements in performance and cost reduction. Sohu’s specialization in transformers marks a significant breakthrough in AI hardware, promising to overcome the limitations of traditional GPUs and opening up new possibilities for the future of AI.