
An LLM on Windows 98: AI Returns to Its Origins
EXO Labs demonstrates a language model running on an old Pentium II, opening up new avenues for AI on limited hardware
Isabella V, 9 May 2025


A team of researchers has demonstrated that a modern language model can run on a Windows 98 computer with a Pentium II processor and 128 MB of RAM, at surprisingly usable speeds.

Key Points:

  • An LLM based on the Llama 2 architecture ran on a 1998 Windows 98 PC, with the smallest model reaching 39.31 tokens per second.
  • The project used a modified version of llama2.c, compiled with Borland C++ 5.02.
  • BitNet, a ternary-weight transformer architecture, allows 7-billion-parameter models to run on old hardware.
  • EXO Labs promotes access to AI on old devices by supporting the open source community.


In December 2024, EXO Labs presented an experiment that caught the attention of the technology community: running a large language model on a Windows 98 computer with a 350 MHz Pentium II processor and 128 MB of RAM. Using a modified version of Andrej Karpathy’s llama2.c, called llama98.c, compiled with Borland C++ 5.02, the team ran a 260,000-parameter model at a generation rate of 39.31 tokens per second. Even with larger models, such as a 15-million-parameter one, the system maintained acceptable performance, generating 1.03 tokens per second.

This was possible thanks to the adoption of BitNet, a transformer architecture that uses ternary weights (-1, 0, +1), significantly reducing memory and computation requirements. With BitNet, a 7-billion-parameter model requires only 1.38 GB of storage, making it compatible with older hardware. Additionally, BitNet is designed to run primarily on CPUs, avoiding the need for expensive GPUs. Tests have shown that a 100-billion-parameter model can run on a single CPU at human reading speed, about 5 to 7 tokens per second.

EXO Labs, founded by researchers at the University of Oxford, aims to democratize access to AI by enabling advanced models to run on a wide range of devices, including those considered obsolete. The organization has also released the source code for llama98.c on GitHub, inviting the community to contribute and experiment. To facilitate collaboration, EXO Labs has created a Discord channel dedicated to running LLMs on older hardware, such as old Macs, Game Boys, and Raspberry Pis.

This project highlights how model optimization and the use of efficient architectures can extend AI capabilities to devices with limited resources, promoting more equitable and sustainable access to the technology.