Meet RedPajama, a project for open source LLMs
Turtles AI | DukeRem, 23 April 2023
Foundation models such as #GPT-4 have revolutionized the #AI landscape. Yet the most capable of these models remain closed commercial products or are only partially open. Today, we're thrilled to announce an initiative to bridge this gap: #RedPajama, a project designed to create a suite of fully open-source models. The project's first milestone, reproducing #LLaMA's training dataset of over 1.2 trillion #tokens, has been completed.
As commercial #APIs continue to restrict research, customization, and use with sensitive data, the AI community is shifting toward open-source projects. This movement, reminiscent of Linux's impact on the software world, has paved the way for semi-open models like LLaMA, #Alpaca, #Vicuna, and #Koala, as well as fully open models such as Pythia, OpenChatKit, Open Assistant, and Dolly. RedPajama aims to strengthen the open-source movement and foster innovation by delivering a leading, fully open-source language model.
RedPajama is a collaboration between Together, Ontocord.ai, ETH DS3Lab, Stanford CRFM, Hazy Research, and MILA Québec AI Institute. The project comprises three key components:
- High-quality pre-training data with broad coverage
- Base models trained at scale on this data
- Instruction tuning data and models to enhance usability and safety