
OpenAI Innovates AI Personalization with Reinforcement Fine-Tuning (RFT)
OpenAI introduces advanced technique that boosts reasoning of domain-specific models
Isabella V · 10 December 2024

OpenAI has introduced Reinforcement Fine-Tuning (RFT), a technique that uses reinforcement learning to improve the reasoning ability of domain-specific AI models. This approach promises advanced applications in critical areas such as healthcare, law, and finance, offering significant advantages over traditional fine-tuning.

Key Points:

  • What is RFT: An innovative methodology that uses reinforcement learning to train AI models with limited data and a focus on reasoning.
  • Industry Applications: Demonstrated improvements in genetic disease prediction and forensic support.
  • Key Differences: RFT outperforms classical fine-tuning, offering computational efficiency and accuracy with few examples.
  • Access and Cost: RFT is currently in limited preview; the ChatGPT Pro plan, which includes the full o1 Pro model, is priced at $200 per month.

OpenAI has revealed a breakthrough in AI personalization with the introduction of Reinforcement Fine-Tuning (RFT), an advanced technique presented on the second day of the "12 Days of OpenAI" streaming series. The innovation aims to enhance the reasoning power of o1 models, turning them into specialists in complex domains while making efficient use of computational resources and data. Unlike traditional fine-tuning, which adjusts model parameters based on supervised labels, RFT uses reinforcement learning, allowing the model to learn from feedback on its performance and improve continuously. According to OpenAI’s Mark Chen, this methodology elevates models from an advanced academic level to expert-level competence in specific domains.
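To make the distinction concrete, the following is a deliberately tiny, self-contained sketch of the two update styles, not OpenAI's implementation: the "model" here is just a weighted choice over three candidate answers, whereas real RFT would update the weights of a large language model using scores from a grader.

```python
# Illustrative toy only: contrasts a supervised update (copy the label)
# with a reinforcement-style update (reinforce answers a grader scores well).
import random

ANSWERS = ["gene A", "gene B", "gene C"]

class ToyModel:
    def __init__(self):
        self.weights = {a: 1.0 for a in ANSWERS}

    def sample(self):
        # Pick an answer with probability proportional to its weight.
        total = sum(self.weights.values())
        r = random.uniform(0, total)
        for answer, w in self.weights.items():
            r -= w
            if r <= 0:
                return answer
        return ANSWERS[-1]

def supervised_update(model, label, lr=0.5):
    # Classic fine-tuning: push the model directly toward the labeled answer.
    model.weights[label] += lr

def reinforcement_update(model, grade, samples=4, lr=0.5):
    # RFT-style step: sample several answers, grade each one, and reinforce
    # each answer in proportion to the score the grader assigned.
    for _ in range(samples):
        answer = model.sample()
        model.weights[answer] += lr * grade(answer)

def grade(answer):
    # A grader can express partial credit, something a single label cannot.
    return {"gene A": 1.0, "gene B": 0.3, "gene C": 0.0}[answer]

model = ToyModel()
for _ in range(20):
    reinforcement_update(model, grade)
print(model.weights)  # the weight for "gene A" should dominate
```

The point of the toy is the shape of the loop: instead of being told the answer, the model is scored on the answers it produces, and that score is what drives the update.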

The RFT approach has proven particularly effective in high-stakes domains, as initial experiments show. One notable example is an o1-mini model trained to predict genetic diseases: thanks to RFT, it outperformed its baseline version. Similar applications have emerged in the legal industry, where partnerships such as the one with Thomson Reuters have demonstrated the effectiveness of refined models in interpreting and analyzing complex texts. Justin Reese, a computational biologist, has highlighted the transformative potential of this technology in healthcare, especially for rare diseases.

One of the key innovations introduced by RFT is the ability to obtain significant improvements from a minimal amount of data: as few as a dozen examples are often enough to refine the model. This efficiency comes from the feedback-based approach, which lets the model explore different solutions and learn from the results it obtains, rather than imitate rigid labels. However, as OpenAI engineer John Allard explains, RFT has limitations in subjective or creative fields, where it is harder to define a consensus on what counts as a correct result.
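As a purely hypothetical illustration of why so little data can go a long way, the sketch below shows what a dozen-example dataset and a simple grader might look like; the field names and grading scheme are assumptions, not OpenAI's actual RFT schema.

```python
# Hypothetical sketch of a small RFT-style dataset plus a grader.
# Field names ("prompt", "reference") and the grading logic are illustrative.
import json

examples = [
    {"prompt": "Patient presents with symptoms X and Y. Which gene is most likely involved?",
     "reference": "gene A"},
    {"prompt": "Patient presents with symptom Z only. Which gene is most likely involved?",
     "reference": "gene B"},
    # ...roughly a dozen cases in total
]

with open("rft_train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

def exact_match_grade(model_answer: str, reference: str) -> float:
    """Return 1.0 for a correct answer, 0.0 otherwise. Richer graders can award
    partial credit, which helps the model learn from very few examples."""
    return 1.0 if model_answer.strip().lower() == reference.strip().lower() else 0.0
```

The intuition is that the grader, rather than the sheer volume of labels, carries most of the training signal, which is why the quality of the grading criterion matters more than dataset size.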

The availability of this new technology is currently limited to an alpha program reserved for selected organizations. These partners are testing the capabilities of RFT in real-world environments, using OpenAI’s internal tools to customize the models to their needs. The goal is to gather feedback to further improve the methodology before the public launch in 2025.

RFT’s announcement comes amid a broader rollout of updates. On the first day of “12 Days of OpenAI,” the company introduced the full version of the o1 model, available in the Plus plan for $20 per month, and the new Pro plan, which includes the o1 Pro model for $200 per month. The latter promises optimized answers to complex problems, as well as advanced developer features like structured outputs and image understanding via APIs.
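As a side note on the developer features mentioned, the snippet below shows what a typical structured-outputs call looks like with the OpenAI Python SDK; the model name, schema, and prompt are placeholders, and availability of these features varies by model.

```python
# Minimal structured-outputs example with the OpenAI Python SDK.
# The model name and schema are placeholders, not a specific recommendation.
from openai import OpenAI
from pydantic import BaseModel

class CaseSummary(BaseModel):
    diagnosis: str
    confidence: float

client = OpenAI()  # reads OPENAI_API_KEY from the environment

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",  # substitute a model you have access to
    messages=[{"role": "user",
               "content": "Summarize the most likely diagnosis for symptoms X and Y."}],
    response_format=CaseSummary,  # the reply is validated against this schema
)
print(completion.choices[0].message.parsed)
```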

Reaction to the Pro plan’s pricing has been mixed: many users consider the price excessive compared to the benefits, while others, such as Wharton School professor Ethan Mollick, have praised the o1 Pro model’s capabilities in specific areas even while acknowledging that it doesn’t excel universally. Benchmark tests have shown that o1 Pro outperforms competitors like Claude 3.5 Sonnet in complex reasoning, but not in code generation, where Sonnet stands out for its simplicity and quality.

OpenAI’s strategy with RFT represents a decisive step towards creating hyper-specialized AI models capable of tackling complex challenges with unprecedented precision and versatility.
