DeepSeek Innovates AI: New Reasoning and Reward Method for R2 Model | Turtles AI
DeepSeek, in collaboration with Tsinghua University, has developed a technique that combines Generative Reward Modeling (GRM) and Self-Principled Critique Tuning (SPCT) to improve the reasoning capabilities of large language models (LLMs). The methodology enables LLMs to produce more accurate results that align with human preferences.
Key Points:
- Development of a new technique that integrates GRM and SPCT to enhance the reasoning capabilities of LLMs.
- DeepSeek and Tsinghua University collaborated on this innovation.
- DeepSeek-GRM models outperformed existing methods, achieving performance competitive with strong public reward models.
- DeepSeek plans to open-source the GRM models but has not specified a timeline.
Chinese startup DeepSeek, in collaboration with Tsinghua University, has introduced a new methodology to improve the reasoning capabilities of large language models (LLMs). The technique combines Generative Reward Modeling (GRM) and Self-Principled Critique Tuning (SPCT), as described in a paper published on arXiv. Reward modeling is the process of guiding an LLM toward human preferences; the new approach aims to steer models toward more accurate results aligned with those preferences, enabling more precise answers to general queries. The resulting models, known as DeepSeek-GRM, outperformed existing methods and achieved performance competitive with strong public reward models. DeepSeek intends to open-source these models, although no specific timeframe has been provided.
The development comes as speculation intensifies over the release of the company's next-generation model, DeepSeek-R2, the successor to DeepSeek-R1, which rocked the global tech community by delivering performance comparable to leading models at significantly lower cost.
DeepSeek, founded in 2023 by Liang Wenfeng, is funded by the hedge fund High-Flyer, also founded by Liang. The company recently updated its V3 model, DeepSeek-V3-0324, improving its reasoning capabilities and Chinese writing proficiency, and has open-sourced five of its code repositories, promoting transparency and collaboration in the AI community.
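The paper's implementation details are beyond the scope of this article, but the high-level loop it describes can be sketched. Below is a minimal, illustrative Python sketch of how a GRM-style judge might work at inference time: the model first writes its own evaluation principles for a query, then critiques candidate answers against them, and finally distills the critique into scalar rewards. The generate() helper, prompts, and scoring format are assumptions for exposition, not DeepSeek's actual code.

```python
# Illustrative sketch only: the prompts, names, and scoring format below
# are assumptions for exposition, not DeepSeek's published implementation.

from dataclasses import dataclass


@dataclass
class Judgement:
    principles: str      # evaluation criteria the judge writes for itself
    critique: str        # free-text critique of the candidates
    scores: list[float]  # one scalar reward per candidate


def generate(prompt: str) -> str:
    """Placeholder for a call to an instruction-tuned LLM.

    Swap in your own inference backend (API client or local model).
    """
    raise NotImplementedError("plug in an LLM backend here")


def grm_judge(query: str, candidates: list[str]) -> Judgement:
    # Step 1 (the "self-principled" idea): the judge first writes its own
    # evaluation principles for this specific query instead of relying on
    # a fixed rubric.
    principles = generate(
        "List the criteria a good answer to the following query must "
        f"satisfy:\n{query}"
    )

    # Step 2 (the "generative reward" idea): the reward signal starts as
    # generated text, a critique of each candidate against the principles,
    # rather than as a bare scalar from a classifier head.
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    critique = generate(
        f"Principles:\n{principles}\n\n"
        f"Critique each candidate answer against these principles:\n{numbered}"
    )

    # Step 3: distill the critique into scalar rewards, here by asking the
    # model for comma-separated scores in [0, 10].
    raw = generate(
        f"Critique:\n{critique}\n\n"
        "Output one score in [0, 10] per candidate, comma-separated."
    )
    scores = [float(s.strip()) for s in raw.split(",")]
    return Judgement(principles=principles, critique=critique, scores=scores)
```

Note that the "tuning" in SPCT refers to training these critique behaviors into the model itself; the sketch above only shows the inference-time judging pattern that such a trained model would follow.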