Google's Gemini conquers Pokémon Blue: AI put to the test by the classics | Turtles AI
Google's Gemini 2.5 Pro completed Pokémon Blue with the help of an agent system. Meanwhile, Anthropic's Claude 3.7 Sonnet took on Pokémon Red. The two efforts highlight both the reasoning skills of AI models and the challenges they face in classic games.
Key points:
- Gemini 2.5 Pro completes Pokémon Blue: Google's AI model, assisted by an independent engineer, finished the 1996 game.
- Claude 3.7 Sonnet takes on Pokémon Red: Anthropic's model earned three badges, showing significant progress over previous versions.
- Gameplay challenges for AI: both models ran into difficulties in specific areas, highlighting the current limits of their decision-making autonomy.
- Twitch streaming engages the public: live sessions let viewers follow the AI's decisions and reasoning in real time.
In the AI landscape, two leading models have recently drawn attention for their video game feats: Google's Gemini 2.5 Pro and Anthropic's Claude 3.7 Sonnet. Although it was not developed specifically for gaming, Gemini 2.5 Pro completed Pokémon Blue thanks to an initiative by Joel Z, an independent computer engineer. Through a system of agentic "harnesses", Gemini received visual and contextual inputs to navigate the game, with minimal intervention from the developer, such as hints about known bugs. Google CEO Sundar Pichai celebrated the milestone on X, underlining its significance.
At the same time, Anthropic tested Claude 3.7 Sonnet's skills on Pokémon Red, streaming the sessions on Twitch. The model showed progress over previous versions, earning three badges and overcoming early challenges. However, it ran into significant obstacles, such as getting through Mt. Moon, where it was stuck for over 80 hours, and trouble recognizing key characters such as Professor Oak. These difficulties highlight the current challenges in AI autonomy and long-term memory.
Claude's game sessions actively engaged the audience, who could watch the model's decision-making process in real time through the display of its "Vision Scratchpad". This approach made the AI's workings more transparent, but it also exposed its limitations, such as a tendency to get trapped in decision-making loops or to repeat actions without making progress in the game.
Using classic video games as benchmarks for AI offers a unique perspective on the current capabilities and limits of these models. While Gemini showed remarkable adaptability with external support, Claude highlighted the importance of effective memory and strategic planning for complex tasks. Both cases underline the need for further development before AI reaches true decision-making autonomy.
These experiments represent significant steps in exploring the potential of AI, offering valuable insights for future applications and for improving existing models.