Robotic Transformer 2 (RT-2) from Google DeepMind | Turtles AI
DukeRem, 30 July 2023
Researchers at Google DeepMind built RT-2, a vision-language-action model that learns from both web and robotics data to produce generalised robotic actions. It showed improved reasoning, better performance in unseen situations, and chain-of-thought planning for long-horizon tasks, capabilities not found in previous robotics models. The researchers argue that this highlights the potential of vision-language-action models for building general-purpose, intelligent robots.
Although large models can learn a great deal from web-scale data and language, general robot behaviour remains narrow and specialised, lacking emergent capabilities. To remedy this, the researchers created RT-2, a novel vision-language-action model that grounds knowledge from web data in physical robotic actions. Trained on both robotic and web data, RT-2 achieved multi-stage reasoning, exceptional performance in both seen and unseen scenarios, and chain-of-thought planning for long-horizon tasks. The researchers argue this demonstrates how vision-language models could be turned into vision-language-action models to build smarter, more general-purpose robots.
Highlights:
• RT-2 learns from both robotic data and web-scale data to produce generalised robotic actions
• It achieved up to 3x higher success rates on emergent robotic skills than prior baselines
• RT-2 can reason, plan multiple steps ahead, and interpret commands thanks to its web pre-training
• Researchers argue this illustrates the potential to build smarter, more general-purpose physical robots