
New announcements by Meta
DukeRem
 
Meta AI, the AI research division of Meta (the company formerly known as Facebook), has announced two major advancements toward general-purpose embodied AI agents capable of performing challenging sensorimotor skills. The first is an artificial visual cortex called VC-1, a single model that supports a diverse range of sensorimotor skills, environments, and embodiments. VC-1 is trained on videos of people performing everyday tasks from the groundbreaking Ego4D dataset created by Meta AI and academic partners. The second is a new approach called adaptive (sensorimotor) skill coordination (ASC), which achieves near-perfect performance (98% success) on the challenging task of robotic mobile manipulation in physical environments.

Optimistic science fiction imagines a future in which humans create art and pursue fulfilling pastimes while AI-enabled robots handle dull or dangerous tasks. Today's AI systems, however, display increasingly sophisticated generative abilities on ostensibly creative tasks, while robots capable of sensorimotor skills are (still) lagging behind. This gap is captured by Moravec's paradox: the hardest problems in AI involve sensorimotor skills, not abstract thought or reasoning.

These breakthroughs were made possible by data, which AI needs to learn from. Specifically, embodied AI requires data that captures interactions with the environment. Traditionally, such interaction data is gathered either by collecting large numbers of demonstrations or by letting the robot learn from scratch through its own interactions. Both approaches are too resource-intensive to scale toward a general embodied AI agent. In both of these works, Meta AI develops new ways for robots to learn: from videos of humans interacting with the real world, and from interactions within photorealistic simulated worlds.

The VC-1 module enables an artificial agent to convert camera input into actions, much as the visual cortex (together with the motor cortex) enables an organism to convert vision into movement. Meta AI's FAIR team, together with academic collaborators, has been at the forefront of developing general-purpose visual representations for embodied AI trained on egocentric video datasets. The Ego4D dataset has been especially useful: it contains thousands of hours of wearable-camera video of research participants around the world performing daily-life activities such as cooking, cleaning, sports, and crafts.

Until now there was no consistent way of knowing which of the existing pre-trained visual representations was best. As a first step, Meta AI's FAIR team curated CortexBench, a suite of 17 different sensorimotor tasks in simulation spanning locomotion, navigation, and dexterous and mobile manipulation, implementing the community-standard method for learning the policy for each task. CortexBench allows a rigorous and consistent evaluation of existing and new pre-trained models, and it enabled the development of a single model that achieves competitive performance on all tasks, akin to the general-purpose visual cortex of biological organisms. The research community has been trending toward pretraining visual representations from web images and egocentric videos, and Meta AI's recent breakthroughs suggest that general-purpose embodied AI agents may soon become a reality.
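As an illustration only, the frozen-encoder pattern described above (a pre-trained visual "cortex" feeding a small, task-specific policy head, trained separately for each benchmark task) can be sketched roughly as follows. This is not Meta's VC-1 code or API: the stand-in encoder from torchvision, the embedding size, and the policy architecture are all assumptions chosen for a self-contained example.

```python
# Hypothetical sketch of the frozen-encoder + per-task policy pattern.
# NOT Meta's VC-1 implementation: the real VC-1 is a ViT pre-trained on
# egocentric video (e.g. Ego4D); here a torchvision ResNet stands in.
import torch
import torch.nn as nn
from torchvision import models

# Stand-in pre-trained visual encoder, kept frozen during policy learning.
encoder = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
encoder.fc = nn.Identity()            # expose the 2048-d image embedding
encoder.eval()
for p in encoder.parameters():
    p.requires_grad = False           # the visual representation is not fine-tuned

class PolicyHead(nn.Module):
    """Small task-specific head trained on top of the frozen embedding."""
    def __init__(self, embed_dim: int = 2048, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.net(embedding)

policy = PolicyHead()

# One control step: camera frame -> embedding -> action.
frame = torch.rand(1, 3, 224, 224)    # dummy RGB observation
with torch.no_grad():
    embedding = encoder(frame)
action = policy(embedding)            # e.g. joint commands for a manipulator
```

In a CortexBench-style evaluation, a head like this would be trained per task (locomotion, navigation, manipulation, and so on) while the shared visual encoder stays fixed, so that differences in task performance reflect the quality of the pre-trained representation.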