Brain-Inspired AI Reveals How Babies and Robots Learn | Turtles AI
A new embodied AI model developed at the Okinawa Institute of Science and Technology offers innovative insights into generalization and compositionality learning by simulating children’s cognitive processes. Based on a PV-RNN neural network, it integrates language, vision, and proprioception to process information more transparently and interpretably than modern large-scale language models.
Key Points:
- Embodied learning: The model learns by combining language, visual perception and proprioception.
- Compositionality: Develops the ability to separate general concepts from specific contexts, much as children do.
- Greater transparency: Makes the neural network's internal processes observable.
- Efficiency and safety: Requires less data and computation, and makes errors similar to those of humans.
Researchers at the Okinawa Institute of Science and Technology have designed an innovative embodied AI architecture that sheds new light on the mechanisms of cognitive learning and on how children develop the ability to generalize. The model, based on a neural network called PV-RNN (Predictive-coding-inspired Variational Recurrent Neural Network), stands out for its ability to simultaneously integrate information from multiple sensory modalities, replicating human cognitive constraints and improving the understanding of decision-making processes in neural networks.
The approach is inspired by the free-energy principle, according to which the human brain continuously builds predictions about sensory inputs based on experience, minimizing discrepancies between expectation and reality to maintain a stable cognitive state. The model differs from current Large Language Models (LLMs), which rely on statistical analysis of huge textual datasets: the PV-RNN instead learns through embodied interaction with the world, simultaneously processing images of a robotic arm moving colored blocks, proprioception data, and linguistic instructions such as "place red on blue". The output consists of both a visual and a motor prediction in response to a verbal command or a sensory stimulus.
This architecture makes it possible to explore in depth how an AI system develops compositionality, the ability to break down and recombine concepts in order to generalize to new information. Tests have shown that learning is more effective when the same term is presented in different contexts, a principle that closely mirrors the way children acquire language: exposure to objects of the same color in different scenarios accelerates the understanding of the color category compared to isolated repetitions of the same stimulus. Furthermore, compared to LLMs, the PV-RNN achieves a high level of generalization with significantly less data and reduced computational power. Although the model makes more errors overall, those errors resemble human ones, making it particularly useful for researchers studying cognitive development and for AI researchers interested in better understanding the decision-making processes of neural networks.
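The compositional generalization described above can be illustrated with a minimal sketch (again hypothetical, not the OIST code): if commands like "place red on blue" are represented as separable color and target features, a model fitted on some color/target combinations can handle a combination it has never seen, because the parts recombine.

```python
import numpy as np

# Hypothetical sketch of compositional generalization. Commands are encoded
# as factored one-hot features (color + target), and the "behavior" to learn
# is additive in those factors; all names and coordinates are invented.
colors = ["red", "blue", "green"]
targets = ["on_blue", "on_green"]

def encode(color, target):
    v = np.zeros(len(colors) + len(targets))
    v[colors.index(color)] = 1.0
    v[len(colors) + targets.index(target)] = 1.0
    return v

# Ground-truth behavior: pick coordinate depends only on the color word,
# place coordinate only on the target word (illustrative numbers).
pick_x = {"red": 0.0, "blue": 1.0, "green": 2.0}
place_x = {"on_blue": 1.0, "on_green": 2.0}

combos = [(c, t) for c in colors for t in targets]
held_out = ("red", "on_green")                  # combination never seen in training
train = [ct for ct in combos if ct != held_out]

X = np.array([encode(c, t) for c, t in train])
Y = np.array([[pick_x[c], place_x[t]] for c, t in train])
A, *_ = np.linalg.lstsq(X, Y, rcond=None)       # least-squares "learning"

pred = encode(*held_out) @ A
print(pred)                                     # approximately [0., 2.]: "red" and "on_green" recombined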
Another important aspect concerns the poverty of the stimulus, a central issue in developmental linguistics: children, despite receiving limited linguistic input, quickly acquire the ability to express themselves. The PV-RNN model suggests that the grounding of language in motor and perceptual behavior may be a determining factor in this process, reducing the need for a large dataset while maintaining a high learning capacity. This feature opens the way to applying the approach to the design of safer, more transparent, and more ethical AI, with a deeper understanding of the meaning of words based on experience rather than purely statistical analysis. A system that learns the concept of "suffering" not only through reading but through embodied experience could develop a greater awareness of the implications of its actions.
OIST researchers now plan to expand and improve the model to explore new areas related to developmental neuroscience, with the aim of further investigating the dynamics of cognitive and linguistic learning.
The PV-RNN architecture not only represents a step forward in AI research, but also offers valuable insights into how the human mind organizes and processes information to build its knowledge of the world.