SAC-X is a groundbreaking approach that enables robotic arms to learn, from scratch, how to pick up and place objects in real-world settings. The core idea behind SAC-X is that an agent should first acquire a set of fundamental skills before tackling more complex tasks. Just as a baby must develop coordination and balance before learning to crawl or walk, SAC-X introduces intrinsic auxiliary goals that correspond to simple skills, helping the agent gradually build up to more advanced behavior.
We believe SAC-X is not just a robotics-specific method but a versatile reinforcement learning framework with potential applications across many domains. Getting children, or even adults, to tidy up their own space is never easy; training an AI agent to perform such structured tasks is harder still. To succeed, the agent must master essential visuomotor skills, such as approaching an object, grasping it, opening a container, and placing the object inside, and execute each step in the correct sequence.
For tasks like tidying a desk or stacking objects, the agent must coordinate its simulated arm and nine finger joints, deciding how, when, and where to move. The enormous number of possible movement combinations this creates makes such tasks a significant challenge in reinforcement learning research.
Traditional techniques such as reward shaping, apprenticeship learning, and learning from demonstrations can help, but they rely heavily on prior knowledge of the task. Learning complex control problems from scratch with minimal prior knowledge remains a major challenge. To address this, we introduced Scheduled Auxiliary Control (SAC-X), a new learning paradigm designed to overcome these limitations.
In our experiments, SAC-X was tested on both simulated and real-world robot tasks, such as stacking different objects and placing them into boxes. The auxiliary tasks were designed to encourage the agent to explore its sensory environment: activating the tactile sensors on its fingers, sensing forces at its wrist, maximizing its joint angles, and keeping objects within its visual field.
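To make this concrete, the sketch below shows what such sparse auxiliary rewards might look like in code. It is purely illustrative: the observation fields (`fingertip_touch`, `object_pixel_xy`, `object_z`) and the threshold values are hypothetical stand-ins, not the actual sensor interface used in the work.

```python
import numpy as np

# Illustrative only: observation keys and thresholds are hypothetical.

def touch_reward(obs):
    """1.0 if any fingertip tactile sensor is activated, else 0.0."""
    return 1.0 if np.any(obs["fingertip_touch"] > 0.0) else 0.0

def in_view_reward(obs):
    """1.0 if the object's projected pixel lies inside the camera frame."""
    u, v = obs["object_pixel_xy"]
    height, width = obs["image_shape"]
    return 1.0 if (0 <= u < width and 0 <= v < height) else 0.0

def lift_reward(obs, table_height=0.05):
    """1.0 if the object has been lifted above the table surface."""
    return 1.0 if obs["object_z"] > table_height else 0.0

# Every intention, auxiliary or external, yields a sparse binary reward.
AUXILIARY_REWARDS = {
    "touch": touch_reward,
    "in_view": in_view_reward,
    "lift": lift_reward,
}
```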
Each task yields a simple reward when its goal is achieved and no reward otherwise. This setup lets the agent learn gradually by trial and error: it might start by activating a tactile sensor, then progress to moving the object. Eventually it learns to decide what to pursue next, whether an auxiliary task or the external goal. Crucially, because SAC-X learns off-policy from replayed experience, the agent can detect and learn from rare external rewards even while it is pursuing auxiliary intentions.
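The sketch below illustrates one way to picture this replay mechanism: each stored transition is relabeled with the reward of every intention, so experience gathered while following one intention can be reused to learn about all the others, including the rarely rewarded main task. The class and its interface are hypothetical simplifications; the actual method trains off-policy actor-critic learners on top of such replayed experience.

```python
import random
from collections import deque

class MultiTaskReplay:
    """Hypothetical replay buffer sketch: transitions carry the reward
    of every intention, enabling off-policy learning for all of them."""

    def __init__(self, reward_fns, capacity=100_000):
        self.reward_fns = reward_fns          # intention name -> reward fn
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, next_obs):
        # Relabel the transition with every intention's reward, so a rare
        # external reward is recorded even if an auxiliary intention was
        # in control when it occurred.
        rewards = {name: fn(next_obs) for name, fn in self.reward_fns.items()}
        self.buffer.append((obs, action, rewards, next_obs))

    def sample(self, intention, batch_size=64):
        # Expose only the chosen intention's reward to that learner.
        batch = random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
        return [(o, a, r[intention], o2) for (o, a, r, o2) in batch]
```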
By accumulating this indirect knowledge, the agent builds its own curriculum, which makes the method especially efficient in environments with sparse rewards. A scheduling module decides which intention to pursue next, and through meta-learning the scheduler itself improves during training, aiming to maximize progress on the main task and significantly boosting data efficiency.
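A minimal sketch of such a scheduler follows. The actual scheduler is learned with considerably more machinery; here, as a deliberately simplified stand-in, a bandit-style scheduler tracks a running estimate of the main-task return observed under each intention and samples the next intention from a softmax over those estimates. All names and hyperparameters are illustrative assumptions.

```python
import random
import numpy as np

class SoftmaxScheduler:
    """Simplified stand-in for SAC-X's learned scheduler: prefer
    intentions that have historically led to main-task reward."""

    def __init__(self, intentions, lr=0.1, temperature=1.0):
        self.intentions = list(intentions)
        self.values = {i: 0.0 for i in self.intentions}  # est. main-task return
        self.lr = lr
        self.temperature = temperature

    def next_intention(self):
        prefs = np.array([self.values[i] for i in self.intentions])
        prefs -= prefs.max()  # numerical stability for the softmax
        probs = np.exp(prefs / self.temperature)
        probs /= probs.sum()
        return random.choices(self.intentions, weights=probs)[0]

    def update(self, intention, main_task_return):
        # Nudge the estimate toward the main-task return collected while
        # this intention was in control of the arm.
        v = self.values[intention]
        self.values[intention] = v + self.lr * (main_task_return - v)
```

Early in training all estimates are near zero, so the scheduler explores intentions roughly uniformly; as some intentions begin to produce main-task reward, it shifts more of its time toward them.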
After exploring a range of internal auxiliary tasks, the agent learns to stack and organize items effectively. Evaluation showed that SAC-X could learn every assigned task from scratch using the same set of auxiliary tasks. Most impressively, our lab's robotic arm successfully learned to pick and place objects without prior exposure, something previously considered difficult because learning in the real world demands high data efficiency.
Traditionally, real-world robotic learning involved pre-training in simulation before transferring to physical systems. With SAC-X, however, the robotic arm can now learn directly in the real world, achieving tasks like lifting and moving green cubes that it had never encountered before.
We believe SAC-X represents a significant step toward agents that can learn from scratch, where only the overall goal needs to be defined. SAC-X allows auxiliary tasks to be defined flexibly from general perceptions, ultimately covering a wide range of tasks of interest. In this sense, SAC-X is a general-purpose reinforcement learning method applicable beyond robotics, especially in environments with sparse rewards.