New Gradient-Based Planner GRASP Overcomes Long-Horizon Fragility in World Models

A team from Meta and academic partners has unveiled GRASP, a gradient-based planning algorithm designed to make long-horizon planning with learned world models robust. The method targets a weakness that has long limited AI control systems: optimization becomes fragile as the number of time steps grows.

GRASP introduces three core innovations: lifting the trajectory into virtual states for parallel optimization, adding stochasticity directly to state iterates for exploration, and reshaping gradients so actions receive clean signals without passing through brittle high-dimensional vision models. This combination makes planning far more reliable at longer horizons.
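The first two ideas can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a hypothetical linear world model `s_next = A @ s + B @ a`, treats the intermediate states as free variables (the "virtual states"), penalizes disagreement between each virtual state and the model's one-step prediction, and adds annealed Gaussian noise to the state iterates. Every name here (`lifted_plan`, `sigma0`, the matrices `A` and `B`) is an assumption made for illustration, and the gradient-reshaping step is left out because the toy model has no vision encoder.

```python
import numpy as np

def lifted_plan(A, B, start, goal, horizon, n_iters=2000, lr=0.05, sigma0=0.01, seed=0):
    """Toy lifted planner for the assumed linear model s_next = A @ s + B @ a."""
    rng = np.random.default_rng(seed)
    ds, da = B.shape
    # Virtual states s_0..s_T: the endpoints are pinned, the interior is optimized.
    states = np.zeros((horizon + 1, ds))
    states[0] = start
    states[-1] = goal
    actions = np.zeros((horizon, da))

    def residuals(states, actions):
        # One-step prediction errors for every time step, computed in one
        # vectorized pass (no sequential rollout through the model).
        return states[1:] - (states[:-1] @ A.T + actions @ B.T)

    initial_loss = float(np.sum(residuals(states, actions) ** 2))
    for k in range(n_iters):
        r = residuals(states, actions)
        grad_a = -2.0 * r @ B                     # each action sees only its own step
        grad_s = 2.0 * r[:-1] - 2.0 * r[1:] @ A   # interior states s_1..s_{T-1}
        actions -= lr * grad_a
        sigma = sigma0 * (1.0 - k / n_iters)      # noise annealed toward zero
        states[1:-1] += -lr * grad_s + sigma * rng.standard_normal(grad_s.shape)
    final_loss = float(np.sum(residuals(states, actions) ** 2))
    return states, actions, initial_loss, final_loss

# Hypothetical double-integrator task: move from rest at 0 to rest at 1.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.5], [1.0]])
states, actions, loss0, loss1 = lifted_plan(
    A, B, start=np.zeros(2), goal=np.array([1.0, 0.0]), horizon=20
)
print(f"dynamics violation: {loss0:.3f} -> {loss1:.5f}")
```

Because every residual depends only on neighboring time steps, the gradient for each action involves a single application of the model rather than a product over the whole horizon, which is the property the lifting is meant to provide.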

“Large learned world models are becoming remarkably capable, but using them effectively for control and planning has remained a challenge,” said Mike Rabbat, a research scientist at Meta and co-author of the study. “GRASP directly addresses the ill-conditioned optimization and bad local minima that have made long-horizon planning fragile.”

The new planner departs from previous approaches, which often struggled as the planning horizon grew. By optimizing all time steps in parallel and incorporating stochastic exploration, GRASP maintains stability and efficiency even when planning dozens of steps ahead.

Background

World models are learned dynamics models that predict future states—such as images, latent vectors, or proprioception—given a sequence of actions. They are increasingly used as general-purpose simulators for tasks ranging from robotics to autonomous driving.
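As a minimal sketch of that interface, the snippet below rolls a one-step model forward over an action sequence. The names `rollout` and `toy_step` are hypothetical, and `toy_step` is a hand-written stand-in for a learned model, not anything from the paper.

```python
import numpy as np

def rollout(step_fn, s0, actions):
    """Apply a one-step model repeatedly and return the predicted states s_1..s_T."""
    states = []
    s = s0
    for a in actions:
        s = step_fn(s, a)   # the model predicts the next state from (state, action)
        states.append(s)
    return np.stack(states)

def toy_step(s, a):
    # Hypothetical stand-in for a learned model: the state moves by a scaled action.
    return s + 0.1 * a

actions = np.ones((5, 2))                       # five 2-D actions
predicted = rollout(toy_step, np.zeros(2), actions)
print(predicted.shape)
```

A planner built on this interface searches over `actions` so that the predicted states reach a goal. Note that each prediction depends on all earlier ones, which is where the long-horizon difficulties described next come from.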

[Figure. Source: bair.berkeley.edu]

However, planning with these models has been notoriously difficult. As the horizon grows, optimization becomes ill-conditioned, tasks whose solutions are non-greedy (where progress requires temporarily moving away from the goal) create misleading local minima, and high-dimensional latent spaces introduce subtle failure modes. Prior methods often required carefully tuned reward functions or short planning windows to work reliably.
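One well-known source of this ill-conditioning, shown here as an illustration rather than as the paper's own analysis, is that the gradient of the final state with respect to an early action is a product of one Jacobian per step. The sketch assumes a hypothetical linear model `s_next = A @ s + B @ a`, for which that product is simply a power of `A` times `B`. The function name and the matrices are illustrative assumptions.

```python
import numpy as np

def action_sensitivities(A, B, horizon):
    """Spectral norm of d s_T / d a_t for t = 0..horizon-1 (index 0 = earliest action)."""
    norms = []
    J = B.copy()                      # d s_T / d a_{T-1} = B
    for _ in range(horizon):
        norms.append(np.linalg.norm(J, 2))
        J = A @ J                     # one more step of the chain rule
    return np.array(norms[::-1])

A = 0.8 * np.eye(2)                   # assumed contracting dynamics
B = np.array([[1.0], [0.0]])
for T in (10, 40):
    s = action_sensitivities(A, B, T)
    print(T, s.max() / s.min())       # spread between latest and earliest action
```

With contracting dynamics the earliest actions barely influence the final state, and with expanding dynamics they dominate it. Either way, the spread in gradient scale across time steps widens as the horizon grows, so a single step size cannot suit every action.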

The research team, which also includes Aditi Krishnapriyan, Yann LeCun, and Amir Bar, set out to systematically address these issues. Their solution, detailed in a recent paper, leverages three interrelated techniques to make gradient-based planning practical for long sequences.

What This Means

GRASP opens the door for more sophisticated planning in real-world AI systems. Robots, for example, could plan long sequences of actions in complex environments without losing track of goals. Autonomous vehicles could anticipate longer traffic scenarios with higher reliability.

“This is a significant step forward in making world models practical for complex, long-term decision making,” said Yann LeCun, Chief AI Scientist at Meta. “It bridges the gap between having a powerful predictive model and being able to use it effectively for control.”

The method also has implications for reinforcement learning, where world models are used to simulate and plan before acting. By reducing fragility, GRASP could accelerate training and improve sample efficiency.

Future work may explore combining GRASP with model-based policy optimization and extending it to continuous action spaces. The researchers have made their code available for the community to build upon.
