Presenter Information

Location

https://www.kennesaw.edu/ccse/events/computing-showcase/sp26-cday-program.php

Document Type

Event

Start Date

April 22, 2026, 4:00 PM

Description

Adapting reinforcement learning policies to changing dynamics is typically addressed by domain randomization, which trains a single robust policy at the cost of specialization, or by meta-RL methods, which enable rapid adaptation but require online inference or optimization. We propose a different mechanism: extending the Platonic Representation Hypothesis (Huh et al., 2024) and vec2vec (Jha et al., 2025) to policy space, we show that diverse task-competent policies trained under varying dynamics admit a shared, low-dimensional manifold structure that is learnable from trajectory embeddings. Platonic Policy Representations (PPR) learns this manifold via geometric preservation losses, then navigates it for rapid adaptation: a hypernetwork generates complete policy weights from any manifold position in a single forward pass, while a dynamics-to-manifold predictor guides deliberate exploration of qualitatively different behavioral strategies for novel dynamics configurations. On continuous control tasks with varying gravity, mass, and friction, PPR achieves 12-235% improvements over domain randomization baselines, with gains scaling with action space dimensionality: Ant (+235%, 8-dim), Walker2d (+153%, 6-dim), Hopper (+59%, 3-dim), and LunarLander (+12%, 2-dim). These results demonstrate that policy adaptation can be recast as geometric navigation of a learned manifold, offering a complementary paradigm to existing adaptation approaches.
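The core mechanism the abstract describes, generating complete policy weights from a manifold position in a single forward pass, can be sketched minimally as below. This is an illustrative toy, not the authors' implementation: the network sizes, the single linear hypernetwork layer, and the two-layer policy architecture are all assumptions chosen for brevity.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
OBS_DIM, ACT_DIM, HID, Z_DIM = 4, 2, 8, 3

# Total parameter count of a small two-layer policy: obs -> tanh -> action.
n_params = OBS_DIM * HID + HID + HID * ACT_DIM + ACT_DIM

rng = np.random.default_rng(0)
# Toy hypernetwork: one linear map from manifold coordinate z to the
# flattened weight vector of the policy network.
H_W = rng.normal(0.0, 0.1, (Z_DIM, n_params))
H_b = np.zeros(n_params)

def generate_policy(z):
    """Single forward pass: manifold position z -> a complete policy."""
    theta = z @ H_W + H_b
    i = 0
    W1 = theta[i:i + OBS_DIM * HID].reshape(OBS_DIM, HID); i += OBS_DIM * HID
    b1 = theta[i:i + HID]; i += HID
    W2 = theta[i:i + HID * ACT_DIM].reshape(HID, ACT_DIM); i += HID * ACT_DIM
    b2 = theta[i:i + ACT_DIM]
    def policy(obs):
        # The generated weights define the behavior; no further optimization.
        return np.tanh(obs @ W1 + b1) @ W2 + b2
    return policy

z = rng.normal(size=Z_DIM)        # a point on the learned manifold
policy = generate_policy(z)
action = policy(np.ones(OBS_DIM))
print(action.shape)               # (2,)
```

In the full method, z would come from the dynamics-to-manifold predictor for a novel dynamics configuration; here it is sampled randomly only to show that distinct manifold positions yield distinct policies.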

GRP-146-136 Platonic Policy Representations: Navigating Learned Manifolds for Rapid Adaptation
