History-Aware Visuomotor Policy Learning via Point Tracking

Jingjing Chen* Hongjie Fang* Chenxi Wang Shiquan Wang† Cewu Lu†

Shanghai Jiao Tong University Noematrix Flexiv Robotics

*Equal Contribution †Equal Advising

Abstract. Many manipulation tasks require memory beyond the current observation, yet most visuomotor policies rely on the Markov assumption and thus struggle with repeated states or long-horizon dependencies. Existing methods attempt to extend observation horizons but remain insufficient for diverse memory requirements. To this end, we propose an object-centric history representation based on point tracking, which abstracts past observations into a structured form that retains only essential task-relevant information. Tracked points are encoded and aggregated at the object level, yielding a compact history representation that can be seamlessly integrated into various visuomotor policies. Our design provides full history-awareness with high computational efficiency, leading to improved overall task performance and decision accuracy. Through extensive evaluations on diverse manipulation tasks, we show that our method addresses multiple facets of memory requirements (task stage identification, spatial memorization, and action counting, as well as longer-term demands such as continuous and pre-loaded memory) and consistently outperforms both Markovian baselines and prior history-based approaches.

History-Aware Visuomotor Policies

Directly relying on raw observation histories is both inefficient and redundant. Our approach introduces an object-centric point tracking representation that captures the motion and state of task-relevant objects, transforming long sequences of images into a structured form. To make this representation more efficient, we apply a compression module that condenses the history into a compact summary. This compressed history is then integrated into standard visuomotor policies (ACT, Diffusion Policy and RISE), creating history-aware policies that can leverage long-horizon context.
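
As a rough illustration of the pipeline described above, the sketch below pools tracked points per object and then shortens the history. It is not the authors' implementation: the data layout, the function names `object_centroids` and `compress_history`, mean pooling, and uniform temporal subsampling are all assumptions standing in for the learned encoder and compression module.

```python
from statistics import fmean

# Assumed (hypothetical) layout: tracks[t][obj] is a list of (x, y, z)
# tracked points belonging to object `obj` at timestep t.

def object_centroids(tracks):
    """Mean-pool the tracked points of each object at each timestep."""
    return [
        {obj: tuple(fmean(axis) for axis in zip(*pts)) for obj, pts in frame.items()}
        for frame in tracks
    ]

def compress_history(centroids, num_slots):
    """Keep `num_slots` evenly spaced timesteps, including the latest one.

    Uniform subsampling is a stand-in for a learned compression module.
    """
    n = len(centroids)
    if n <= num_slots:
        return list(centroids)
    if num_slots == 1:
        return [centroids[-1]]
    idx = [round(i * (n - 1) / (num_slots - 1)) for i in range(num_slots)]
    return [centroids[i] for i in idx]

# Five timesteps, one object ("cup") tracked with two points.
tracks = [{"cup": [(float(t), 0.0, 0.0), (float(t), 2.0, 0.0)]} for t in range(5)]
summary = compress_history(object_centroids(tracks), num_slots=3)
print(len(summary), summary[-1]["cup"])
```

The point of the sketch is the shape of the data: however long the episode runs, the policy receives a fixed number of per-object summaries rather than a growing stack of images.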

Experiments

We carefully designed 7 real-world manipulation tasks that feature multiple repeated or difficult-to-distinguish states across different horizons, aiming to evaluate both the history-awareness and overall performance of visuomotor policies. The tasks collectively evaluate several aspects of history-awareness, including counting, spatial memorization, task stage identification, pre-loaded memory and continuous memory.

Our object-centric point track history representation can be seamlessly integrated into various visuomotor policies. It enables history-aware decision-making and performs strongly across all five evaluation aspects.

2D trackers suffer from depth ambiguities and tracking discontinuities, producing low-quality tracks. By contrast, 3D trackers maintain accurate spatial relationships and handle occlusions effectively. They yield substantially better performance than 2D trackers, which makes them better suited to robotic manipulation.
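
The depth ambiguity mentioned above can be made concrete with a pinhole camera model. In the sketch below, two points at different depths along the same viewing ray project to the same pixel, so a 2D track cannot tell them apart. The intrinsics (`focal`, `cx`, `cy`) and the point coordinates are illustrative values, not taken from the experiments.

```python
def project(point, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection of a camera-frame 3D point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)

# Two points on the same viewing ray, at 0.5 m and 1.0 m from the camera.
near = (0.125, 0.0625, 0.5)
far = (0.25, 0.125, 1.0)

same_pixel = project(near) == project(far)
depth_gap = far[2] - near[2]
print(same_pixel, depth_gap)
```

A 3D tracker keeps the `z` coordinate, so the two points stay distinct in its output.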

Asynchronous tracking with train-time augmentation improves efficiency while preserving policy performance. In all experiments and the following videos, HistRISE is implemented with asynchronous tracking, while HistACT and HistDP are implemented with synchronous tracking.
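
One way to picture train-time augmentation for asynchronous tracking is sketched below. It rests on two assumptions that the text above does not spell out: that an asynchronous tracker delivers history lagging the current step by a few frames, and that the augmentation simulates this lag by randomly dropping the most recent history steps during training. The function name `lagged_history` and the parameter `max_lag` are hypothetical.

```python
import random

def lagged_history(history, max_lag, rng=random):
    """Simulate tracker latency by dropping up to `max_lag` of the newest steps.

    Assumed augmentation scheme for illustration only; at least one step is
    always kept so the policy never receives an empty history.
    """
    lag = rng.randint(0, max_lag)          # inclusive on both ends
    keep = max(1, len(history) - lag)
    return history[:keep]

rng = random.Random(0)                      # seeded for reproducibility
history = list(range(10))                   # placeholder for per-step track features
augmented = lagged_history(history, max_lag=3, rng=rng)
print(len(augmented))
```

Under these assumptions, a policy trained on randomly lagged histories would see at training time the same kind of stale input that it receives when the tracker runs in parallel at deployment.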

Videos: Add-Salt

RISE vs. HistRISE

DP vs. HistDP

LongDP vs. HistDP

TraceDP vs. HistDP

ACT vs. HistACT

Videos: One-Move

RISE vs. HistRISE

DP vs. HistDP

LongDP vs. HistDP

TraceDP vs. HistDP

ACT vs. HistACT

Videos: Three-Scoop

RISE vs. HistRISE

DP vs. HistDP

LongDP vs. HistDP

TraceDP vs. HistDP

ACT vs. HistACT

Videos: Swap-Easy

RISE vs. HistRISE

DP vs. HistDP

LongDP vs. HistDP

TraceDP vs. HistDP

ACT vs. HistACT

Videos: Swap-Hard

RISE vs. HistRISE

DP vs. HistDP

LongDP vs. HistDP

TraceDP vs. HistDP

Videos: Guess-Easy

Videos: Guess-Hard

BibTeX

@article{chen2025history,
    title   = {History-Aware Visuomotor Policy Learning via Point Tracking},
    author  = {Chen, Jingjing and Fang, Hongjie and Wang, Chenxi and Wang, Shiquan and Lu, Cewu},
    journal = {arXiv preprint arXiv:2509.17141},
    year    = {2025}
}
