Hongjie (Tony) Fang 方泓杰

I am a fourth-year Ph.D. student in Computer Science in the Wu Wenjun Honorable Class at Shanghai Jiao Tong University (SJTU) & Shanghai Artificial Intelligence Laboratory, advised by Prof. Cewu Lu. Previously, I received my B.Eng. degree in Computer Science and Engineering and my B.Ec. degree in Finance from SJTU in 2022.

My research interests lie mainly in robotics, specifically robotic manipulation (including contact-rich, dexterous, and long-horizon manipulation), robot learning (including imitation learning, multimodal learning, data collection methods, in-context learning, and reinforcement learning), and grasping. I am currently a member of the SJTU Machine Vision and Intelligence Group (MVIG). My ultimate goal is to enable robots to perform a wide variety of tasks in the real world under any circumstances, improving the quality of human life.


Photo @ İstanbul, Türkiye 🇹🇷
Credit to Jingjing Chen

News
  • Jan. 2026: Three papers (HistRISE, DQ-RISE, and MBA) are accepted to ICRA 2026. See you in Vienna!
  • Aug. 2025: AirExo-2 is accepted by CoRL 2025. See you in Seoul!
  • Jun. 2025: Three papers (FoAR, SIME, and KDIL) are accepted by IROS 2025.
  • Apr. 2025: FoAR is accepted by RA-L.
  • Mar. 2025: AirExo-2 is released! Check our website for more details.
  • Jan. 2025: Two papers (S2I and CAGE) are accepted by ICRA 2025.
  • Jun. 2024: RISE is accepted by IROS 2024.
  • Jan. 2024: Four papers (AirExo, RH20T, Open X-Embodiment, and AnyGrasp) are accepted by ICRA 2024.
  • Oct. 2023: Open X-Embodiment is released! Proud of this wonderful collaboration in the robotics community.
  • Sept. 2023: AirExo is released! Check our website for more details.
  • Apr. 2023: AnyGrasp is accepted by T-RO.
  • Jun. 2022: TransCG is accepted by RA-L.
Publications

* denotes equal contribution. # denotes corresponding author(s).

arXiv 2026
Force Policy: Learning Hybrid Force-Position Control Policy under Interaction Frame for Contact-Rich Manipulation

We introduce a physically grounded interaction frame, an instantaneous local basis that separates force regulation from motion, and a method to recover it from demonstrations. Using this, Force Policy combines a global vision policy for free-space motion with a high-frequency local force policy that estimates the interaction frame and executes hybrid force-position control. In real-world contact-rich tasks, it outperforms vision-only and force-aware baselines, improving contact stability, force accuracy, and generalization to novel objects.

Human Video Imitation Learning Generalization
arXiv 2026
LIDEA: Human-to-Robot Imitation Learning via Implicit Feature Distillation and Explicit Geometry Alignment

Introduce LIDEA, an imitation learning framework in which policy learning benefits from human demonstrations. In the 2D visual domain, LIDEA employs a dual-stage transitive distillation pipeline that aligns human and robot representations in a shared latent space. In the 3D geometric domain, we propose an embodiment-agnostic alignment strategy that explicitly decouples embodiment from geometry, ensuring consistent 3D perception.

Imitation Learning Multimodal Learning History/Memory
ICRA 2026
History-Aware Visuomotor Policy Learning via Point Tracking

Propose an object-centric history representation based on point tracking, which abstracts past observations into a compact and structured form that retains only essential task-relevant information. Tracked points are encoded and aggregated at the object level, yielding a structured history representation that can be seamlessly integrated into various visuomotor policies. Our design provides full history-awareness with high computational efficiency, leading to improved overall task performance and decision accuracy. Our history-aware policies consistently outperform both Markovian baselines and prior history-based approaches.

Imitation Learning Dexterous Manipulation Teleoperation
ICRA 2026
Learning Dexterous Manipulation with Quantized Hand State

Propose DQ-RISE, which quantizes hand states to simplify hand motion prediction while preserving essential patterns, and applies a continuous relaxation that allows arm actions to diffuse jointly with these compact hand states. This design enables the policy to learn arm-hand coordination from data while preventing hand actions from overwhelming the action space. Experiments show that DQ-RISE achieves more balanced and efficient learning, paving the way toward structured and generalizable dexterous manipulation.

Dexterous Grasping Dexterous Manipulation
arXiv 2025
AnyDexGrasp: General Dexterous Grasping for Different Hands with Human-Level Learning Efficiency

Introduce AnyDexGrasp, an efficient approach for learning dexterous grasping with minimal data, advancing robotic manipulation capabilities across different robotic hands. Our results show a grasp success rate of 75-95% across three different robotic hands in real-world cluttered environments with over 150 novel objects, improving to 80-98% as the number of training objects increases.

In-the-Wild Collection Imitation Learning Generalization 3D Perception
CoRL 2025 Oral
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons

Develop AirExo-2, an updated low-cost exoskeleton system for large-scale in-the-wild demonstration collection. By transforming the collected in-the-wild demonstrations into pseudo-robot demonstrations, our system addresses key challenges in utilizing such demonstrations for downstream real-world imitation learning. Propose RISE-2, a generalizable imitation policy that integrates 2D and 3D perception, outperforming previous imitation learning policies in both in-domain and out-of-domain tasks, even with limited demonstrations. Leveraging in-the-wild demonstrations collected and transformed by the AirExo-2 system, without the need for additional robot demonstrations, RISE-2 achieves performance comparable or superior to policies trained on teleoperated data, highlighting the potential of AirExo-2 for scalable and generalizable imitation learning.

Imitation Learning Generalization
IROS 2025
Knowledge-Driven Imitation Learning: Enabling Generalization Across Diverse Conditions

We propose knowledge-driven imitation learning, a framework that leverages external structural semantic knowledge to abstract object representations within the same category during imitation learning. We introduce a novel semantic keypoint graph as a knowledge template and develop a coarse-to-fine template-matching algorithm that optimizes both structural consistency and semantic similarity.

Imitation Learning
IROS 2025
SIME: Enhancing Policy Self-Improvement with Modal-Level Exploration

We found that with modal-level exploration, the robot can generate more diverse and multi-modal interaction data. By learning from the most valuable trials and high-quality segments from these interactions, the robot can effectively refine its capabilities through self-improvement.

Imitation Learning Action Generation
ICCV 2025
Dense Policy: Bidirectional Autoregressive Learning of Actions

Propose a bidirectionally expanded learning approach that enhances auto-regressive policies for robotic manipulation. It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner with logarithmic-time inference.

Multimodal Learning Contact-Rich Manipulation Force/Torque
RA-L 2025 & IROS 2025
FoAR: Force-Aware Reactive Policy for Contact-Rich Robotic Manipulation

Propose FoAR, a force-aware reactive policy that combines high-frequency force/torque sensing with visual inputs to enhance the performance in contact-rich manipulation. Built upon the RISE policy, FoAR incorporates a multimodal feature fusion mechanism guided by a future contact predictor, enabling dynamic adjustment of force/torque data usage between non-contact and contact phases. Its reactive control strategy also allows FoAR to accomplish contact-rich tasks accurately through simple position control.

Imitation Learning Object Pose Action Generation
RA-L 2025 & ICRA 2026
Motion Before Action: Diffusing Object Motion as Manipulation Condition

Propose MBA, a novel module that employs two cascaded diffusion processes for object motion generation and robot action generation under object motion guidance. Designed as a plug-and-play component, MBA can be flexibly integrated into existing robotic manipulation policies with diffusion action heads. Extensive experiments in both simulated and real-world environments demonstrate that our approach substantially improves the performance of existing policies across a wide range of manipulation tasks.

Imitation Learning Generalization
ICRA 2025
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation

Introduce CAGE, a data-efficient generalizable robotic manipulation policy. With less than 50 demonstrations in the mono-distributed training environment, CAGE can effectively generalize to similar and unseen environments with different levels of distribution shifts (background, object and camera view changes), outperforming previous state-of-the-art policies. This work makes a step forward in developing data-efficient, scalable, and generalizable robotic manipulation policies.

Imitation Learning Data Quality
ICRA 2025
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization

Introduce S2I, a framework that selects and optimizes mixed-quality demonstration data at the segment level, while ensuring plug-and-play compatibility with existing robotic manipulation policies. With only 3 expert demonstrations for reference, S2I can improve the performance of various downstream policies when trained with mixed-quality demonstrations.

Imitation Learning 3D Perception Generalization
IROS 2024
RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

Propose RISE, an end-to-end baseline for real-world robot imitation learning, which predicts continuous actions directly from single-view point clouds. Trained with 50 demonstrations for each real-world task, RISE surpasses currently representative 2D and 3D policies by a large margin, showcasing significant advantages in both accuracy and efficiency.

Manipulation Dataset
ICRA 2024 Best Paper
Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Open X-Embodiment Collaboration, [...], Hongjie Fang, [...] (194 authors)

Introduce the Open X-Embodiment Dataset, the largest robot learning dataset to date with 1M+ real robot trajectories, spanning 22 robot embodiments. Train large, transformer-based policies on the dataset (RT-1-X, RT-2-X) and show that co-training with our diverse dataset substantially improves performance.

In-the-Wild Collection Teleoperation
ICRA 2024
AirExo: Low-Cost Exoskeletons for Learning Whole-Arm Manipulation in the Wild

Develop AirExo, a low-cost, adaptable, and portable dual-arm exoskeleton, for joint-level teleoperation and demonstration collection. Further leverage AirExo for learning with cheap demonstrations in the wild to improve sample efficiency and robustness of the policy.

Manipulation Dataset Teleoperation
ICRA 2024
RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot

Collect a dataset comprising over 110k contact-rich robot manipulation sequences across diverse skills, contexts, robots, and camera viewpoints, all collected in the real world. Each sequence in the dataset includes visual, force, audio, and action information, along with a corresponding human demonstration video. Put significant effort into calibrating all the sensors to ensure a high-quality dataset.

Dynamic Grasping
IROS 2023
Flexible Handover with Real-Time Robust Dynamic Grasp Trajectory Generation

Propose an approach for effective and robust flexible handover, enabling the robot to grasp moving objects along flexible motion trajectories with a high success rate. The key innovation of our approach is the generation of real-time robust grasp trajectories. Design a future grasp prediction algorithm to enhance the system's adaptability to dynamic handover scenes.

Dynamic Grasping
CVPR 2023
Target-Referenced Reactive Grasping for Dynamic Objects

Focus on semantic consistency instead of temporal smoothness of the predicted grasp poses during reactive grasping. Solve the reactive grasping problem in a target-referenced setting by tracking through generated grasp spaces.

General Grasping
T-RO 2023 & ICRA 2024
AnyGrasp: Robust and Efficient Grasp Perception in Spatial and Temporal Domains

Propose a powerful AnyGrasp model for general grasping, including static scenes and dynamic scenes. AnyGrasp can generate accurate, full-DoF, dense and temporally-smooth grasp poses efficiently, and it works robustly against large depth sensing noise.

Perception Dataset Grasping
RA-L 2022 & ICRA 2023
TransCG: A Large-Scale Real-World Dataset for Transparent Object Depth Completion and a Grasping Baseline

Introduce TransCG, a large-scale real-world dataset for transparent object depth completion, and a lightweight depth completion method DFNet based on the dataset.

General Grasping 3D Perception
ICCV 2021
Graspness Discovery in Clutters for Fast and Accurate Grasp Detection

Propose graspness, a quality based on geometric cues that distinguishes graspable areas in cluttered scenes and can be measured by a look-ahead search method. Propose a graspness model to approximate the graspness value for quickly detecting grasps in practice.

Selected Projects
research project
EasyRobot
Hongjie Fang

Provides an easy and unified interface for robots, grippers, sensors and pedals.
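A unified device interface of this kind is typically built around a small shared lifecycle that heterogeneous hardware implements. The sketch below is purely illustrative: the class names, methods, and the toy `Gripper` are assumptions for demonstration, not EasyRobot's actual API.

```python
from abc import ABC, abstractmethod


class Device(ABC):
    """Hypothetical base class: one lifecycle shared by robots, grippers, sensors, pedals."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def disconnect(self) -> None: ...


class Gripper(Device):
    """Toy gripper implementing the shared lifecycle plus device-specific commands."""

    def __init__(self) -> None:
        self.connected = False
        self.width = 0.08  # fully open, in meters

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False

    def move_to(self, width: float) -> None:
        assert self.connected, "call connect() first"
        # Clamp the command to the gripper's physical range [0, 0.08] m.
        self.width = max(0.0, min(width, 0.08))


g = Gripper()
g.connect()
g.move_to(0.02)
print(g.width)  # -> 0.02
```

Keeping the shared surface this small lets higher-level code (e.g. a data-collection loop) iterate over mixed devices and manage them uniformly.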

course project of SJTU undergraduate course "Mobile Internet"
Oh-My-Papers

Proposes that we can learn "jargon" terms like "ResNet" and "YOLO" from academic paper citation information, where such citation information can be regarded as the search results for the corresponding term. For example, when searching "ResNet", the engine should return "Deep Residual Learning for Image Recognition" rather than papers that merely contain the word "ResNet" in their titles, as current scholar search engines commonly do.

Academic Services

Reviewer for Conferences:

  • IEEE International Conference on Robotics and Automation (ICRA), 2023, 2024, 2025, 2026
  • IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023, 2024, 2025, 2026
  • Conference on Robot Learning (CoRL), 2025, 2026
  • International Conference on Learning Representations (ICLR), 2025
  • Advances in Neural Information Processing Systems (NeurIPS), 2025, 2026
  • IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026

Reviewer for Journals:
  • IEEE Robotics and Automation Letters (RA-L)
  • IEEE Transactions on Cybernetics (T-CYB)
  • IEEE/ASME Transactions on Mechatronics (T-MECH)
  • IEEE Transactions on Automation Science and Engineering (T-ASE)

Talks

Collaboration & Mentoring

I collaborate closely with Hao-Shu Fang @ MIT, Chenxi Wang @ Noematrix, Shangning Xia @ Noematrix, Lixin Yang @ SJTU, Jun Lv @ Noematrix, and Shiquan Wang @ Flexiv. I welcome opportunities for discussions and potential collaborations, and I am particularly interested in working with highly motivated undergraduate and master's students. Please feel free to contact me via email. I am fortunate to work with a number of excellent students.

Course Notes

I share some of my notes from the courses I took in graduate school on this page. More notes from my undergraduate study can be found in this repository.