Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization

¹Shanghai Jiao Tong University, ²Shanghai Artificial Intelligence Laboratory

With only 3 expert demonstrations for reference, the plug-and-play S2I framework can be used as a data preprocessing step to enhance the performance of various downstream robot manipulation policies when handling mixed-quality demonstrations.

Abstract

Data is crucial for robotic manipulation, as it underpins the development of robotic systems for complex tasks. While high-quality, diverse datasets enhance the performance and adaptability of robotic manipulation policies, collecting extensive expert-level data is resource-intensive. Consequently, many current datasets suffer from quality inconsistencies due to operator variability, highlighting the need for methods that use mixed-quality data effectively. To mitigate these issues, we propose "Select Segments to Imitate" (S2I), a framework that selects and optimizes mixed-quality demonstration data at the segment level while ensuring plug-and-play compatibility with existing robotic manipulation policies. The framework has three components: demonstration segmentation, which divides the original data into meaningful segments; segment selection, which uses contrastive learning to find high-quality segments; and trajectory optimization, which refines suboptimal segments for better policy learning. We evaluate S2I through comprehensive experiments in simulation and real-world environments across six tasks, demonstrating that with only 3 expert demonstrations for reference, S2I can improve the performance of various downstream policies trained on mixed-quality demonstrations.

Method



We propose Select Segments to Imitate (S2I), a framework for processing mixed-quality datasets that consists of three parts: (a) demonstration segmentation, (b) segment selection, and (c) trajectory optimization. Specifically, S2I first divides the demonstrations into meaningful semantic segments. Then, contrastive learning is applied using a minimal number of expert demonstrations (3 in our experiments) to extract features for segment selection. Finally, the framework performs trajectory optimization on the segments identified as suboptimal, refining them to improve overall policy learning. In short, the S2I framework produces an optimized, high-quality demonstration dataset that can be used directly for downstream policy training.
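The three stages can be illustrated with a minimal Python sketch. This is not the S2I implementation: every function name, the gripper-state segmentation rule, the cosine-similarity threshold, and the waypoint-pruning step are hypothetical stand-ins chosen only to show how segmentation, selection, and optimization might be chained as a preprocessing step. In particular, the features passed to `select_segments` are assumed to come from some already-trained contrastive encoder, which is not shown here.

```python
import numpy as np


def split_into_segments(gripper_open):
    """Stage (a), assumed rule: cut the trajectory wherever a binary
    gripper state changes. Returns a list of (start, end) index pairs."""
    changes = np.flatnonzero(np.diff(gripper_open.astype(int)) != 0) + 1
    bounds = [0, *changes.tolist(), len(gripper_open)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def select_segments(features, expert_features, threshold=0.8):
    """Stage (b), assumed rule: a segment counts as high quality when its
    best cosine similarity to any expert segment feature reaches the
    threshold. Features are assumed to be precomputed embeddings."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    e = expert_features / np.linalg.norm(expert_features, axis=1, keepdims=True)
    return (f @ e.T).max(axis=1) >= threshold


def drop_stalled_waypoints(positions, min_step=0.01):
    """Stage (c), stand-in optimization: keep a waypoint only if it moved
    at least min_step from the last kept one; always keep the endpoint."""
    kept = [0]
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - positions[kept[-1]]) >= min_step:
            kept.append(i)
    if kept[-1] != len(positions) - 1:
        kept.append(len(positions) - 1)
    return positions[kept]


def preprocess(positions, gripper_open, features, expert_features):
    """Chain the three stages: good segments pass through unchanged,
    the remaining segments are refined before being returned."""
    segments = split_into_segments(gripper_open)
    is_good = select_segments(features, expert_features)
    output = []
    for (start, end), good in zip(segments, is_good):
        chunk = positions[start:end]
        output.append(chunk if good else drop_stalled_waypoints(chunk))
    return output
```

In this sketch, `features` must contain one row per segment produced by `split_into_segments`. The output is a list of position arrays that a downstream policy's data loader could consume, which mirrors the plug-and-play role described above.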

Evaluations

For simulation experiments, we evaluate three RoboMimic tasks (Lift, Can and Square) in RoboSuite. For real-world experiments, we evaluate three tasks (Tissue, Cup and Pen) on a robot platform with a Flexiv Rizon robotic arm (equipped with a Dahuan AG-95 gripper) and 2 Intel RealSense D435 cameras for environment perception. We employ various downstream policies during evaluation, including BC-RNN, ACT, Diffusion Policy (DP), and RISE. Baselines include L2D (Oracle), PUBC, ILEED and AWE.

Simulation Results: BC-RNN

Lift-10.

Lift-30.

Can-30.

Can-100.

Square-50.

Square-150.

Simulation Results: Diffusion Policy

Lift-10 (state-based).

Lift-10 (image-based).

Can-30 (state-based).

Can-30 (image-based).

Square-50 (state-based).

Square-50 (image-based).

Real-World Results

Tissue. Take a tissue out and place it in the container.

Cup. Collect all the cups (at most 2) into the large metal cup.

Pen. Collect all the pens (at most 3) into the bowl.

BibTeX

@article{chen2024towards,
  title = {Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization},
  author = {Chen, Jingjing and Fang, Hongjie and Fang, Hao-Shu and Lu, Cewu},
  journal = {arXiv preprint arXiv:2409.19917},
  year = {2024}
}