Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking

Yun Liu*,1,2,3,4, Bowen Yang*,1,2, Licheng Zhong*,3, He Wang2,5, Li Yi1,3,4
* denotes equal contribution
1Tsinghua University, 2Galbot, 3Shanghai Qi Zhi Institute, 4Shanghai Artificial Intelligence Laboratory, 5Peking University

Mimicking-Bench is the first benchmark for learning generalizable humanoid-scene interaction skills by mimicking human data, comprising six household interaction tasks. It integrates a diverse human skill reference dataset by leveraging advances in motion capture datasets and interaction generation networks, and it establishes a skill-learning paradigm for transferring knowledge from humans to humanoids.


✨ Performance of the Best Combination (Optimization + Improved HST + ACT)

Videos of successful and failure cases for each of the six tasks: sitting on a chair, sitting on a sofa, lying on a bed, lying on a sofa, touching points near an object, and lifting a box.

Abstract

Learning generic skills for humanoid robots interacting with 3D scenes by mimicking human data is a key research challenge with significant implications for robotics and real-world applications. However, existing methods and benchmarks rely on small-scale, manually collected demonstrations and lack the general datasets and benchmark support needed to study generalization across scene geometries.

To address this gap, we introduce Mimicking-Bench, the first comprehensive benchmark designed for generalizable humanoid-scene interaction learning through mimicking large-scale human animation references.

Mimicking-Bench includes six household full-body humanoid-scene interaction tasks, covering 11K diverse object shapes, along with 20K synthetic and 3K real-world human interaction skill references.

We construct a complete humanoid skill learning pipeline and benchmark approaches for motion retargeting, motion tracking, imitation learning, and their various combinations.
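To make the first stage of such a pipeline concrete, here is a minimal sketch of optimization-based motion retargeting. It is not the benchmark's implementation. It is a toy under stated assumptions: a planar 2-link arm stands in for the humanoid, only elbow and wrist keypoints are matched, and human keypoints are mapped to the robot by a single uniform length-scale factor. All function names, loss weights, and step sizes are hypothetical. Each frame is solved by gradient descent on squared keypoint error, warm-started from the previous frame's solution, with a small smoothness penalty.

```python
import math
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Angles = Tuple[float, float]


def forward_kinematics(q: Angles, l1: float, l2: float) -> Tuple[Vec2, Vec2]:
    """Elbow and wrist positions of a toy planar 2-link arm rooted at the origin."""
    q1, q2 = q
    elbow = (l1 * math.cos(q1), l1 * math.sin(q1))
    wrist = (elbow[0] + l2 * math.cos(q1 + q2), elbow[1] + l2 * math.sin(q1 + q2))
    return elbow, wrist


def retarget_loss(q: Sequence[float], targets: Tuple[Vec2, Vec2],
                  lengths: Tuple[float, float], q_prev: Sequence[float],
                  w_smooth: float) -> float:
    """Squared keypoint error plus a (hypothetical) penalty on jumps from q_prev."""
    loss = 0.0
    for p, t in zip(forward_kinematics((q[0], q[1]), *lengths), targets):
        loss += (p[0] - t[0]) ** 2 + (p[1] - t[1]) ** 2
    loss += w_smooth * sum((a - b) ** 2 for a, b in zip(q, q_prev))
    return loss


def retarget_frame(targets, lengths, q_init, w_smooth=1e-4,
                   lr=0.1, iters=2000, eps=1e-6) -> Angles:
    """Gradient descent with central-difference gradients, warm-started at q_init."""
    q = list(q_init)
    for _ in range(iters):
        grad = []
        for i in range(len(q)):
            q_plus, q_minus = list(q), list(q)
            q_plus[i] += eps
            q_minus[i] -= eps
            f_plus = retarget_loss(q_plus, targets, lengths, q_init, w_smooth)
            f_minus = retarget_loss(q_minus, targets, lengths, q_init, w_smooth)
            grad.append((f_plus - f_minus) / (2 * eps))
        q = [qi - lr * gi for qi, gi in zip(q, grad)]
    return (q[0], q[1])


def retarget_sequence(human_frames, human_lengths, robot_lengths,
                      q0: Angles = (0.0, 0.0)) -> List[Angles]:
    """Map per-frame human (elbow, wrist) keypoints to robot joint angles."""
    scale = sum(robot_lengths) / sum(human_lengths)  # assumed uniform length scaling
    q_prev, out = q0, []
    for k, (elbow, wrist) in enumerate(human_frames):
        targets = ((scale * elbow[0], scale * elbow[1]),
                   (scale * wrist[0], scale * wrist[1]))
        w = 0.0 if k == 0 else 1e-4  # frame 0 has no previous solution to stay near
        q_prev = retarget_frame(targets, robot_lengths, q_prev, w_smooth=w)
        out.append(q_prev)
    return out
```

Warm-starting and the smoothness term keep consecutive frames consistent. A real humanoid retargeter would also need joint limits, many more keypoints, and scene-aware terms such as contact and penetration, which this toy omits.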

Extensive experiments highlight the value of human mimicking for skill learning and reveal key challenges and future research directions.

Contact Us

If you have any questions or suggestions, please contact Yun Liu (yun-liu22@mails.tsinghua.edu.cn), Bowen Yang (ybw22@mails.tsinghua.edu.cn), Licheng Zhong (zlicheng.colmar@outlook.com) or Li Yi (ericyi0124@gmail.com).

BibTeX

@article{liu2024mimicking,
  title={Mimicking-Bench: A Benchmark for Generalizable Humanoid-Scene Interaction Learning via Human Mimicking},
  author={Liu, Yun and Yang, Bowen and Zhong, Licheng and Wang, He and Yi, Li},
  journal={arXiv preprint arXiv:2412.17730},
  year={2024}
}