---
title: "Mana: Dexterous Manipulation of Articulated Tools"
description: "Mana is a coarse-to-fine sim-to-real framework that reframes dexterous articulated tool manipulation as a computer animation problem, using procedurally generated grasp keyframes refined by motion planning and RL to achieve zero-shot real-world transfer across four articulated tools with under one minute of setup per tool."
type: research-paper-digest
arxiv_id: 2606.13677v1
source_url: http://arxiv.org/abs/2606.13677v1
pdf_url: http://arxiv.org/pdf/2606.13677v1
authors: ["Zhao-Heng Yin", "Guanya Shi", "Pieter Abbeel", "C. Karen Liu"]
published: 2026-06-11T17:59:49Z
retrieved: 2026-06-14
has_code: false
canonical: https://flawedquote.com/media/research/arxiv-2606-13677v1.html
review_of: "Mana: Dexterous Manipulation of Articulated Tools"
document_kind: third-party-review
affiliation: none; not affiliated with the paper's authors
provenance: ai-generated-digest; web-grounded; verify against source
---

# Mana: Dexterous Manipulation of Articulated Tools

**This is an AI-generated review/digest of a third-party paper by Zhao-Heng Yin, Guanya Shi, Pieter Abbeel, C. Karen Liu (http://arxiv.org/abs/2606.13677v1) — not the original paper, and not affiliated with its authors. Treat it as a secondary source and verify against the original.**

> Mana is a coarse-to-fine sim-to-real framework that reframes dexterous articulated tool manipulation as a computer animation problem, using procedurally generated grasp keyframes refined by motion planning and RL to achieve zero-shot real-world transfer across four articulated tools with under one minute of setup per tool.

## What this answers

- What is this paper about and what problem does it solve?
- What are the concrete results / benchmark numbers?
- How does the method work?
- Is there an official open-source implementation?
- What are the limitations and what is NOT shown?
- What related/prior work does it build on?
- How do I cite this paper? (BibTeX below)

## Key results

- Zero-shot sim-to-real transfer achieved on four articulated tools spanning different scales and joint types
- Functional affordance specification takes less than one minute per tool (a few mouse clicks)
- Successful grasping and in-hand manipulation demonstrated on real hardware without any real-world fine-tuning

## Method at a glance

- **Problem:** Articulated tool manipulation requires coordinating internal tool degrees of freedom alongside contact-rich hand-object interactions; prior dexterous manipulation work has focused almost exclusively on rigid objects, leaving articulated tools largely unsolved.
- **Method:** Coarse-to-fine pipeline inspired by computer animation: procedurally generate grasp keyframes from user-specified functional affordances, interpolate with motion planning into dense trajectories, then refine with reinforcement learning in simulation before zero-shot transfer to real hardware.
- **Data:** Four articulated tools spanning different scales and joint types; no external benchmark dataset is named in the abstract.
- **Metrics:** Zero-shot sim-to-real transfer success on grasping and in-hand manipulation tasks; setup time per tool (< 1 minute).

## Resources

- **Paper:** http://arxiv.org/abs/2606.13677v1
- **PDF:** http://arxiv.org/pdf/2606.13677v1
- **Code:** No official implementation found; a related, non-official repository is https://github.com/kristery/dex-affordance

## Limitations

- Only four tools tested; broader generalization across tool categories not yet demonstrated
- Affordance specification, while quick, still requires minimal human input per tool
- No quantitative success-rate numbers reported in the abstract
- Sim-to-real gap for highly compliant or backlash-heavy tools not fully characterized

## Applications

- Surgical robotics (scissors, clamps, needle drivers)
- Industrial assembly with pliers, crimpers, or staplers
- Assistive and prosthetic robotic hands using everyday tools
- Household robotics (tongs, hand pumps, scissors)

## Related work

- [In-Hand Manipulation of Articulated Tools with Dexterous Robot Hands with Sim-to-Real Transfer](https://arxiv.org/abs/2509.23075)
- [SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation](https://arxiv.org/abs/2602.16863)
- [DORA: Object Affordance-Guided Reinforcement Learning for Dexterous Robotic Manipulation](https://arxiv.org/abs/2505.14819)
- [Hierarchical Reinforcement Learning for Articulated Tool Manipulation with Multifingered Hand](https://arxiv.org/abs/2507.06822)

## Citation

```bibtex
@misc{arxiv_2606_13677,
  title={Mana: Dexterous Manipulation of Articulated Tools},
  author={Zhao-Heng Yin and Guanya Shi and Pieter Abbeel and C. Karen Liu},
  year={2026},
  eprint={2606.13677},
  archivePrefix={arXiv},
  url={http://arxiv.org/abs/2606.13677v1}
}
```

## Optional — full explainer

*Everything above is self-contained; skip this if you are context-limited.*

## Summary

Robots with multi-fingered hands can already pick up rigid objects with impressive skill — but hand them a pair of scissors, a stapler, or a pair of pliers and things quickly fall apart. Articulated tools have internal joints that must be coordinated while the hand simultaneously maintains a stable, functional grip. Mana (Manipulation Animator) is a new sim-to-real framework that reframes this hard robotics problem as a computer animation problem: instead of learning every joint trajectory from scratch, it uses procedurally generated grasp keyframes refined through motion planning and reinforcement learning (RL), achieving zero-shot transfer to real robots across four diverse articulated tools.

## How It Works

Mana's core insight is a coarse-to-fine pipeline borrowed from character animation:

  - Affordance specification (under one minute per tool): A human operator specifies functional affordances with a few mouse clicks, identifying where on the tool contact should occur rather than how to move every finger. The rest of the data generation process is automated.

  - Keyframe generation: The system automatically generates coarse grasp keyframes — sparse snapshots of hand-plus-tool configurations that capture the desired functional posture at the start and end of a movement.

  - Motion planning: These keyframes are interpolated into dense, physically consistent trajectories through motion planning, bridging the gap between coarse poses and continuous motion.

  - Reinforcement learning refinement: An RL policy then fine-tunes these trajectories in simulation, learning to handle contact-rich dynamics, tool joint stiffness, and the coupling between hand DOFs and tool DOFs.

The result is a pipeline that is largely automatic and scales without needing per-tool expert demonstrations. The framework is tested on four articulated tools spanning different scales and joint types, and transfers zero-shot to physical hardware without any real-world fine-tuning.
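
The keyframe-to-trajectory step above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Keyframe` class, the `densify` function, and the joint layout are hypothetical names introduced here, and plain linear interpolation stands in for the motion planner, which in a real system would also have to respect collisions and contacts.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Keyframe:
    """Hypothetical keyframe: one snapshot of the hand joints plus the tool joint."""
    hand_joints: Sequence[float]  # finger joint angles, in radians
    tool_joint: float             # internal tool DOF, e.g. a scissor opening angle


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def densify(keyframes: List[Keyframe], steps_per_segment: int) -> List[Keyframe]:
    """Turn sparse keyframes into a dense trajectory.

    Assumption: linear interpolation in joint space replaces the motion
    planner described in the text; no collision or contact checks are done.
    """
    if len(keyframes) < 2:
        raise ValueError("need at least two keyframes")
    if steps_per_segment < 1:
        raise ValueError("steps_per_segment must be >= 1")

    dense: List[Keyframe] = []
    for start, end in zip(keyframes, keyframes[1:]):
        if len(start.hand_joints) != len(end.hand_joints):
            raise ValueError("keyframes must have the same number of hand joints")
        for i in range(steps_per_segment):
            t = i / steps_per_segment
            hand: Tuple[float, ...] = tuple(
                lerp(a, b, t) for a, b in zip(start.hand_joints, end.hand_joints)
            )
            dense.append(Keyframe(hand, lerp(start.tool_joint, end.tool_joint, t)))

    # Close the trajectory on the final keyframe exactly.
    last = keyframes[-1]
    dense.append(Keyframe(tuple(last.hand_joints), last.tool_joint))
    return dense


if __name__ == "__main__":
    # Two illustrative keyframes: open grasp, then closed grasp with the tool actuated.
    sparse = [Keyframe((0.0, 0.0), 0.0), Keyframe((1.0, 2.0), 0.5)]
    for frame in densify(sparse, steps_per_segment=4):
        print(frame)
```

In the pipeline described above, a dense trajectory of this kind would serve only as a coarse starting point that the RL stage then refines in simulation.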

## Why It Matters

Articulated tool use is a long-standing bottleneck in dexterous robotics. Reinforcement learning and sim-to-real transfer have advanced robotic manipulation of rigid objects, yet policies remain brittle when applied to articulated mechanisms due to contact-rich dynamics and under-modeled joint phenomena such as friction, stiction, backlash, and clearances. Mana sidesteps the need for massive human teleoperation data or per-task reward engineering. The animation-inspired framing is what makes this possible: by treating manipulation as a keyframe interpolation problem, the system gains the procedural scalability that animation pipelines have long enjoyed. The sub-one-minute setup time per tool matters for deployment because supporting a new tool costs a few mouse clicks rather than a new set of expert demonstrations.

## Related Work

  - In-Hand Manipulation of Articulated Tools (arXiv 2509.23075): This work addresses dexterous in-hand manipulation of articulated tools using a robotic hand with reduced articulation, proposing a sim-to-real training pipeline with targeted real-world adaptation. It validates across scissors, pliers, and surgical tools, making it the closest prior work to Mana among those listed here.

  - SimToolReal (arXiv 2602.16863): SimToolReal is an object-centric policy for zero-shot dexterous tool manipulation, enabling generalizable robot manipulation of diverse tools through procedural simulation and universal RL policies without task-specific training. It shares Mana's zero-shot transfer goal but focuses on rigid-tool generalization.

  - DORA (arXiv 2505.14819): DORA proposes an object affordance-guided RL framework that uses affordance maps to generate semantically meaningful functional grasp candidates, which then serve as constraints and priors to guide the RL policy. This shares Mana's philosophy of grounding RL in affordance information.

  - DexMV: DexMV builds an imitation pipeline that converts human videos into robot demonstrations through pose estimation, retargeting, and demonstration translation for dexterous manipulation. Mana avoids the need for human video data entirely.

## Implementations

No official open-source GitHub repository for Mana has been found at the time of writing. The paper's arXiv page is available at [arxiv.org/abs/2606.13677](https://arxiv.org/abs/2606.13677). Readers interested in related open implementations may consult the [dex-affordance repo (CoRL 2022)](https://github.com/kristery/dex-affordance) for a related dexterous grasping affordance framework, and [SimToolReal's project page](https://simtoolreal.github.io/) for a comparable zero-shot tool-manipulation pipeline.

## Applications

  - Surgical robotics: Autonomous or teleoperated handling of scissors, needle drivers, and clamps with precisely controlled actuation forces.

  - Industrial automation: Assembly tasks requiring pliers, crimpers, or staple guns — tools that inherently have internal DOFs.

  - Assistive robots & prosthetics: Enabling robotic hands in caregiving or prosthetic contexts to use everyday human tools without custom engineering per device.

  - Household robotics: Cooking and home-repair tools such as tongs, scissors, and hand-operated pumps all share the articulated-joint challenge Mana is designed to solve.

---
*Pre-computed research digest — AI-generated, web-grounded and cited above. Verify against the linked source before relying on it.*
