menily/toolkit

Python library that converts heterogeneous raw data sources — first-person video, VR hand-tracking, motion capture, teleoperation traces — into task-level demonstration data conforming to menily/schema v1. Apache-2.0 open source.

Repository: github.com/MenilyIntelligence/toolkit · Status: Internal alpha · PyPI release: Planned in coming weeks · License: Apache-2.0

What it does

The embodied AI community today works with four structurally incompatible raw data sources:

| Source | Typical device | Raw format | Sample rate |
| --- | --- | --- | --- |
| POV video | iPhone, GoPro, Vision Pro recording | .mp4 / .mov | 24-60 fps |
| VR hand-tracking | Meta Quest Pro, Vision Pro, PICO 4U | Custom binary / JSON frames | 60-90 Hz |
| Motion capture | OptiTrack, Vicon, Xsens | .bvh / .fbx / .c3d | 120-240 Hz |
| Teleoperation | URDF + teleop SDK | HDF5 / pickle / RLDS | 10-30 Hz |

menily/toolkit provides a unified Python API that accepts any of these four sources and outputs a Task object conforming to menily/schema v1. Downstream consumers get a consistent interface regardless of where the data originated.
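As an illustration of that "many sources, one output" idea, routing a raw recording to the right adapter could be as simple as a suffix lookup. The routing table and `pick_adapter` below are illustrative only, not toolkit API:

```python
from pathlib import Path

# Hypothetical routing table: raw-file suffix -> adapter name.
# The real toolkit exposes per-source modules (pov, vr, mocap);
# this sketch only shows the "four sources, one Task" dispatch.
ADAPTER_BY_SUFFIX = {
    ".mp4": "pov", ".mov": "pov",                        # first-person video
    ".json": "vr",                                       # VR hand-tracking logs
    ".bvh": "mocap", ".fbx": "mocap", ".c3d": "mocap",   # motion capture
    ".hdf5": "teleop", ".pkl": "teleop",                 # teleoperation traces
}

def pick_adapter(raw_path: str) -> str:
    """Return the adapter responsible for a raw recording."""
    suffix = Path(raw_path).suffix.lower()
    try:
        return ADAPTER_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"no adapter for suffix {suffix!r}") from None

print(pick_adapter("./raw/optitrack_session.bvh"))  # -> mocap
```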

Architecture

Three adapters, one core: the source-specific adapters (toolkit.pov, toolkit.vr, toolkit.mocap) normalize raw recordings, and the shared toolkit.core layer owns the Task object, schema validation, and I/O.
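The split can be sketched in a few lines of plain Python. `Task` here is a stand-in with assumed fields, not the real menily/schema v1 definition:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """Stand-in for the shared core Task; field names are assumptions."""
    schema_version: str
    language: str
    frames: list = field(default_factory=list)

def pov_adapter(video_frames: list, language: str) -> Task:
    """Adapter: normalize one raw source into the shared Task shape."""
    return Task("menily.task-demo/1", language, list(video_frames))

def vr_adapter(hand_frames: list, language: str) -> Task:
    return Task("menily.task-demo/1", language, list(hand_frames))

def frame_count(task: Task) -> int:
    """Core-side consumers are adapter-agnostic: they only see Task."""
    return len(task.frames)

t_pov = pov_adapter([{"t": 0.0}, {"t": 0.033}], "Pour water.")
t_vr = vr_adapter([{"t": 0.0}], "Assemble the widget.")
assert t_pov.schema_version == t_vr.schema_version
```

The point of the pattern is that everything below the adapters (validation, export, storage) is written once against `Task`.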

Installation

```bash
# Planned PyPI release
pip install menily-toolkit

# Development install (current)
git clone https://github.com/MenilyIntelligence/toolkit
cd toolkit
pip install -e .
```

Quick start

From first-person video

```python
from menily.toolkit import pov

tasks = pov.segment(
    video_path="./demo_pour_water.mp4",
    language="Pour water from the blue cup into the kettle.",
    language_variants=[
        "把蓝色杯子里的水倒进水壶里",
        "Fill the kettle with water from the blue cup",
    ],
    fps=30,
    viewpoint="ego",
    body_morphology="bimanual_humanoid",
    collection_region="SEA",
)

for task in tasks:
    task.save_as(
        schema="menily.task-demo/1",
        out_dir="./processed/",
    )
```

From VR hand-tracking

```python
from menily.toolkit import vr

tasks = vr.from_quest_log(
    log_path="./raw/quest_session_20260414.json",
    language="Assemble the blue widget onto the base plate.",
    fps=60,
    viewpoint="ego",
    body_morphology="bimanual",
    calibration={
        "origin": "room_center",
        "scale_to_robot": 0.9,
    },
)
```

From motion capture

```python
from menily.toolkit import mocap

tasks = mocap.from_bvh(
    bvh_path="./raw/optitrack_session.bvh",
    segmentation_file="./raw/task_segments.json",
    body_morphology="whole_body_humanoid",
    retarget_to="unitree_g1",
    retarget_backend="adamorph",
    physics_filter=True,
)
```

Schema validation

```python
task.validate()
# => ValidationReport(
#      schema_version='menily.task-demo/1',
#      passed=True,
#      warnings=[
#        "language.variants is recommended but empty",
#      ],
#      errors=[]
#    )
```
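In a pipeline you typically want to fail hard on errors but let warnings through. The report shape below mirrors the printed example above; the `ValidationReport` stand-in and `gate_on_validation` helper are assumptions, not toolkit API:

```python
from dataclasses import dataclass, field

@dataclass
class ValidationReport:
    """Stand-in mirroring the report shape shown above."""
    schema_version: str
    passed: bool
    warnings: list = field(default_factory=list)
    errors: list = field(default_factory=list)

def gate_on_validation(report: ValidationReport) -> None:
    """Raise on errors; surface warnings without blocking export."""
    if report.errors or not report.passed:
        raise ValueError(f"schema validation failed: {report.errors}")
    for w in report.warnings:
        print(f"warning: {w}")

report = ValidationReport(
    schema_version="menily.task-demo/1",
    passed=True,
    warnings=["language.variants is recommended but empty"],
)
gate_on_validation(report)  # prints the warning, raises nothing
```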

Export to RLDS / HuggingFace

```python
# Export as RLDS bundle (Open X-Embodiment compatible)
rlds_episode = task.to_rlds()

# Export as HuggingFace Dataset
hf_dataset = task.to_hf_dataset()
hf_dataset.push_to_hub("YOUR_ORG/your-dataset-name")
```

Retargeting backends

The toolkit.mocap adapter integrates existing open-source retargeting research as pluggable backends. menily/toolkit does not reimplement retargeting — it composes existing work.
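A pluggable-backend design usually reduces to a small registry plus name-based dispatch. The sketch below is an assumption about how such a registry could look; only the backend name "adamorph" comes from the mocap example above, and the pose format is invented for illustration:

```python
# Registry mapping backend name -> retargeting function.
RETARGET_BACKENDS = {}

def register_backend(name: str):
    """Decorator that makes a retargeting function selectable by name."""
    def wrap(fn):
        RETARGET_BACKENDS[name] = fn
        return fn
    return wrap

@register_backend("adamorph")
def adamorph_retarget(human_pose, robot="unitree_g1"):
    # A real backend maps human joint angles to robot joints;
    # here we just tag the pose to demonstrate the dispatch path.
    return {"robot": robot, "pose": human_pose}

def retarget(pose, backend="adamorph", **kwargs):
    """Dispatch to a registered backend, as a retarget_backend= flag might."""
    try:
        fn = RETARGET_BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown retarget backend {backend!r}") from None
    return fn(pose, **kwargs)

out = retarget([0.1, 0.2], backend="adamorph", robot="unitree_g1")
```

Keeping backends behind a registry is what lets new retargeting research be dropped in without touching adapter code.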

Roadmap

| Component | Status | PyPI release |
| --- | --- | --- |
| toolkit.core (Task, validation, I/O) | Stable | 2-3 weeks |
| toolkit.pov | Internal alpha | 4-6 weeks |
| toolkit.vr | Internal alpha | 4-6 weeks |
| toolkit.mocap | Design finalized | 8-10 weeks |

Why open source

Data processing toolkits of this kind can be kept closed as a competitive moat. We chose not to, for two reasons:

  1. A schema has value only if it is adopted. If only Menily uses the format, it is not a schema — it is an internal file format. Adoption requires usable open tooling, not just a spec document.
  2. The moat in the data business is not the toolkit — it is the data collection network (workforce, quality control, geographic distribution, client relationships). Toolkits are copied in months; distributed data operations take years to build.

Contributing

Related