Research

Short, dated working notes and draft papers on embodied AI data infrastructure. Not polished publications — this is the thinking in progress. Updated as the work moves forward.

Draft papers

Task-Level Demonstration Data for Vision-Language-Action Models: A Survey of Schemas, Adapters, and Cross-Embodiment Transfer

Masashi. April 2026. Draft v0.1.

A comprehensive survey of twelve task-level demonstration data systems (2023–2026), spanning trajectory-level datasets (Open X-Embodiment, DROID, BridgeData V2, OXE-AugE), motion-level datasets (BONES-SEED, AMASS, LAFAN1), and end-to-end VLA pipelines (π0, OpenVLA, GR00T N1, Gemini Robotics, Ψ₀). The paper identifies a structural gap at the task-level semantic layer: no de-facto standard exists there, despite standardization at the layers above and below. It proposes menily/schema v1 as a candidate specification, with controlled vocabularies for action space, viewpoint, morphology, and data source.

Discusses open problems including long-horizon task decomposition, multi-agent data representation, whole-body loco-manipulation boundaries, quality metrics for data ingestion, synthetic data provenance, and the governance of de-facto standards.

40 references. Covers all major systems 2023–2026. Self-hosted preprint, not yet on arXiv.

Download PDF · GitHub source
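To make the proposal concrete, here is a minimal sketch of what a record under such a schema could look like. This is an illustration, not the published spec: the field names and vocabulary values below are assumptions, and menily/schema v1 itself is defined in the draft paper.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabularies; the real menily/schema v1
# enumerations live in the draft paper, not here.
ACTION_SPACES = {"joint_position", "ee_pose_delta", "base_velocity"}
VIEWPOINTS = {"wrist", "head", "third_person"}
MORPHOLOGIES = {"single_arm_7dof", "bimanual", "humanoid"}
DATA_SOURCES = {"teleop", "scripted", "synthetic"}

@dataclass(frozen=True)
class TaskDemo:
    """One task-level demonstration record (illustrative sketch)."""
    task: str          # free-text task instruction, e.g. "open the drawer"
    action_space: str
    viewpoint: str
    morphology: str
    data_source: str

    def __post_init__(self):
        # Reject values outside the controlled vocabularies, so every
        # record is machine-checkable at ingestion time.
        for value, vocab in [(self.action_space, ACTION_SPACES),
                             (self.viewpoint, VIEWPOINTS),
                             (self.morphology, MORPHOLOGIES),
                             (self.data_source, DATA_SOURCES)]:
            if value not in vocab:
                raise ValueError(f"{value!r} not in controlled vocabulary")

demo = TaskDemo("open the drawer", "ee_pose_delta", "wrist",
                "single_arm_7dof", "teleop")
```

The point of the controlled vocabularies is exactly the validation step: free-text metadata cannot be filtered or joined across datasets, enumerated fields can.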

Working notes

The data gap in embodied AI, stated precisely

April 2026. The bottleneck for generalist embodied agents in 2026 is not model capacity — it is the shape, resolution, and diversity of demonstration data. Why hour-counts mislead. What task-level data actually means. Where production data shortfalls bite.

Read on GitHub

Task-level abstraction: why frame-level annotation breaks VLA

April 2026. For VLA training, frame-level annotation is the wrong unit of work. Three failure modes: the action head gets the wrong target, task boundaries become lossy post-hoc, and per-frame language does not match the deployment distribution. Task-level labeling is cheaper in absolute terms and produces data a VLA can actually consume.

Read on GitHub
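The cost argument can be made concrete with a toy comparison. All numbers here are illustrative assumptions (a 30-second teleop episode at 15 Hz, two task segments), not figures from the note:

```python
# Hypothetical episode: 30 s of teleop at 15 Hz.
frames = 30 * 15                  # 450 frames
frame_level_labels = frames       # one language label decision per frame

# Task-level labeling: one (start_frame, end_frame, instruction) segment
# per task boundary, marked once during or after collection.
segments = [
    (0, 210, "pick up the mug"),
    (210, 450, "place the mug on the shelf"),
]
task_level_labels = len(segments)  # 2 label decisions

# Boundaries are explicit in the data, so downstream chunking is
# lossless rather than recovered post-hoc from per-frame strings.
assert segments[0][1] == segments[1][0]
assert segments[-1][1] == frames
```

Under these assumptions the labeling effort drops by two orders of magnitude, and the segment instructions match the instruction distribution the policy sees at deployment.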

Cross-embodiment transfer in task-level demonstration data

April 2026. Why cross-embodiment transfer fails today (implicit action space, undocumented morphology, body-relative task representation) and what a transferable demonstration requires (explicit action space, morphology identifier with DoF map, task-relative reference frames, invariant landmarks). How menily/schema and AdaMorph / OmniRetarget / SPARK tools work together.

Read on GitHub
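The four requirements listed above lend themselves to a mechanical check at ingestion time. The sketch below is a hypothetical validator; the record keys (`action_space`, `dof_map`, `reference_frame`, `landmarks`) are illustrative assumptions, not fields from the note or from menily/schema v1:

```python
def is_transferable(record: dict) -> bool:
    """Check the four stated requirements for cross-embodiment transfer
    (illustrative field names, not a published schema)."""
    return (
        "action_space" in record                       # explicit, not implied
        and "dof_map" in record.get("morphology", {})  # morphology id + DoF map
        and record.get("reference_frame") == "task"    # not body-relative
        and bool(record.get("landmarks"))              # invariant landmarks
    )

good = {
    "action_space": "ee_pose_delta",
    "morphology": {"id": "single_arm_7dof", "dof_map": list(range(7))},
    "reference_frame": "task",
    "landmarks": ["drawer_handle"],
}
# Typical failure case: body-relative frame, implicit action space, no DoF map.
bad = {"reference_frame": "base_link"}
```

A check like this is what makes a retargeting toolchain composable: a record that passes can, in principle, be handed to a retargeter without per-dataset detective work.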

Technical design notes

Design Notes on a Task-Level Demonstration Data Schema for VLA: The menily/schema v1 Specification and a Six-Field Walkthrough

April 2026 · In Chinese. A long-form technical walkthrough of menily/schema v1: why each field is defined the way it is, what is deliberately out of scope, and how the schema interoperates with Open X-Embodiment / RLDS and BONES-SEED / SOMA. Originally published on CSDN.

Read on CSDN

Citation

@misc{menily2026survey,
  author    = {Masashi},
  title     = {Task-Level Demonstration Data for Vision-Language-Action
               Models: A Survey of Schemas, Adapters, and
               Cross-Embodiment Transfer},
  year      = {2026},
  howpublished = {Menily Intelligence Research},
  url       = {https://www.menily.ai/research/}
}

@misc{menily2026notes,
  author    = {Menily Intelligence},
  title     = {Research notes: data infrastructure for embodied AI},
  year      = {2026},
  url       = {https://github.com/MenilyIntelligence/research}
}

Contributing

These notes are deliberately early-stage. If you are building a VLA pipeline, a humanoid robotics data operation, or a retargeting toolchain, and you have spotted a factual error or a missing reference, or you disagree with one of our judgment calls, we want to hear from you.

Related