Research
Short, dated working notes and draft papers on embodied AI data infrastructure. Not polished publications — this is the thinking in progress. Updated as the work moves forward.
Draft papers
Task-Level Demonstration Data for Vision-Language-Action Models: A Survey of Schemas, Adapters, and Cross-Embodiment Transfer
Masashi. Menily Intelligence, Shenzhen, China. April 2026. Draft v0.1. 12 pages.
📄 Download PDF — Task-Level VLA Data Survey (258 KB, 12 pages)
Self-hosted preprint · BibTeX citation below · CC BY 4.0
Abstract. Training vision–language–action (VLA) models for embodied AI requires task-level demonstration data: records that couple a natural-language instruction, a visual context, an action trajectory, and a body morphology specification into a single semantically closed unit. While trajectory-level datasets (Open X-Embodiment, DROID) and motion-level datasets (BONES-SEED, AMASS) have reached a degree of standardization, the task-level semantic layer that sits between them remains fragmented. This fragmentation is the primary barrier to cross-institutional data pooling and cross-embodiment transfer.
The paper surveys twelve systems relevant to task-level demonstration data (2023–2026), spanning trajectory-level datasets (Open X-Embodiment, DROID, BridgeData V2, OXE-AugE), motion-level datasets (BONES-SEED, AMASS, LAFAN1), and end-to-end VLA pipelines (π0, OpenVLA, GR00T N1, Gemini Robotics, Ψ₀). It identifies a structural gap in the task-level semantic layer: no de facto standard exists at this layer, despite standardization at the layers above and below. It proposes menily/schema v1 as a candidate specification with controlled vocabularies for action space, viewpoint, morphology, and data source.
The paper also discusses open problems, including long-horizon task decomposition, multi-agent data representation, whole-body loco-manipulation boundaries, quality metrics for data ingestion, synthetic data provenance, and the governance of de facto standards.
40 references, spanning the systems surveyed from 2023 to 2026. Self-hosted preprint (not on arXiv).
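As a rough illustration of the four-part unit described in the abstract (instruction, visual context, action trajectory, morphology) and of what a controlled vocabulary buys, here is a minimal Python sketch. Every class name, field name, and vocabulary value below is hypothetical and chosen for illustration only; none of it is taken from menily/schema v1 or menily/toolkit.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Sequence


# Hypothetical controlled vocabularies (illustrative values, not the spec's).
class ActionSpace(Enum):
    JOINT_POSITION = "joint_position"
    EE_DELTA_POSE = "ee_delta_pose"


class Viewpoint(Enum):
    EGOCENTRIC = "egocentric"
    THIRD_PERSON = "third_person"


class DataSource(Enum):
    TELEOPERATION = "teleoperation"
    SIMULATION = "simulation"


@dataclass(frozen=True)
class TaskDemonstration:
    """One task-level record: instruction + visual context + actions + body."""
    instruction: str                    # natural-language task description
    frame_uris: Sequence[str]           # visual context, one reference per step
    actions: Sequence[Sequence[float]]  # action trajectory, one vector per step
    action_space: ActionSpace
    viewpoint: Viewpoint
    morphology_id: str                  # identifies the body that acted
    data_source: DataSource

    def problems(self) -> List[str]:
        """Return a list of human-readable validation problems (empty if none)."""
        issues = []
        if not self.instruction.strip():
            issues.append("empty instruction")
        if len(self.frame_uris) != len(self.actions):
            issues.append("frame/action length mismatch")
        if len({len(a) for a in self.actions}) > 1:
            issues.append("inconsistent action dimension")
        if not self.morphology_id:
            issues.append("missing morphology_id")
        return issues


demo = TaskDemonstration(
    instruction="put the cup on the shelf",
    frame_uris=["frame_000.png", "frame_001.png"],
    actions=[[0.0, 0.1, 0.2], [0.0, 0.2, 0.3]],
    action_space=ActionSpace("joint_position"),  # free text is rejected here
    viewpoint=Viewpoint.EGOCENTRIC,
    morphology_id="example-arm-3dof",
    data_source=DataSource.TELEOPERATION,
)
```

In this sketch, constructing `ActionSpace("some free text")` raises `ValueError`, which is the practical effect of a controlled vocabulary: a record either names a known action space or fails at ingestion time.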
Citation (BibTeX)
@misc{masashi2026tasklevel,
author = {Masashi},
title = {Task-Level Demonstration Data for Vision-Language-Action
Models: A Survey of Schemas, Adapters, and
Cross-Embodiment Transfer},
year = {2026},
month = {April},
howpublished = {Menily Intelligence Research, self-hosted preprint},
url = {https://www.menily.ai/research/01-task-level-vla-data-survey.pdf},
note = {Draft v0.1}
}
Related: GitHub source repository · menily/schema v1 specification · menily/toolkit Python library
Working notes
The data gap in embodied AI, stated precisely
April 2026. The bottleneck for generalist embodied agents in 2026 is not model capacity — it is the shape, resolution, and diversity of demonstration data. Why hour-counts mislead. What task-level data actually means. Where production data shortfalls bite.
Task-level abstraction: why frame-level annotation breaks VLA
April 2026. For VLA training, frame-level annotation is the wrong unit of work. Three failure modes: the action head gets the wrong target, task boundaries reconstructed post hoc are lossy, and per-frame language does not match the deployment distribution. Task-level labeling is cheaper in absolute terms and produces data that VLA models can actually consume.
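One way the boundary-loss failure mode can show up is sketched below as a toy example. It assumes, purely for illustration, that task segments are recovered after the fact by grouping consecutive frames that carry the same caption; the function name and the sample labels are hypothetical and do not come from the note.

```python
from itertools import groupby
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (label, first frame index, last frame index)


def segments_from_frame_labels(labels: List[str]) -> List[Segment]:
    """Collapse consecutive identical frame labels into (label, start, end) runs."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, index, index + length - 1))
        index += length
    return segments


# Ground truth: the operator picked up a cup twice, then placed it.
true_tasks = [("pick up cup", 0, 1), ("pick up cup", 2, 3), ("place cup", 4, 5)]

# Frame-level annotation of the same recording: one caption per frame.
frame_labels = ["pick up cup"] * 4 + ["place cup"] * 2

# Post-hoc recovery merges the two identical back-to-back tasks into one.
recovered = segments_from_frame_labels(frame_labels)
```

Here the recording contains three tasks, but grouping by caption recovers only two segments, because the boundary between the two identical pick-ups was never recorded. Labeling at the task level stores that boundary directly.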
Cross-embodiment transfer in task-level demonstration data
April 2026. Why cross-embodiment transfer fails today (implicit action space, undocumented morphology, body-relative task representation) and what a transferable demonstration requires (explicit action space, morphology identifier with DoF map, task-relative reference frames, invariant landmarks). How menily/schema and the AdaMorph / OmniRetarget / SPARK tools work together.
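To make two of the requirements above concrete (a morphology identifier with a DoF map, and task-relative reference frames), here is a minimal planar sketch in Python. All names are hypothetical, the geometry is reduced to 2D for brevity, and nothing here reflects how menily/schema, AdaMorph, OmniRetarget, or SPARK actually represent these concepts.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Morphology:
    """Hypothetical morphology record: an identifier plus an explicit DoF map."""
    morphology_id: str
    dof_map: Dict[str, int]  # joint name -> index in the action vector

    def matches(self, action: Sequence[float]) -> bool:
        """True if the action vector has exactly one entry per mapped DoF."""
        return len(action) == len(self.dof_map)


@dataclass(frozen=True)
class PlanarPose:
    """Pose of the task object expressed in a robot's base frame."""
    x: float
    y: float
    theta: float  # heading in radians


def to_task_frame(point_in_base: Point, object_in_base: PlanarPose) -> Point:
    """Re-express a base-frame point in the object's (task-relative) frame."""
    dx = point_in_base[0] - object_in_base.x
    dy = point_in_base[1] - object_in_base.y
    c, s = math.cos(object_in_base.theta), math.sin(object_in_base.theta)
    # Apply the inverse rotation R(-theta) to the translated point.
    return (c * dx + s * dy, -s * dx + c * dy)


arm = Morphology("example-arm-3dof", {"shoulder": 0, "elbow": 1, "wrist": 2})

# The same physical target, observed by two robots with different base frames.
target_a = to_task_frame((1.0, 1.0), PlanarPose(1.0, 0.0, math.pi / 2))
target_b = to_task_frame((2.0, 1.0), PlanarPose(2.0, 0.0, math.pi / 2))
```

Both robots see the target at different base-frame coordinates, yet `target_a` and `target_b` agree in the object's frame. That invariance is what makes a task-relative representation a plausible basis for transfer, whereas a body-relative one bakes the recording robot's placement into the data.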
Technical design notes
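Design notes on a task-level demonstration data schema for VLA: the Menily/schema v1 specification and its six fields explained (original title: VLA 任务级示教数据 schema 设计笔记:Menily/schema v1 规范与六字段解析)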
April 2026 · Chinese. Long-form technical walkthrough of menily/schema v1: why each field is defined the way it is, what is deliberately out of scope, and how the schema interoperates with Open X-Embodiment / RLDS and BONES-SEED / SOMA. Originally published on CSDN.
Research notes citation
@misc{menily2026notes,
author = {Menily Intelligence},
title = {Research notes: data infrastructure for embodied AI},
year = {2026},
url = {https://github.com/MenilyIntelligence/research}
}
For the survey paper citation, see the BibTeX block in the "Draft papers" section above.
Contributing
These notes are deliberately early-stage. If you are building a VLA pipeline, a humanoid robotics data operation, or a retargeting toolchain and have spotted a factual error, a missing reference, or a disagreement with a judgment — we want to hear about it.
- Issues: GitHub Issues
- Direct feedback: [email protected]
- Schema discussions: menily/schema Issues
Related
- menily/schema v1 — the task-level demonstration data specification discussed in the survey
- menily/toolkit — reference Python implementation for schema encoding/decoding
- About Menily Intelligence — team, founder, and operational structure