
Why OWAMcap?

The Problem: Desktop AI datasets are fragmented. Every research group uses its own format, which makes it hard to combine datasets or to build large-scale foundation models.

The Solution: OWAMcap provides a universal standard: one format for all desktop interaction datasets, so that they can be combined directly.

The Robotics Lesson

The Open X-Embodiment project had to manually convert 22 different robotics datasets, which took months of work just to combine the data. Desktop automation is heading down the same path.

OWAMcap Changes This

Before: Data Silos

Dataset A (Custom Format) ──┐
Dataset B (Custom Format) ──┼── Manual Conversion ──→ Limited Training Data
Dataset C (Custom Format) ──┘

After: Universal Standard

Dataset A (OWAMcap) ──┐
Dataset B (OWAMcap) ──┼── Direct Combination ──→ Large-Scale Foundation Models
Dataset C (OWAMcap) ──┘

From Recording to Training in 3 Commands

OWAMcap integrates with the complete OWA Data Pipeline:

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_events_to_event_dataset.py --train-dir ./

# 3. Train your model
$ python train.py --dataset ./event-dataset

📖 Detailed Guide: Complete Quick Start Tutorial, a step-by-step walkthrough with examples and troubleshooting.

Result: Any OWAMcap dataset works with any OWA-compatible training pipeline.

Key Features

  • 🔄 Universal Standard: Unlike fragmented formats, enables seamless dataset combination for large-scale foundation models (OWAMcap)
  • 🎯 High-Performance Multimodal Storage: Lightweight MCAP container with nanosecond precision for synchronized data streams (MCAP)
  • 🔗 Flexible MediaRef: Smart references to both external and embedded media (file paths, URLs, data URIs, video frames) with lazy loading, which keeps metadata files small while supporting rich media (OWAMcap)
  • 🤗 Training Pipeline Ready: Native HuggingFace integration, seamless dataset loading, and direct compatibility with ML frameworks (Ecosystem)
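
To make the MediaRef idea concrete, here is a minimal Python sketch of a lazily loaded media reference. It is an illustration only: the class name `MediaRefSketch`, its fields (`uri`, `pts_ns`), and the `load()` method are hypothetical and are not the actual OWAMcap API. Remote fetching and video frame decoding are deliberately left out.

```python
import base64
import urllib.parse
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class MediaRefSketch:
    """Hypothetical stand-in for a media reference (not the real OWAMcap class)."""

    uri: str                       # file path, URL, or data URI
    pts_ns: Optional[int] = None   # assumed: frame timestamp when uri points to a video
    _cache: Optional[bytes] = field(default=None, repr=False, compare=False)

    @property
    def is_embedded(self) -> bool:
        return self.uri.startswith("data:")

    @property
    def is_remote(self) -> bool:
        return self.uri.startswith(("http://", "https://"))

    def load(self) -> bytes:
        """Read the referenced bytes on first use, then serve them from the cache."""
        if self._cache is None:
            if self.is_embedded:
                # Data URI layout: data:<mediatype>[;base64],<payload>
                header, _, payload = self.uri.partition(",")
                if header.endswith(";base64"):
                    self._cache = base64.b64decode(payload)
                else:
                    self._cache = urllib.parse.unquote_to_bytes(payload)
            elif self.is_remote:
                raise NotImplementedError("remote fetch omitted in this sketch")
            else:
                # Local file: returns raw bytes; decoding the frame at pts_ns is omitted.
                self._cache = Path(self.uri).read_bytes()
        return self._cache


embedded = MediaRefSketch("data:text/plain;base64,aGVsbG8=")
print(embedded.is_embedded, embedded.load())  # True b'hello'
```

Creating the reference costs almost nothing because only the URI string is stored. The bytes are touched the first time `load()` is called, which is how a recording can stay small while still pointing at large video files.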

Real Impact

$ owl mcap info example.mcap
messages:  864 (10.36s of interaction data)
file size: 22 KiB (vs 1+ GB raw)
channels:  screen, mouse, keyboard, window
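
The figures above can be turned into per-message numbers with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes "1+ GB" means a lower bound of 1 GB (decimal), so the computed ratio is a lower bound as well.

```python
messages = 864
duration_s = 10.36
file_size_bytes = 22 * 1024      # 22 KiB
raw_size_bytes = 1_000_000_000   # assumption: "1+ GB" read as at least 1 GB (decimal)

rate = messages / duration_s                   # messages per second
bytes_per_message = file_size_bytes / messages
ratio = raw_size_bytes / file_size_bytes       # raw size relative to the MCAP file

print(f"{rate:.1f} messages/s")
print(f"{bytes_per_message:.1f} bytes/message")
print(f"at least {ratio:,.0f}x smaller than raw")
```

That works out to roughly 83 messages per second at about 26 bytes each, and a file at least some 44,000 times smaller than the raw data. An average that small suggests the file holds event metadata and media references rather than the frames themselves.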

Bottom Line: OWAMcap transforms desktop interaction data from isolated collections into a unified resource for building the next generation of foundation models.


Ready to get started? Continue to the OWAMcap Format Guide for technical details.