
Why OWAMcap?

The Problem: Desktop AI datasets are fragmented. Every research group uses different formats, making it impossible to combine datasets or build large-scale foundation models.

The Solution: OWAMcap provides a universal standard so that all desktop interaction datasets share one format and can be combined directly.

The Robotics Lesson

The Open X-Embodiment project had to manually convert 22 different robotics datasets, months of work spent just combining data. Desktop automation is heading down the same path.

OWAMcap Changes This

Before: Data Silos

Dataset A (Custom Format) ──┐
Dataset B (Custom Format) ──┼── Manual Conversion ──→ Limited Training Data
Dataset C (Custom Format) ──┘

After: Universal Standard

Dataset A (OWAMcap) ──┐
Dataset B (OWAMcap) ──┼── Direct Combination ──→ Large-Scale Foundation Models
Dataset C (OWAMcap) ──┘
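
With a shared format, pooling datasets becomes a read loop rather than a conversion project. Here is a minimal sketch using the standard mcap Python package (pip install mcap); the file names are placeholders for any files that follow the OWAMcap convention:

# Pool events from several OWAMcap files without any per-dataset conversion.
from mcap.reader import make_reader

def iter_combined(paths):
    # Yield (topic, log_time_ns, raw_bytes) across all files, in file order.
    for path in paths:
        with open(path, "rb") as f:
            reader = make_reader(f)
            for schema, channel, message in reader.iter_messages():
                yield channel.topic, message.log_time, message.data

for topic, t_ns, data in iter_combined(["dataset_a.mcap", "dataset_b.mcap"]):
    pass  # feed events into one training pipeline, regardless of origin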

From Recording to Training in 3 Commands

OWAMcap integrates with the complete OWA Data Pipeline:

# 1. Record desktop interaction
ocap my-session.mcap

# 2. Process to training format
python scripts/01_raw_events_to_event_dataset.py --train-dir ./

# 3. Train your model
python train.py --dataset ./event-dataset
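
Between steps 2 and 3 you can sanity-check the processed data before training. The sketch below assumes the pipeline scripts write a Hugging Face datasets dataset to ./event-dataset; that is an assumption about the output format, not a documented guarantee:

# Inspect the processed event dataset before training.
from datasets import load_from_disk

ds = load_from_disk("./event-dataset")  # Dataset or DatasetDict, depending on how it was saved
print(ds)                               # splits, columns, and row counts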

Result: Any OWAMcap dataset works with any OWA-compatible training pipeline.

Technical Advantages

  • 91.7× compression via hybrid storage: lightweight event metadata in the MCAP file, pixel data in external video
  • Nanosecond timestamps for precise event synchronization
  • Standard tools work on the external video files (VLC, FFmpeg, etc.)
  • Lazy loading for memory-efficient processing (see the sketch below)
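
A hedged illustration of the last two points: the MCAP file stores only lightweight frame references, and pixels are decoded from the external video on demand. The message encoding and the path/pts_ns fields below are hypothetical, chosen for illustration; see the format guide for the real schema:

# Decode individual frames lazily from the external video file.
import json
import cv2
from mcap.reader import make_reader

with open("my-session.mcap", "rb") as f:
    reader = make_reader(f)
    for schema, channel, message in reader.iter_messages(topics=["screen"]):
        ref = json.loads(message.data)        # hypothetical {"path": ..., "pts_ns": ...}
        cap = cv2.VideoCapture(ref["path"])   # open the external video
        cap.set(cv2.CAP_PROP_POS_MSEC, ref["pts_ns"] / 1e6)
        ok, frame = cap.read()                # decode only this frame
        cap.release()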

Real Impact

$ owl mcap info example.mcap
messages:  864 (10.36s of interaction data)
file size: 22 KiB (vs 1+ GB raw)
channels:  screen, mouse, keyboard, window
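
An approximation of that summary can be computed with the standard mcap package; this is an illustration, not the owl CLI's implementation:

# Reproduce (approximately) the `owl mcap info` numbers.
from mcap.reader import make_reader

with open("example.mcap", "rb") as f:
    summary = make_reader(f).get_summary()

stats = summary.statistics  # assumes the file contains a summary section
duration_s = (stats.message_end_time - stats.message_start_time) / 1e9
print(f"messages:  {stats.message_count} ({duration_s:.2f}s of interaction data)")
print("channels: ", sorted(ch.topic for ch in summary.channels.values()))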

Bottom Line: OWAMcap transforms desktop interaction data from isolated collections into a unified resource for building the next generation of foundation models.


Ready to get started? Continue to the OWAMcap Format Guide for technical details.