Why OWAMcap?
The Problem: Desktop AI datasets are fragmented. Every research group records in its own custom format, so datasets cannot be combined without costly manual conversion, and large-scale foundation models stay out of reach.
The Solution: OWAMcap provides a universal standard: any desktop interaction dataset stored in it can be read, combined, and trained on with the same tooling.
The Robotics Lesson
The Open X-Embodiment project had to manually convert 22 different robotics datasets, months of work spent just combining existing data. Desktop automation is heading down the same path.
OWAMcap Changes This
Before: Data Silos
Dataset A (Custom Format) ──┐
Dataset B (Custom Format) ──┼── Manual Conversion ──→ Limited Training Data
Dataset C (Custom Format) ──┘
After: Universal Standard
Dataset A (OWAMcap) ──┐
Dataset B (OWAMcap) ──┼── Direct Combination ──→ Large-Scale Foundation Models
Dataset C (OWAMcap) ──┘
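Because every OWAMcap file is a standard MCAP container, "direct combination" amounts to interleaving already time-ordered event streams. Below is a minimal sketch using the stock `mcap` Python package (`pip install mcap`); the file names are placeholders, not files shipped with OWA:

```python
# Merge two OWAMcap datasets into one time-ordered event stream.
# No per-dataset conversion step is needed.
import heapq
from mcap.reader import make_reader

def iter_events(path):
    """Yield (log_time_ns, topic, raw_payload) from one OWAMcap file.

    make_reader().iter_messages() returns messages in log-time order
    by default, so each stream is already sorted.
    """
    with open(path, "rb") as f:
        for schema, channel, message in make_reader(f).iter_messages():
            yield message.log_time, channel.topic, message.data

# heapq.merge lazily interleaves the two sorted streams by timestamp.
for log_time, topic, data in heapq.merge(
    iter_events("dataset_a.mcap"), iter_events("dataset_b.mcap")
):
    print(log_time, topic, len(data))
```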
From Recording to Training in 3 Commands
OWAMcap integrates with the complete OWA Data Pipeline:
# 1. Record desktop interaction
ocap my-session.mcap
# 2. Process to training format
python scripts/01_raw_events_to_event_dataset.py --train-dir ./
# 3. Train your model
python train.py --dataset ./event-dataset
Result: Any OWAMcap dataset works with any OWA-compatible training pipeline.
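The handoff between these steps is plain MCAP, so a recorded session can be sanity-checked programmatically before conversion. A minimal sketch with the stock `mcap` Python package, using the file name from step 1 above:

```python
# Count recorded messages per topic in a session before processing it.
from collections import Counter
from mcap.reader import make_reader

with open("my-session.mcap", "rb") as f:
    reader = make_reader(f)
    counts = Counter(
        channel.topic for _, channel, _ in reader.iter_messages()
    )

# Expected topics for a desktop session: screen, mouse, keyboard, window.
for topic, n in counts.items():
    print(f"{topic}: {n} messages")
```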
Technical Advantages
- 91.7× compression through hybrid storage: lightweight event metadata in the MCAP file plus externally referenced video (sketched after this list)
- Nanosecond timestamps for precise synchronization across screen, mouse, and keyboard events
- Standard video tools (VLC, FFmpeg, etc.) work directly on the external video files
- Lazy loading of frames for memory-efficient processing
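A minimal sketch of how hybrid storage and lazy loading fit together: a screen event carries only a reference to an external video (the names `path` and `pts_ns` below are illustrative stand-ins, not the exact OWAMcap schema), and pixels are decoded on demand with a standard tool such as OpenCV:

```python
# Decode a single referenced frame on demand instead of loading the video.
import cv2  # pip install opencv-python

def load_frame(path: str, pts_ns: int):
    """Decode the frame at the given presentation timestamp (nanoseconds)."""
    cap = cv2.VideoCapture(path)
    try:
        cap.set(cv2.CAP_PROP_POS_MSEC, pts_ns / 1e6)  # ns -> ms
        ok, frame = cap.read()
        return frame if ok else None
    finally:
        cap.release()

# Only the one referenced frame is held in memory, never the whole video.
# The file name and timestamp are hypothetical examples.
frame = load_frame("my-session.mkv", pts_ns=1_234_000_000)
```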
Real Impact
$ owl mcap info example.mcap
messages: 864 (10.36s of interaction data)
file size: 22 KiB (vs 1+ GB raw)
channels: screen, mouse, keyboard, window
Bottom Line: OWAMcap transforms desktop interaction data from isolated collections into a unified resource for building the next generation of foundation models.
Ready to get started? Continue to the OWAMcap Format Guide for technical details.