OWAMcap Format Guide¶
What is OWAMcap?
OWAMcap is a specification for using the open-source MCAP container format with Open World Agents (OWA) message definitions. It provides an efficient way to store and process multimodal desktop interaction data including screen captures, mouse events, keyboard events, and window information.
New to OWAMcap?
Start with Why OWAMcap? to understand the problem it solves and why you should use it.
Table of Contents¶
- Getting Started
- Quick Start - Get started in 3 steps
- Core Concepts - Essential message types and features
- Working with OWAMcap
- Media Handling - External references and lazy loading
- Reading and Writing - File operations and CLI tools
- Storage & Performance - Efficiency characteristics
- Advanced Topics
- Extending OWAMcap - Custom message types and extensibility
- Data Pipeline Integration - owa-data reference
- Reference
- Migration & Troubleshooting - Practical help and common issues
- Technical Reference - Specifications and standards
Getting Started¶
Quick Start¶
Try OWAMcap in 3 Steps
1. Install the packages:
2. Explore an example file with the owl CLI:
What is owl?
owl is the command-line interface for OWA tools, installed with owa-cli. See the CLI documentation for complete usage.
# Download example file
wget https://github.com/open-world-agents/open-world-agents/raw/main/docs/data/examples/example.mcap
# View file info
owl mcap info example.mcap
# List first 5 messages
owl mcap cat example.mcap --n 5
3. Load in Python:
Core Concepts¶
OWAMcap combines the robustness of the MCAP container format with OWA's specialized message types for desktop environments, creating a powerful format for recording, analyzing, and training on human-computer interaction data.
Key Terms:
Essential Terminology
- MCAP: A modular container file format for heterogeneous, timestamped data (like a ZIP file for time-series data). Developed by Foxglove, MCAP provides efficient random access, compression, and self-describing schemas. Widely adopted in robotics (ROS ecosystem), autonomous vehicles, and IoT applications for its performance and interoperability.
- Topic: A named channel in MCAP files (e.g., "screen", "mouse") that groups related messages
- Lazy Loading: Loading data only when needed, crucial for memory efficiency with large datasets
What Makes a File "OWAMcap":
┌─────────────────────────────────────────────────────────────┐
│ OWAMcap File (.mcap) │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Metadata │ │ Timestamps │ │ Messages │ │
│ │ - Profile │ │ - Nanosecond │ │ - Mouse │ │
│ │ - Topics │ │ precision │ │ - Keyboard │ │
│ │ - Schemas │ │ - Event sync │ │ - Window │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
│
│ References
▼
┌─────────────────────────────────────────────────────────────┐
│ External Media Files (.mkv, .png) │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────┐ │
│ │ Video Frames │ │ Screenshots │ │ Audio │ │
│ │ - H.265 codec │ │ - PNG/JPEG │ │ - Optional │ │
│ │ - Hardware acc │ │ - Lossless │ │ - Sync'd │ │
│ └─────────────────┘ └─────────────────┘ └─────────────┘ │
└─────────────────────────────────────────────────────────────┘
- Base Format: Standard MCAP container format
- Profile:
owadesignation in MCAP metadata - Schema Encoding: JSON Schema
- Message Interface: All messages implement
BaseMessagefromowa.core.message - Standard Messages: Core message types from
owa-msgspackage
Why MCAP?
Built as the successor to ROSBag, MCAP offers efficient storage and retrieval for heterogeneous timestamped data with minimal dependencies. It's designed for modern use cases with optimized random access, built-in compression, and language-agnostic schemas. The format has gained significant adoption across the robotics community, autonomous vehicle companies (Cruise, Waymo), and IoT platforms due to its performance advantages and excellent tooling ecosystem.
$ owl mcap info example.mcap
library: mcap-owa-support 0.5.1; mcap 1.3.0
profile: owa
messages: 864
duration: 10.3574349s
start: 2025-06-27T18:49:52.129876+09:00 (1751017792.129876000)
end: 2025-06-27T18:50:02.4873109+09:00 (1751017802.487310900)
compression:
zstd: [1/1 chunks] [116.46 KiB/16.61 KiB (85.74%)] [1.60 KiB/sec]
channels:
(1) window 11 msgs (1.06 Hz) : desktop/WindowInfo [jsonschema]
(2) keyboard/state 11 msgs (1.06 Hz) : desktop/KeyboardState [jsonschema]
(3) mouse/state 11 msgs (1.06 Hz) : desktop/MouseState [jsonschema]
(4) screen 590 msgs (56.96 Hz) : desktop/ScreenCaptured [jsonschema]
(5) mouse 209 msgs (20.18 Hz) : desktop/MouseEvent [jsonschema]
(6) keyboard 32 msgs (3.09 Hz) : desktop/KeyboardEvent [jsonschema]
channels: 6
attachments: 0
metadata: 0
Key Features:
- 🌐 Universal Standard: Unlike fragmented formats, enables seamless dataset combination for large-scale foundation models (OWAMcap)
- ⚡ High-Performance Multimodal Storage: Lightweight MCAP container with nanosecond precision for synchronized data streams (MCAP)
- 🔗 Flexible MediaRef: Smart references to both external and embedded media (file paths, URLs, data URIs, video frames) with lazy loading - keeps metadata files small while supporting rich media (OWAMcap) → Learn more
- 🤗 Training Pipeline Ready: Native HuggingFace integration, seamless dataset loading, and direct compatibility with ML frameworks (Ecosystem) → Browse datasets | Data pipeline
Core Message Types:
OWA provides standardized message types through the owa-msgs package for consistent desktop interaction recording:
| Message Type | Description |
|---|---|
desktop/KeyboardEvent |
Keyboard press/release events |
desktop/KeyboardState |
Current keyboard state |
desktop/MouseEvent |
Mouse movement, clicks, scrolls |
desktop/MouseState |
Current mouse position and buttons |
desktop/RawMouseEvent |
High-definition raw mouse input data |
desktop/ScreenCaptured |
Screen capture frames with timestamps |
desktop/WindowInfo |
Active window information |
class KeyboardEvent(OWAMessage):
_type = "desktop/KeyboardEvent"
event_type: str # "press" or "release"
vk: int # Virtual key code (e.g., 65 for 'A')
timestamp: int # Event timestamp
# Example: User presses the 'A' key
KeyboardEvent(event_type="press", vk=65, timestamp=1234567890)
What's VK (Virtual Key Code)?
Operating systems don't directly use the physical keyboard input values (scan codes) but instead use virtualized keys called VKs. OWA's recorder uses VKs to record keyboard-agnostic data. If you're interested in more details, you can refer to the following resources:
class MouseEvent(OWAMessage):
_type = "desktop/MouseEvent"
event_type: str # "move", "click", "scroll", "drag"
x: int # Screen X coordinate
y: int # Screen Y coordinate
button: Optional[str] = None # "left", "right", "middle"
# Example: Mouse click at position (100, 200)
MouseEvent(event_type="click", x=100, y=200, button="left")
class RawMouseEvent(OWAMessage):
_type = "desktop/RawMouseEvent"
us_flags: mouse state flags, containing movement data type (relative/absolute). Default is relative.
last_x: can be relative or absolute, depends on us_flags
last_y: can be relative or absolute, depends on us_flags
button_flags: Raw button state flags from Windows RAWMOUSE structure
button_data: Additional button data (wheel delta, etc.)
device_handle: Raw input device handle (optional)
timestamp: Optional timestamp in nanoseconds since epoch
# Example: Raw mouse movement
RawMouseEvent(us_flags=0x0000, last_x=15, last_y=-10, button_flags=0x0000, button_data=0)
class ScreenCaptured(OWAMessage):
_type = "desktop/ScreenCaptured"
utc_ns: Optional[int] = None # System timestamp (nanoseconds)
source_shape: Optional[Tuple[int, int]] = None # Original (width, height)
shape: Optional[Tuple[int, int]] = None # Current (width, height)
media_ref: Optional[MediaRef] = None # URI or file path reference
frame_arr: Optional[np.ndarray] = None # In-memory BGRA array (excluded from JSON)
Working with ScreenCaptured Messages
For detailed information on creating, loading, and working with ScreenCaptured messages, see the Media Handling section below. It covers MediaRef formats, lazy loading, and practical usage patterns.
Working with OWAMcap¶
This section covers the essential operations for working with OWAMcap files in your applications. Whether you're processing recorded desktop sessions or creating new datasets, these patterns will help you work efficiently with the format.
Media Handling¶
OWAMcap's key advantage is efficient media handling through external media references. Instead of storing large image/video data directly in the MCAP file, OWAMcap stores lightweight references to external media files, keeping the MCAP file small and fast to process.
Understanding MediaRef
MediaRef is OWAMcap's way of referencing media content, powered by the mediaref package. It supports multiple formats:
- File paths:
/absolute/pathorrelative/path - File URIs:
file:///path/to/file - HTTP URLs:
https://example.com/image.png - Data URIs:
data:image/png;base64,...(embedded content)
For videos, add pts_ns (presentation timestamp) to specify which frame.
The mediaref package is automatically installed with owa-msgs and provides efficient lazy loading and batch decoding capabilities.
from owa.core import MESSAGES
import numpy as np
ScreenCaptured = MESSAGES['desktop/ScreenCaptured']
# File paths (absolute/relative) - works for images and videos
screen_msg = ScreenCaptured(media_ref={"uri": "/absolute/path/image.png"})
screen_msg = ScreenCaptured(media_ref={"uri": "relative/video.mkv", "pts_ns": 123456})
# File URIs - works for images and videos
screen_msg = ScreenCaptured(media_ref={"uri": "file:///path/to/image.jpg"})
screen_msg = ScreenCaptured(media_ref={"uri": "file:///path/to/video.mp4", "pts_ns": 123456})
# HTTP/HTTPS URLs - works for images and videos
screen_msg = ScreenCaptured(media_ref={"uri": "https://example.com/image.png"})
screen_msg = ScreenCaptured(media_ref={"uri": "https://example.com/video.mp4", "pts_ns": 123456})
# Data URIs (embedded base64) - typically for images
screen_msg = ScreenCaptured(media_ref={"uri": "..."})
# From raw image array (BGRA format required)
bgra_array = np.random.randint(0, 255, (1080, 1920, 4), dtype=np.uint8)
screen_msg = ScreenCaptured(frame_arr=bgra_array)
screen_msg.embed_as_data_uri(format="png") # Required for serialization
# Now screen_msg.media_ref contains: {"uri": "data:image/png;base64,..."}
Why Lazy Loading Matters
Lazy Loading means frame data is only loaded when you explicitly request it. This is crucial for performance:
- ✅ Fast: Iterate through thousands of messages instantly
- ✅ Memory efficient: Only load frames you actually need
- ✅ Scalable: Work with datasets larger than your RAM
Without lazy loading, opening a 1-hour recording would try to load ~200GB of frame data into memory!
# IMPORTANT: For MCAP files, resolve relative paths first
# The OWA recorder saves media paths relative to the MCAP file location
ScreenCaptured = MESSAGES['desktop/ScreenCaptured']
screen_msg = ScreenCaptured(
media_ref={"uri": "relative/video.mkv", "pts_ns": 123456789}
)
# Must resolve external paths before loading from MCAP files
screen_msg.resolve_relative_path("/path/to/data.mcap")
# Lazy loading: Frame data is loaded on-demand when these methods are called
rgb_array = screen_msg.to_rgb_array() # RGB numpy array (most common)
pil_image = screen_msg.to_pil_image() # PIL Image object
bgra_array = screen_msg.load_frame_array() # Raw BGRA array (native format)
# Check if frame data is loaded (lazy loading means it starts as None)
if screen_msg.frame_arr is not None:
height, width, channels = screen_msg.frame_arr.shape
print(f"Frame: {width}x{height}, {channels} channels")
else:
print("Frame data not loaded - use load_frame_array() first")
Reading and Writing¶
from mcap_owa.highlevel import OWAMcapReader
with OWAMcapReader("session.mcap") as reader:
# File metadata
print(f"Topics: {reader.topics}")
print(f"Duration: {(reader.end_time - reader.start_time) / 1e9:.2f}s")
# Lazy loading advantage: Fast iteration without loading frame data
for msg in reader.iter_messages(topics=["screen"]):
screen_data = msg.decoded
print(f"Frame metadata: {screen_data.shape} at {screen_data.utc_ns}")
# No frame data loaded yet - extremely fast for large datasets
# Only load frame data when actually needed
if some_condition: # e.g., every 10th frame
frame = screen_data.to_rgb_array() # Now frame is loaded
break # Just show first frame
from mcap_owa.highlevel import OWAMcapWriter
from owa.core import MESSAGES
ScreenCaptured = MESSAGES['desktop/ScreenCaptured']
MouseEvent = MESSAGES['desktop/MouseEvent']
with OWAMcapWriter("output.mcap") as writer:
# Write screen capture
screen_msg = ScreenCaptured(
utc_ns=1234567890,
media_ref={"uri": "video.mkv", "pts_ns": 1234567890},
shape=(1920, 1080)
)
writer.write_message(screen_msg, topic="screen", timestamp=1234567890)
# Write standard mouse event
mouse_msg = MouseEvent(event_type="click", x=100, y=200)
writer.write_message(mouse_msg, topic="mouse", timestamp=1234567891)
# Time range filtering
with OWAMcapReader("session.mcap") as reader:
start_time = reader.start_time + 1_000_000_000 # Skip first second
end_time = reader.start_time + 10_000_000_000 # First 10 seconds
for msg in reader.iter_messages(start_time=start_time, end_time=end_time):
print(f"Message in range: {msg.topic}")
# Remote files
with OWAMcapReader("https://example.com/data.mcap") as reader:
for msg in reader.iter_messages(topics=["screen"]):
print(f"Remote frame: {msg.decoded.shape}")
Storage & Performance¶
OWAMcap achieves remarkable storage efficiency through external video references and intelligent compression.
Compression Benefits:
Compression Performance
Compression performance varies significantly across formats. H.265 encoding achieves a 217.8× compression ratio compared to raw BGRA data while maintaining visual quality suitable for agent training, enabling practical storage of large-scale desktop interaction datasets.
Desktop screen capture at 1920 × 1080 resolution, 12 s @ 60 Hz:
| Format | Size per Frame | Whole Size | Compression Ratio |
|---|---|---|---|
| Raw BGRA | 5.97 MB | 4.2 GB | 1.0× (baseline) |
| PNG | 1.87 MB | 1.31 GB | 3.2× |
| JPEG (Quality 85) | 191 KB | 135 MB | 31.9× |
| H.265 (keyframe 0.5s, nvd3d11h265enc) | 27.8 KB avg | 19.6 MB | 217.8× |
Compression benefit per resolution
Compression performance is resolution-dependent. Lower resolutions yield lower compression ratios.
Desktop screen capture at 600 × 800 resolution, 13 s @ 60 Hz:
| Format | Size per Frame | Whole Size | Compression Ratio |
|---|---|---|---|
| Raw BGRA | 1.37 MB | 1.0 GB | 1.0× (baseline) |
| PNG | 468 KB | 341 MB | 3.0× |
| JPEG (Quality 85) | 64.6 KB | 47.2 MB | 21.7× |
| H.265 (keyframe 0.5s, nvd3d11h265enc) | 15.3 KB avg | 11.2 MB | 91.7× |
H.265 Configuration
The H.265 settings shown above (keyframe 0.5s, nvd3d11h265enc) are the same as those used by ocap for efficient desktop recording.
Key advantages:
- Lightweight MCAP: very fast to parse, transfer, and back up
- Video Compression: leverages hardware-accelerated codecs for extreme savings
- Selective Loading: grab only the frames you need without full decompression
- Standard Tools: preview in any video player and edit with off-the-shelf software
Advanced Topics¶
Extending OWAMcap¶
Custom Message Types:
Need to store domain-specific data beyond standard desktop interactions? OWAMcap supports custom message types for sensors, gaming, robotics, and more.
Custom Messages Documentation
📖 Custom Message Types Guide - Complete guide to creating, registering, and using custom message types in OWAMcap.
Covers: message creation, package registration, best practices, and CLI integration.
Data Pipeline Integration¶
See owa-data README for full pipeline documentation.
Reference¶
Migration & Troubleshooting¶
File Migration:
OWAMcap format evolves over time. When you encounter older files that need updating, use the migration tool:
When Do You Need Migration?
- Error messages about unsupported schema versions
- Missing fields when loading older recordings
- Compatibility warnings from OWA tools
- Performance issues with legacy file formats
Migration commands:
# Check if migration is needed
owl mcap info old_file.mcap # Look for version warnings
# Preview what will change (safe, no modifications)
owl mcap migrate run old_file.mcap --dry-run
# Migrate single file (creates backup automatically)
owl mcap migrate run old_file.mcap
# Migrate multiple files in batch
owl mcap migrate run *.mcap
# Migrate with custom output location
owl mcap migrate run old_file.mcap --output new_file.mcap
Migration Safety
- Automatic backups: Original files are preserved as
.backup - Validation: Migrated files are automatically validated
- Rollback: Use backup files if migration causes issues
Complete Migration Reference
For detailed information about all migration commands and options, see the OWA CLI - MCAP Commands documentation.
Common Issues:
File Not Found Errors
When video files are missing:
Memory Usage
Large datasets can consume memory:
Technical Reference¶
For detailed technical specifications, see:
- OEP-0006: OWAMcap Profile Specification - Authoritative format specification
- MCAP Format - Base container format documentation
- Message Registry - Message implementation
Next Steps¶
- Exploring Data: Learn to work with OWAMcap files
- Data Pipeline: Process OWAMcap for ML training
- Viewer: Visualize OWAMcap data interactively