Introducing OWAMcap
Overview
OWAMcap is a high-performance, self-contained, flexible container file format for multimodal desktop log data, powered by the open-source container file format mcap. This format is designed for efficiently recording and processing message data in Open World Agents (OWA) applications.
So, what exactly is mcap?
Simply put, mcap is a format that allows you to record various types of events such as keyboard events, mouse events, and screen captures along with their timestamps. For more detailed information, please refer to the OWAMcap Format Specification section.
Usage Example of OWAMcap - Desktop Recorder
What exactly does the OWAMcap format contain? Let's demonstrate with an example of recorded desktop data. Below are sample datasets that you can download and explore yourself:
- example.mcap
- example.mkv
Exploring Example Data
Let's examine the contents of an OWAMcap file using the owl command-line tool (Open World agents cLi).
File Summary with owl mcap info
First, we can get an overview of the file structure:
$ owl mcap info example.mcap
library: mcap-owa-support 0.1.0; mcap 1.2.2
profile: owa
messages: 518
duration: 6.8558623s
start: 2025-03-21T17:06:30.7029335+09:00 (1742544390.702933500)
end: 2025-03-21T17:06:37.5587958+09:00 (1742544397.558795800)
compression:
zstd: [1/1 chunks] [48.19 KiB/9.42 KiB (80.44%)] [1.37 KiB/sec]
channels:
(1) window 7 msgs (1.02 Hz) : owa.env.desktop.msg.WindowInfo [jsonschema]
(2) keyboard/state 7 msgs (1.02 Hz) : owa.env.desktop.msg.KeyboardState [jsonschema]
(3) mouse/state 7 msgs (1.02 Hz) : owa.env.desktop.msg.MouseState [jsonschema]
(4) mouse 115 msgs (16.77 Hz) : owa.env.desktop.msg.MouseEvent [jsonschema]
(5) screen 362 msgs (52.80 Hz) : owa.env.gst.msg.ScreenEmitted [jsonschema]
(6) keyboard 20 msgs (2.92 Hz) : owa.env.desktop.msg.KeyboardEvent [jsonschema]
channels: 6
attachments: 0
metadata: 0
Key observations from this output:
- File Overview:
    - Contains 518 messages recorded over 6.86 seconds
    - Records from March 21, 2025, with precise start and end timestamps
- Compression:
    - Uses zstd compression, reducing file size by 80.44%
- Channels (Topics):
    - The file contains 6 different channels (or topics), each tracking a specific type of event:
Channel # | Name | Message Count | Frequency | Message Type |
---|---|---|---|---|
1 | window | 7 msgs | 1.02 Hz | WindowInfo |
2 | keyboard/state | 7 msgs | 1.02 Hz | KeyboardState |
3 | mouse/state | 7 msgs | 1.02 Hz | MouseState |
4 | mouse | 115 msgs | 16.77 Hz | MouseEvent |
5 | screen | 362 msgs | 52.80 Hz | ScreenEmitted |
6 | keyboard | 20 msgs | 2.92 Hz | KeyboardEvent |
For example, looking at channel #5 (screen), we can see:
- The topic name is "screen"
- It contains 362 messages
- Recording frequency is 52.80 Hz (slightly lower than the intended 60 Hz, likely due to the short recording time)
- Messages are of type owa.env.gst.msg.ScreenEmitted
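The Hz column above appears to be the message count divided by the total recording duration. The following minimal sketch reproduces those values from the numbers in the summary; the formula is an assumption inferred from the output, and the names `DURATION_S`, `MESSAGE_COUNTS`, and `channel_frequencies` are illustrative and not part of any OWA tool.

```python
# Values copied from the `owl mcap info` output above.
DURATION_S = 6.8558623
MESSAGE_COUNTS = {
    "window": 7,
    "keyboard/state": 7,
    "mouse/state": 7,
    "mouse": 115,
    "screen": 362,
    "keyboard": 20,
}


def channel_frequencies(counts, duration_s):
    """Return messages per second for each topic, rounded to two decimals.

    Assumption: frequency = message count / total recording duration.
    """
    return {topic: round(n / duration_s, 2) for topic, n in counts.items()}


frequencies = channel_frequencies(MESSAGE_COUNTS, DURATION_S)
for topic, hz in frequencies.items():
    print(f"{topic:15s} {hz:6.2f} Hz")
```

The per-channel counts also add up to the 518 messages reported at the top of the summary.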
Detailed Message Inspection with owl mcap cat
To examine individual messages, we can use the cat command:
$ owl mcap cat example.mcap --n 8 --no-pretty
Topic: window, Timestamp: 1741628814049712700, Message: {'title': 'ZType – Typing Game - Type to Shoot - Chromium', 'rect': [389, 10, 955, 1022], 'hWnd': 7540094}
Topic: keyboard/state, Timestamp: 1741628814049712700, Message: {'buttons': []}
Topic: mouse/state, Timestamp: 1742544390703436600, Message: {'x': 1594, 'y': 1112, 'buttons': []}
Topic: mouse, Timestamp: 1742544390707441200, Message: {'event_type': 'move', 'x': 1597, 'y': 1112}
Topic: screen, Timestamp: 1741628814057575300, Message: {'path': 'example.mkv', 'pts': 14866666666, 'utc_ns': 1741628814056571100}
Topic: screen, Timestamp: 1741628814073392700, Message: {'path': 'example.mkv', 'pts': 14883333333, 'utc_ns': 1741628814072476900}
Topic: keyboard, Timestamp: 1741628815015522100, Message: {'event_type': 'release', 'vk': 162}
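If only this text output is available, each line can be split back into a topic, a timestamp, and a message dictionary. The sketch below assumes the `--no-pretty` line layout shown above and that timestamps are nanoseconds since the Unix epoch (consistent with the start time reported by `owl mcap info`). `parse_cat_line` and `ns_to_utc` are hypothetical helpers, not part of the owl CLI.

```python
import ast
import re
from datetime import datetime, timezone

# Matches one line of the `--no-pretty` output shown above.
LINE_RE = re.compile(
    r"^Topic: (?P<topic>[^,]+), Timestamp: (?P<ts>\d+), Message: (?P<msg>.*)$"
)


def parse_cat_line(line):
    """Split one output line into (topic, timestamp_ns, message_dict)."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    # The message is printed as a Python dict literal, so literal_eval can read it.
    return match["topic"], int(match["ts"]), ast.literal_eval(match["msg"])


def ns_to_utc(timestamp_ns):
    """Convert nanoseconds since the Unix epoch to a UTC datetime (microsecond precision)."""
    seconds, nanos = divmod(timestamp_ns, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(
        microsecond=nanos // 1000
    )


line = (
    "Topic: mouse, Timestamp: 1742544390707441200, "
    "Message: {'event_type': 'move', 'x': 1597, 'y': 1112}"
)
topic, timestamp_ns, message = parse_cat_line(line)
print(topic, ns_to_utc(timestamp_ns).isoformat(), message)
```

For programmatic access, reading the file directly with an MCAP reader is likely more robust than parsing text; this sketch is only meant to make the line structure explicit.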
What we can learn from these messages:
- Window messages - Track active windows
    - Example: {'title': 'ZType – Typing Game - Type to Shoot - Chromium', 'rect': [389, 10, 955, 1022], 'hWnd': 7540094}
    - Shows which window was active, its title, position and size
- Mouse messages - Track cursor position and button states
    - Position tracking: {'x': 1597, 'y': 1112}
    - Event types include: "move", "click", etc.
- Keyboard messages - Track key presses and releases
    - Example: {'event_type': 'release', 'vk': 162}
    - Records which virtual key was pressed or released
- Screen messages - Link to video frames in the MKV file
    - Contains paths, presentation timestamps, and UTC timestamps
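To illustrate how a screen message could point at a frame in the MKV file, the sketch below converts the `pts` value into seconds and an approximate frame index. It assumes that `pts` is expressed in nanoseconds (the two sample values differ by 16,666,667, which is about 1/60 s) and that the video runs at a constant 60 frames per second starting from pts 0. Neither assumption is confirmed here, and both helper functions are hypothetical.

```python
NS_PER_SECOND = 1_000_000_000


def pts_to_seconds(pts_ns):
    """Presentation timestamp in seconds, assuming pts is in nanoseconds."""
    return pts_ns / NS_PER_SECOND


def pts_to_frame_index(pts_ns, fps=60):
    """Approximate frame index, assuming a constant frame rate starting at pts 0."""
    return round(pts_ns * fps / NS_PER_SECOND)


# pts values copied from the two screen messages in the output above.
for pts in (14866666666, 14883333333):
    print(f"pts={pts} -> {pts_to_seconds(pts):.3f} s, frame {pts_to_frame_index(pts)}")
```

A video decoder would normally seek by timestamp rather than by frame index, so the index is mainly useful as a sanity check that consecutive messages map to consecutive frames.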
Using This Data
This structured data allows for powerful analysis and use cases:
- You can filter data based on which window was active at a particular time
- You can synchronize keyboard/mouse events with screen captures
- The timestamps allow for precise reconstruction of user interactions
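One way to realize the first two points is a "latest message at or before time t" lookup over timestamp-sorted messages. The sketch below uses in-memory tuples copied from the sample output rather than a real reader, and `latest_at_or_before` is a hypothetical helper, not an OWA API.

```python
from bisect import bisect_right


def latest_at_or_before(sorted_timestamps, t):
    """Index of the last timestamp <= t in a sorted list, or None if there is none."""
    index = bisect_right(sorted_timestamps, t) - 1
    return index if index >= 0 else None


# (timestamp_ns, payload) pairs taken from the sample output, sorted by time.
screen = [
    (1741628814057575300, {"path": "example.mkv", "pts": 14866666666}),
    (1741628814073392700, {"path": "example.mkv", "pts": 14883333333}),
]
windows = [(1741628814049712700, "ZType – Typing Game - Type to Shoot - Chromium")]
keyboard_event = (1741628815015522100, {"event_type": "release", "vk": 162})

screen_ts = [ts for ts, _ in screen]
window_ts = [ts for ts, _ in windows]

event_ts = keyboard_event[0]
frame = screen[latest_at_or_before(screen_ts, event_ts)][1]
active_window = windows[latest_at_or_before(window_ts, event_ts)][1]
print(f"key event at {event_ts}: frame pts={frame['pts']}, window={active_window!r}")
```

Binary search keeps each lookup logarithmic in the number of messages, which matters once recordings grow beyond a few seconds.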
What's VK (Virtual Key Code)?
Operating systems do not pass physical keyboard input values (scan codes) directly to applications; instead they use virtualized keys called VKs. OWA's recorder uses VKs so that the recorded data is independent of the physical keyboard. For more details, refer to the operating system's documentation on virtual-key codes.
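As an illustration, the `vk` value 162 from the keyboard example can be mapped to a readable name. The sketch below assumes the recording uses Windows virtual-key codes (the `hWnd` field in the window message suggests a Windows recording, but this is an inference). The table is a small, non-exhaustive subset, and `vk_name` is a hypothetical helper.

```python
# Small, non-exhaustive subset of Windows virtual-key codes.
VK_NAMES = {
    0x0D: "VK_RETURN",
    0x1B: "VK_ESCAPE",
    0x20: "VK_SPACE",
    0xA0: "VK_LSHIFT",
    0xA1: "VK_RSHIFT",
    0xA2: "VK_LCONTROL",
    0xA3: "VK_RCONTROL",
}


def vk_name(vk):
    """Return a readable name for a virtual-key code (assumes Windows VK values)."""
    # Digits 0-9 and letters A-Z share their codes with the ASCII characters.
    if 0x30 <= vk <= 0x39 or 0x41 <= vk <= 0x5A:
        return chr(vk)
    return VK_NAMES.get(vk, f"VK_UNKNOWN_0x{vk:02X}")


print(vk_name(162))  # the key released in the sample keyboard message
```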
Internals - OWAMcap Format Specification
Note for Users
This part is intended for developers who want to utilize the OWAMcap file format for their own applications. Regular users of the library may not need this information.
Technical Specifications
- OWAMcap uses the standard mcap format with json schema
- The mcap-owa-support Python package, which is within the open-world-agents repository, provides decoders, writers, and readers for this format
- All messages must inherit from or implement the BaseMessage class from owa.core.message
What's MCAP?
MCAP (pronounced "em-cap") is an open-source container file format designed for multimodal log data. It supports multiple channels of timestamped pre-serialized data and is ideal for pub/sub or robotics applications.
Key advantages of MCAP:
- High Performance: Efficient storage and retrieval of large event data streams
- Flexible & Open: Works with diverse data types beyond robotics
- Self-Describing: Encodes schema information to ensure compatibility
Implementation Guide
Any message that implements BaseMessage can be recorded in the OWAMcap format. This provides flexibility while maintaining a consistent interface. The following block describes the interface of BaseMessage.
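import io
from abc import ABC, abstractmethod
from typing import Self  # Python 3.11+; use typing_extensions.Self on older versions

class BaseMessage(ABC):
    _type: str

    # Write the serialized message into the buffer
    @abstractmethod
    def serialize(self, buffer: io.BytesIO): ...

    # Rebuild a message instance from the buffer
    @classmethod
    @abstractmethod
    def deserialize(cls, buffer: io.BytesIO) -> Self: ...

    # Return the schema describing this message type
    @classmethod
    @abstractmethod
    def get_schema(cls): ...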
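A concrete message might look like the following minimal sketch. To stay self-contained it re-declares a local stand-in for BaseMessage instead of importing it from owa.core.message, and it assumes JSON serialization and a JSON Schema dictionary as the schema. The class `KeyEventSketch`, its `_type` value, and the schema contents are hypothetical and only mirror the fields of the keyboard example.

```python
from __future__ import annotations

import io
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass


class BaseMessage(ABC):
    """Local stand-in mirroring the interface above (not the real owa class)."""

    _type: str

    @abstractmethod
    def serialize(self, buffer: io.BytesIO) -> None: ...

    @classmethod
    @abstractmethod
    def deserialize(cls, buffer: io.BytesIO) -> BaseMessage: ...

    @classmethod
    @abstractmethod
    def get_schema(cls) -> dict: ...


@dataclass
class KeyEventSketch(BaseMessage):
    """Hypothetical message with the same fields as the keyboard example."""

    event_type: str
    vk: int

    # Hypothetical type name; unannotated, so it is not a dataclass field.
    _type = "example.msg.KeyEventSketch"

    def serialize(self, buffer: io.BytesIO) -> None:
        # Encode the dataclass fields as UTF-8 JSON.
        buffer.write(json.dumps(asdict(self)).encode("utf-8"))

    @classmethod
    def deserialize(cls, buffer: io.BytesIO) -> KeyEventSketch:
        return cls(**json.loads(buffer.read().decode("utf-8")))

    @classmethod
    def get_schema(cls) -> dict:
        # Hand-written JSON Schema for the two fields (an assumption).
        return {
            "type": "object",
            "properties": {
                "event_type": {"type": "string"},
                "vk": {"type": "integer"},
            },
            "required": ["event_type", "vk"],
        }


# Round trip: serialize into a buffer, rewind, and read the message back.
original = KeyEventSketch(event_type="release", vk=162)
buffer = io.BytesIO()
original.serialize(buffer)
buffer.seek(0)
restored = KeyEventSketch.deserialize(buffer)
print(restored)
```

The actual message classes in the open-world-agents repository may serialize differently; the point here is only the shape of the three required methods.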
File Format Considerations
Why Use .mcap?
There are very few open-source formats available for heterogeneous timestamped data. ROS's bagfile format is one option, but it heavily depends on the ROS ecosystem and often requires installation of ROS1/2. In comparison, mcap is self-contained and efficient, especially for random read (or seeking) operations, which are critical for training VLA (Vision-Language-Action) models.