
Open World Agents Documentation

Open World Agents (OWA) is a monorepo for building AI agents that interact with desktop applications. It provides data capture, environment control, and training utilities.

Quick Start

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model
$ python train.py --dataset ./event-dataset

📖 Detailed Guide: Complete Quick Start Tutorial

Architecture Overview

OWA consists of the following core components:

  • 🌍 Environment Framework: "USB-C of desktop agents" - a universal interface for native desktop automation, with pre-built plugins for desktop control, high-performance screen capture, and a zero-configuration plugin system
  • 📊 Data Infrastructure: A complete desktop agent data pipeline, from recording to training, built around the OWAMcap format, a universal standard powered by MCAP
  • 🛠️ CLI Tools: Command-line utilities for recording (ocap) and for analyzing and managing agent data (owl)
  • 🤖 Examples: Complete implementations and training pipelines for multimodal agents

Project Structure

The repository is organized as a monorepo with multiple sub-repositories under the projects/ directory. Each sub-repository is a self-contained Python package installable via pip or uv and follows namespace packaging conventions.

open-world-agents/
├── projects/
│   ├── mcap-owa-support/     # OWAMcap format support
│   ├── owa-core/             # Core framework and registry system
│   ├── owa-msgs/             # Core message definitions with automatic discovery
│   ├── owa-cli/              # Command-line tools (ocap, owl)
│   ├── owa-env-desktop/      # Desktop environment plugin
│   ├── owa-env-example/      # Example environment implementations
│   ├── owa-env-gst/          # GStreamer-based screen capture
│   └── [your-plugin]/        # Contribute your own plugins!
├── docs/                     # Documentation
└── README.md
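Because each sub-project is installed separately but imported under one top-level owa package, it can help to see how namespace packaging merges them. The minimal sketch below assumes the implicit (PEP 420) style, where no distribution ships an owa/__init__.py. It uses only the standard library: two throwaway directories stand in for two installed distributions, and the module contents are hypothetical placeholders, not real OWA code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_portion(site_dir: Path, subpackage: str, source: str) -> None:
    """Create site_dir/owa/<subpackage>/__init__.py but NO owa/__init__.py.

    Leaving out owa/__init__.py makes `owa` an implicit (PEP 420) namespace
    package, so portions found in different directories merge under one name.
    """
    pkg_dir = site_dir / "owa" / subpackage
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text(source)


with tempfile.TemporaryDirectory() as tmp:
    # Two directories stand in for two separately installed distributions.
    site_a = Path(tmp) / "site_a"   # plays the role of owa-core (placeholder)
    site_b = Path(tmp) / "site_b"   # plays the role of owa-cli (placeholder)
    make_portion(site_a, "core", "NAME = 'owa.core (placeholder)'\n")
    make_portion(site_b, "cli", "NAME = 'owa.cli (placeholder)'\n")

    sys.path[:0] = [str(site_a), str(site_b)]
    importlib.invalidate_caches()
    try:
        core = importlib.import_module("owa.core")
        cli = importlib.import_module("owa.cli")
        # The namespace package's __path__ lists every contributing directory.
        path_count = len(list(sys.modules["owa"].__path__))
    finally:
        sys.path.remove(str(site_a))
        sys.path.remove(str(site_b))

print(core.NAME, "|", cli.NAME, "|", path_count, "portions")
```

The key design point is the absence of owa/__init__.py: a regular package with that file would claim the owa name for a single distribution and hide every other sub-project.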

Core Packages


The easiest way to get started is to install the owa meta-package, which includes all core components and environment plugins:

$ pip install owa

All OWA packages use namespace packaging and are installed in the owa namespace (e.g., owa.core, owa.cli, owa.env.desktop). For more detail, see "Packaging namespace packages" in the Python Packaging User Guide. We recommend using uv as the package manager.

Name                PyPI              Conda             Description
owa.core            owa-core          owa-core          Framework foundation with registry system
owa.msgs            owa-msgs          owa-msgs          Core message definitions with automatic discovery
owa.cli             owa-cli           owa-cli           Command-line tools (owl) for data analysis
mcap-owa-support    mcap-owa-support  mcap-owa-support  OWAMcap format support and utilities
ocap 🎥             ocap              ocap              Desktop recorder for multimodal data capture
owa.env.desktop     owa-env-desktop   owa-env-desktop   Mouse, keyboard, window event handling
owa.env.gst 🎥      owa-env-gst       owa-env-gst       High-performance, hardware-accelerated screen capture
owa.env.example     -                 -                 Reference implementations for learning

🎥 Video Processing Packages: Packages marked with 🎥 require GStreamer dependencies. For full functionality, install the GStreamer bundle first:

$ conda install open-world-agents::gstreamer-bundle

📦 Lockstep Versioning: All first-party OWA packages follow lockstep versioning, meaning they share the same version number to ensure compatibility and simplify dependency management.
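Since all first-party packages share one version number, a mismatch among installed OWA distributions usually signals a partial upgrade. A minimal sketch of such a check, using importlib.metadata from the standard library; the distribution names come from the package table above, and installed_versions and check_lockstep are hypothetical helpers, not part of OWA.

```python
from importlib import metadata

# Distribution names taken from the package table above.
OWA_DISTRIBUTIONS = [
    "owa-core",
    "owa-msgs",
    "owa-cli",
    "mcap-owa-support",
    "ocap",
    "owa-env-desktop",
    "owa-env-gst",
]


def installed_versions(names):
    """Return {distribution: version} for the distributions that are installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # Not installed; lockstep only concerns installed packages.
    return versions


def check_lockstep(versions):
    """Return True if every installed distribution reports the same version."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    found = installed_versions(OWA_DISTRIBUTIONS)
    for name, version in sorted(found.items()):
        print(f"{name:<18} {version}")
    print("lockstep OK" if check_lockstep(found) else "version mismatch!")
```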


🌍 Environment Framework

Universal interface for native desktop automation with real-time event handling and zero-configuration plugin discovery.
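Zero-configuration discovery of this kind is typically built on Python entry points: each plugin package declares an entry point in its packaging metadata, and the host enumerates a known group at runtime with no registration code. A minimal sketch using importlib.metadata; the group name owa.env.plugins is an assumption for illustration, and the helper functions are hypothetical, not OWA's registry API.

```python
from importlib import metadata

# Assumed entry-point group name; check the Custom Plugins docs for the real one.
PLUGIN_GROUP = "owa.env.plugins"


def iter_plugin_entry_points(group=PLUGIN_GROUP):
    """Yield entry points registered under `group` by any installed distribution."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+: selectable entry points
        yield from eps.select(group=group)
    else:  # Python 3.8/3.9: dict of group -> list of EntryPoint
        yield from eps.get(group, [])


def discover_plugins(group=PLUGIN_GROUP):
    """Map plugin name -> loaded object; importing happens only at discovery time."""
    return {ep.name: ep.load() for ep in iter_plugin_entry_points(group)}


if __name__ == "__main__":
    for name, obj in discover_plugins().items():
        print(f"{name}: {obj!r}")
```

Installing a plugin package is then enough to make it visible: nothing in the host has to be edited.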

Environment Navigation

Section               Description
Environment Overview  Core concepts and quick start guide
Environment Guide     Complete system overview and usage examples
Custom Plugins        Create your own environment extensions
CLI Tools             Plugin management and exploration commands

Built-in Plugins:

Plugin      Description                    Key Features
Standard    Core utilities                 Time functions, periodic tasks
Desktop     Desktop automation             Mouse/keyboard control, window management
GStreamer   Hardware-accelerated capture   Fast screen recording
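The Standard plugin's periodic tasks can be pictured as a loop that fires a callback at a fixed interval. The sketch below is a generic, stdlib-only illustration; run_periodic is a hypothetical helper, not the plugin's API. It schedules against absolute deadlines derived from time.monotonic(), so a slow callback delays one tick without shifting all later ones.

```python
import time


def run_periodic(callback, interval_s, iterations):
    """Call `callback(tick)` every `interval_s` seconds, `iterations` times.

    Deadlines are computed from a fixed start time instead of sleeping
    `interval_s` after each call, so timing errors do not accumulate.
    """
    start = time.monotonic()
    for tick in range(iterations):
        deadline = start + tick * interval_s
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        callback(tick)


if __name__ == "__main__":
    t0 = time.monotonic()
    run_periodic(lambda i: print(f"tick {i} at {time.monotonic() - t0:.3f}s"), 0.1, 5)
```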

📊 Data Infrastructure

Desktop AI needs high-quality, synchronized multimodal data: screen captures, mouse/keyboard events, and window context. OWA provides the complete pipeline from recording to training.
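Synchronization here means that every message, whatever its source, carries a timestamp on a shared clock, so separately recorded streams can be interleaved into one time-ordered sequence. A minimal sketch of that idea using heapq.merge; the Event tuple, topic names, and nanosecond timestamps are illustrative assumptions, not the OWAMcap schema.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp_ns: int  # shared clock, in nanoseconds
    topic: str         # illustrative topic names, not the OWAMcap schema
    payload: object


def merge_streams(*streams: Iterable[Event]) -> Iterator[Event]:
    """Interleave already time-sorted streams into one time-ordered stream."""
    return heapq.merge(*streams, key=lambda e: e.timestamp_ns)


screen = [Event(1_000_000_000, "screen", "frame-0"),
          Event(1_033_333_333, "screen", "frame-1")]
mouse = [Event(1_010_000_000, "mouse", (120, 48)),
         Event(1_040_000_000, "mouse", (122, 50))]
keyboard = [Event(1_020_000_000, "keyboard", "press W")]

for event in merge_streams(screen, mouse, keyboard):
    print(event.timestamp_ns, event.topic, event.payload)
```

Because each stream is already sorted, heapq.merge produces the combined order lazily without loading or re-sorting everything at once.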

🚀 Getting Started

New to OWA data? Start here:

📚 Technical Reference

🛠️ Tools & Ecosystem

🤗 Community Datasets

Browse Datasets: 🤗 HuggingFace


🤖 Examples

Example                    Description                                  Status
Multimodal Game Agent      Vision-based game playing agent              🚧 In Progress
GUI Agent                  General desktop application automation       🚧 In Progress
Interactive World Model    Predictive modeling of desktop environments  🚧 In Progress
Usage with LLMs            Integration with large language models       🚧 In Progress
Usage with Transformers    Vision transformer implementations           🚧 In Progress

Development Resources

Learn how to contribute, report issues, and get help.

Resource              Description
Help with OWA         Community support resources
Installation Guide    Detailed installation instructions
Contributing Guide    Development setup, bug reports, feature proposals
FAQ for Developers    Common questions and troubleshooting

Features

🌍 Environment Framework: "USB-C of Desktop Agents"

  • ⚡ Real-time Performance: Optimized for responsive agent interactions (GStreamer components achieve <30ms latency)
  • 🔌 Zero-Configuration: Automatic plugin discovery via Python Entry Points
  • 🌐 Event-Driven: Asynchronous processing that mirrors real-world dynamics
  • 🧩 Extensible: Community-driven plugin ecosystem
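An event-driven design decouples producers (for example input listeners or a screen capturer) from consumers (the agent), which react to events as they arrive instead of polling. The sketch below illustrates the pattern with asyncio.Queue; fake_listener, agent, and the event tuples are hypothetical stand-ins, not OWA's listener API.

```python
import asyncio


async def fake_listener(name, queue, count, interval_s):
    """Hypothetical producer: emits `count` events at a fixed interval."""
    for i in range(count):
        await asyncio.sleep(interval_s)
        await queue.put((name, i))


async def agent(queue, handled):
    """Consumer: reacts to each event as soon as it arrives."""
    while True:
        event = await queue.get()
        handled.append(event)
        queue.task_done()


async def main():
    queue = asyncio.Queue()
    handled = []
    consumer = asyncio.create_task(agent(queue, handled))
    # Producers run concurrently; the agent never polls them.
    await asyncio.gather(
        fake_listener("keyboard", queue, 3, 0.01),
        fake_listener("mouse", queue, 2, 0.015),
    )
    await queue.join()  # wait until every queued event has been handled
    consumer.cancel()
    try:
        await consumer
    except asyncio.CancelledError:
        pass
    return handled


if __name__ == "__main__":
    for source, index in asyncio.run(main()):
        print(source, index)
```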

→ View Environment Framework Guide

📊 Data Infrastructure: Complete Pipeline

  • 🌐 Universal Standard: Unlike fragmented formats, a shared standard lets datasets be combined seamlessly to train large-scale foundation models (OWAMcap)
  • High-Performance Multimodal Storage: Lightweight MCAP container with nanosecond precision for synchronized data streams (MCAP)
  • 🔗 Flexible MediaRef: Smart references to both external and embedded media (file paths, URLs, data URIs, video frames) with lazy loading - keeps metadata files small while supporting rich media (OWAMcap) · Learn more
  • 🤗 Training Pipeline Ready: Native HuggingFace integration, seamless dataset loading, and direct compatibility with ML frameworks (Ecosystem) · Browse datasets | Data pipeline
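To illustrate the MediaRef idea, a metadata record can store only a reference string (plus an optional frame timestamp for video) and resolve the bytes on first access. This is a hypothetical, stdlib-only sketch: MediaRefSketch and its fields and methods are assumptions, not the OWAMcap MediaRef API. Only data URIs and local files are resolved here; remote URLs and video-frame decoding are deliberately left out.

```python
import base64
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional
from urllib.parse import unquote_to_bytes, urlparse


@dataclass
class MediaRefSketch:
    """Hypothetical media reference: a URI plus an optional video timestamp."""
    uri: str                      # file path, http(s) URL, or data: URI
    pts_ns: Optional[int] = None  # frame timestamp when `uri` points at a video
    _cache: Optional[bytes] = field(default=None, repr=False)

    @property
    def kind(self) -> str:
        scheme = urlparse(self.uri).scheme
        if scheme == "data":
            return "embedded"
        if scheme in ("http", "https"):
            return "remote"
        return "video_frame" if self.pts_ns is not None else "file"

    def load_bytes(self) -> bytes:
        """Resolve the reference lazily; the result is cached after first use."""
        if self._cache is None:
            if self.kind == "embedded":
                header, _, data = self.uri.partition(",")
                if header.endswith(";base64"):
                    self._cache = base64.b64decode(data)
                else:
                    self._cache = unquote_to_bytes(data)  # percent-encoded payload
            elif self.kind == "file":
                self._cache = Path(self.uri).read_bytes()
            else:
                raise NotImplementedError(f"{self.kind} loading is out of scope here")
        return self._cache


ref = MediaRefSketch("data:text/plain;base64,aGVsbG8=")
print(ref.kind)          # embedded
print(ref.load_bytes())  # b'hello'
```

Keeping only the reference in each record is what keeps metadata files small; the cost of reading media is paid only when a consumer actually asks for the bytes.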

→ View Data Infrastructure Guide

🤗 Community & Ecosystem

  • 🌱 Growing Ecosystem: Hundreds of community datasets in unified OWAMcap format
  • 🤗 HuggingFace Integration: Native dataset loading, sharing, and interactive preview tools
  • 🧩 Extensible Architecture: Modular design for custom environments, plugins, and message types
  • 💡 Community-Driven: Plugin ecosystem spanning gaming, web automation, mobile control, and specialized domains

→ View Community Datasets


License

This project is released under the MIT License. See the LICENSE file for details.