
Open World Agents Documentation

A comprehensive framework for building AI agents that interact with desktop applications through vision, keyboard, and mouse control.

Open World Agents (OWA) is a monorepo containing the complete toolkit for multimodal desktop agent development. From high-performance data capture to model training and real-time evaluation, everything is designed for flexibility and performance.

Architecture Overview

OWA consists of three core components:

🌍 Environment (Env) - Asynchronous, event-driven interface for real-time agent interactions
📊 Data - High-performance recording, storage, and analysis of multimodal desktop data
🤖 Examples - Complete implementations and training pipelines for multimodal agents

Quick Navigation

🌍 Environment Framework

Build reactive desktop agents with our asynchronous environment system.

Component           Description
Core Concepts       Callables, Listeners, and Runnables for real-time processing
Environment Guide   Dynamic plugin activation and registry patterns
Custom Plugins      Create your own environment extensions
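The three core concepts and the registry pattern can be illustrated with a small, self-contained Python sketch. Everything in it is hypothetical: `Runnable`, `Listener`, `Registry`, `ClockListener`, the `example` namespace, and the `namespace/name` key scheme are illustrative stand-ins, not OWA's actual classes or API. The sketch assumes that a Callable is pulled on demand, a Listener pushes events to a user callback from a background thread, a Runnable is a background task with explicit start/stop, and a plugin's components become available only once its namespace is activated.

```python
import threading
import time
from typing import Callable, Dict, Optional


class Runnable:
    """Hypothetical background task with explicit start/stop."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def loop(self, stop: threading.Event) -> None:
        raise NotImplementedError  # subclasses work here until `stop` is set

    def start(self) -> "Runnable":
        self._stop.clear()
        self._thread = threading.Thread(target=self.loop, args=(self._stop,), daemon=True)
        self._thread.start()
        return self

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()

    def is_alive(self) -> bool:
        return self._thread is not None and self._thread.is_alive()


class Listener(Runnable):
    """Hypothetical Runnable that pushes each new event to a callback (push model)."""

    def __init__(self, callback: Callable[[object], None], interval: float = 0.01) -> None:
        super().__init__()
        self.callback = callback
        self.interval = interval

    def poll(self) -> object:
        raise NotImplementedError  # produce the next event (key press, frame, ...)

    def loop(self, stop: threading.Event) -> None:
        while not stop.is_set():
            self.callback(self.poll())
            stop.wait(self.interval)  # sleep, but wake immediately on stop()


class Registry:
    """Hypothetical registry mapping 'namespace/name' to components."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Dict[str, object]]] = {}
        self.components: Dict[str, object] = {}

    def plugin(self, namespace: str):
        # Registration only stores the factory; nothing is built yet.
        def decorator(factory: Callable[[], Dict[str, object]]):
            self._factories[namespace] = factory
            return factory
        return decorator

    def activate(self, namespace: str) -> None:
        # Components are created and exposed only when the namespace is activated.
        for name, component in self._factories[namespace]().items():
            self.components[f"{namespace}/{name}"] = component


REGISTRY = Registry()


class ClockListener(Listener):
    def poll(self) -> object:
        return time.time_ns()


@REGISTRY.plugin("example")
def example_plugin() -> Dict[str, object]:
    # A Callable (pulled on demand) and a Listener class (pushes events).
    return {"time_ns": time.time_ns, "clock": ClockListener}


if __name__ == "__main__":
    REGISTRY.activate("example")
    print(REGISTRY.components["example/time_ns"]())
    ticks = []
    clock = REGISTRY.components["example/clock"](callback=ticks.append).start()
    time.sleep(0.05)
    clock.stop()
    print(len(ticks), "events received")
```

Separating pull (Callables) from push (Listeners) lets an agent query state only when it needs it while still reacting to events as they arrive, and deferring construction to `activate` keeps unused plugins from doing any work. The Core Concepts and Environment Guide pages describe the real interfaces.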

Built-in Plugins:

📊 Data Infrastructure

Capture, store, and analyze multimodal desktop interaction data.

Component                 Description
Data Overview             Complete data pipeline for desktop agents
OWAMcap Format            Self-contained multimodal data format powered by MCAP
Desktop Recorder (ocap)   High-performance desktop recording tool
Data Viewer               Visualize and analyze recorded sessions
Data Explorer             Tools for data exploration and editing
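OWAMcap builds on MCAP, where each message is written to a channel identified by a topic and stamped with a log time. The sketch below illustrates only that topic-plus-timestamp model, using JSON Lines from the Python standard library as a simplified stand-in for the MCAP container. The `Message` type, the `write_messages`/`read_messages` helpers, and the topic names are hypothetical and do not reflect the actual OWAMcap schema or tooling.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Optional, TextIO


@dataclass(frozen=True)
class Message:
    topic: str     # e.g. "keyboard", "mouse", "screen" (illustrative names)
    log_time: int  # nanoseconds, analogous to MCAP's log_time
    data: dict     # decoded payload


def write_messages(stream: TextIO, messages: Iterable[Message]) -> None:
    """Write one JSON object per line (JSON Lines)."""
    for m in messages:
        record = {"topic": m.topic, "log_time": m.log_time, "data": m.data}
        stream.write(json.dumps(record) + "\n")


def read_messages(
    stream: TextIO,
    topics: Optional[List[str]] = None,
    start: Optional[int] = None,
    end: Optional[int] = None,
) -> Iterator[Message]:
    """Yield messages in log_time order, optionally filtered by topic and [start, end)."""
    decoded = (json.loads(line) for line in stream if line.strip())
    msgs = [Message(d["topic"], d["log_time"], d["data"]) for d in decoded]
    msgs.sort(key=lambda m: m.log_time)  # merge all topics onto one timeline
    for m in msgs:
        if topics is not None and m.topic not in topics:
            continue
        if start is not None and m.log_time < start:
            continue
        if end is not None and m.log_time >= end:
            continue
        yield m


if __name__ == "__main__":
    buf = io.StringIO()
    write_messages(buf, [
        Message("screen", 2_000, {"frame": 0}),
        Message("keyboard", 1_000, {"event": "press", "key": "w"}),
        Message("mouse", 3_000, {"x": 10, "y": 20}),
    ])
    buf.seek(0)
    print([m.topic for m in read_messages(buf)])  # ['keyboard', 'screen', 'mouse']
```

Sorting by `log_time` on read is what lets independently recorded keyboard, mouse, and screen streams be replayed as one synchronized timeline. A real container such as MCAP also provides indexing and compression, so reads need not load the whole file into memory as this sketch does.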

🤖 Agent Examples

Learn from complete implementations and training pipelines.

Example                    Description                                   Status
Multimodal Game Agent      Vision-based game-playing agent               🚧 In Progress
GUI Agent                  General desktop application automation        🚧 In Progress
Interactive World Model    Predictive modeling of desktop environments   🚧 In Progress
Usage with LLMs            Integration with large language models        🚧 In Progress
Usage with Transformers    Vision transformer implementations            🚧 In Progress
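Agents that act through vision, keyboard, and mouse generally run some form of perception-action loop. The following is a minimal sketch of such a loop at a fixed rate; it is not taken from the OWA examples. The `observe`, `policy`, and `act` callables are hypothetical stubs standing in for screen capture, a model, and input injection, and `Action`/`run_agent` are illustrative names.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    kind: str   # e.g. "key" or "click" (illustrative)
    value: str


def run_agent(
    observe: Callable[[], object],
    policy: Callable[[object], Action],
    act: Callable[[Action], None],
    steps: int,
    hz: float = 20.0,
) -> None:
    """Fixed-rate loop: observe, decide, act, then wait for the next step."""
    period = 1.0 / hz
    next_tick = time.monotonic()
    for _ in range(steps):
        obs = observe()     # e.g. a screenshot
        act(policy(obs))    # e.g. a key press or mouse click
        next_tick += period
        # Sleep only for the time left in this step; skip sleeping if we're late.
        time.sleep(max(0.0, next_tick - time.monotonic()))


if __name__ == "__main__":
    frames = iter(range(100))  # stub "screen" that yields frame numbers
    log: List[Action] = []
    run_agent(
        observe=lambda: next(frames),
        policy=lambda f: Action("key", "space" if f % 2 else "w"),
        act=log.append,
        steps=4,
        hz=100.0,
    )
    print([a.value for a in log])  # ['w', 'space', 'w', 'space']
```

Pacing against `time.monotonic()` rather than sleeping a fixed `period` keeps the loop from drifting when observation or inference consumes part of each step.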

Community & Ecosystem

🌱 Growing Ecosystem: OWA is designed for extensibility. Community contributions include:

  • Custom environment plugins (owa.env.minecraft, owa.env.web, etc.)
  • Specialized data processors and analyzers
  • Novel agent architectures and training methods

🤗 Hugging Face Integration: Upload and share datasets created with ocap. Preview datasets at Hugging Face Spaces.

Development Resources

Resource             Description
Installation Guide   Detailed installation instructions
Contributing Guide   Development setup, bug reports, feature proposals
FAQ                  Common questions and troubleshooting

What Can You Build?

Anything that runs on a desktop. If a human can do it on a computer, you can build an AI agent to automate it:

🤖 Desktop Automation - Navigate applications, automate workflows, interact with any software
🎮 Game AI - Master complex games through visual understanding and real-time decision making
📊 Training Datasets - Capture high-quality human-computer interaction data for foundation models
📈 Benchmarks - Create and evaluate desktop agent performance across diverse tasks

License

This project is released under the MIT License. See the LICENSE file for details.