
Quick Start Guide

This guide covers the OWA workflow: Record → Process → Train.

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset

Prerequisites

Before starting, install OWA. See the Installation Guide for details.

$ conda install open-world-agents::gstreamer-bundle
$ pip install owa
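
To verify the install, check that the core package imports without errors (no output means success):

$ python -c "import owa.core"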

Step 1: Record Desktop Interaction

ocap records your desktop in one command:

$ ocap my-session.mcap

This captures screen video (H.265), keyboard/mouse events, window context, and audio—all synchronized with nanosecond precision. See ocap documentation for options.
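Recordings are standard MCAP files, so any MCAP reader can inspect them. Here is a minimal sketch using the generic mcap Python package (pip install mcap; this third-party reader is an assumption, not OWA tooling), which simply counts messages per topic:

from collections import Counter

from mcap.reader import make_reader

with open("my-session.mcap", "rb") as f:
    reader = make_reader(f)
    counts = Counter(channel.topic for _, channel, _ in reader.iter_messages())

# Print message counts per topic, most frequent first
for topic, n in counts.most_common():
    print(f"{topic}: {n} messages")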

Step 2: Process to Training Format

Next, transform the recorded data into a training-ready dataset.
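
As in the workflow overview, run the conversion script over the directory holding your recordings (here --train-dir is assumed to be the directory containing your .mcap files):

$ python scripts/01_raw_to_event.py --train-dir ./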


See owa-data for full pipeline documentation.

Step 3: Train Your Model

Training Pipeline Coming Soon

We developed a complete training pipeline during our D2E research. We're currently preparing it for open-source release—stay tuned!

Environment Framework

For live agent interactions (not just recording), use OWA's environment framework:

from owa.core import CALLABLES, LISTENERS

# Capture the current screen
screen = CALLABLES["desktop/screen.capture"]()

# Control the mouse and keyboard
CALLABLES["desktop/mouse.click"]("left", 2)  # Double-click
CALLABLES["desktop/keyboard.type"]("Hello World!")

# React to live keyboard events via a listener
def on_key(event):
    print(f"Key pressed: {event.vk}")

listener = LISTENERS["desktop/keyboard"]().configure(callback=on_key)
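
These registry calls compose into simple automation scripts. A minimal sketch using only the calls shown above plus the standard library (the pause length is arbitrary):

import time

from owa.core import CALLABLES

# Type a greeting, pause briefly, then double-click at the current cursor position
CALLABLES["desktop/keyboard.type"]("Hello World!")
time.sleep(0.5)
CALLABLES["desktop/mouse.click"]("left", 2)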

See Environment Guide for the full API.

Next Steps

Browse community data: 🤗 HuggingFace Datasets
Visualize recordings: Dataset Visualizer
Build agents: Agent Examples
Extend OWA: Custom Plugins
Get help: FAQ · Contributing