Quick Start Guide

Complete step-by-step walkthrough for getting started with Open World Agents

3-Step Workflow

This guide covers the complete OWA workflow: Record → Process → Train

Overview

This guide provides detailed explanations, examples, and troubleshooting for the 3-step OWA workflow:

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_events_to_event_dataset.py --train-dir ./

# 3. Train your model
$ python train.py --dataset ./event-dataset

📖 Detailed Guide: Complete Quick Start Tutorial - Step-by-step walkthrough with examples and troubleshooting

Prerequisites

Installation Required

Before starting, ensure you have OWA installed. See the Installation Guide for detailed setup instructions.

For full recording capabilities:

# Install GStreamer dependencies first
conda install open-world-agents::gstreamer-bundle
pip install owa

For basic data processing:

pip install owa

Step 1: Record Desktop Interaction

Record with ocap

Use ocap (Omnimodal CAPture) to record your desktop interactions with synchronized video, audio, and input events.

$ ocap my-session.mcap

What this captures

  • Screen video with hardware acceleration
  • Keyboard events with nanosecond precision
  • Mouse interactions with exact coordinates
  • Audio recording synchronized with video
  • Everything saved in the OWAMcap format
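All of these streams share one nanosecond clock, so they can be merged into a single ordered timeline for playback or processing. A minimal sketch using hypothetical plain-dict events (the real OWAMcap format stores typed messages, not dicts like these):

```python
import heapq

# Hypothetical event records illustrating the kind of data ocap captures.
screen_events = [
    {"topic": "screen", "timestamp_ns": 1_000_000_000},
    {"topic": "screen", "timestamp_ns": 1_016_666_667},  # ~60 FPS later
]
keyboard_events = [
    {"topic": "keyboard", "timestamp_ns": 1_005_000_000, "vk": 65},
]
mouse_events = [
    {"topic": "mouse", "timestamp_ns": 1_012_000_000, "x": 320, "y": 240},
]

# Merge all streams into one timeline ordered by nanosecond timestamp,
# which is how synchronized processing walks a recording.
timeline = list(heapq.merge(
    screen_events, keyboard_events, mouse_events,
    key=lambda e: e["timestamp_ns"],
))

print([e["topic"] for e in timeline])
# ['screen', 'keyboard', 'mouse', 'screen']
```

Because every stream carries the same clock, no resampling is needed to interleave them; ordering by timestamp is sufficient.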


Step 2: Process to Training Format

Transform with Data Pipeline

Transform your recorded data into training-ready datasets using OWA's data pipeline.

$ python scripts/01_raw_events_to_event_dataset.py --train-dir ./

Processing Pipeline

  • Extracts events from the MCAP file
  • Converts format to standardized training structure
  • Handles media references and synchronization
  • Prepares data for ML frameworks

Advanced Processing

flowchart LR
    A[MCAP File] --> B[Event Dataset]
    B --> C[Binned Dataset]
    C --> D[Training Ready]

    style A fill:#e1f5fe
    style D fill:#e8f5e8


Step 3: Train Your Model

TODO: Training Implementation

This section is under development. Training scripts and detailed examples are coming soon.

Train with Processed Data

Use the processed dataset to train your desktop agent model.

$ python train.py --dataset ./event-dataset

Training Capabilities

  • Multimodal models - trained on desktop interactions
  • Learn from demonstrations - human behavior patterns
  • Application-specific agents - tailored to your use case
  • Performance evaluation - measured on real tasks
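As a sketch of what "learn from demonstrations" means at its simplest, here is a hypothetical frequency-count policy over (observation, action) pairs. The state and action names are invented for illustration; real training would fit a multimodal neural policy rather than a lookup table:

```python
from collections import Counter, defaultdict

# Hypothetical (observation, action) pairs extracted from human demonstrations.
demos = [
    ("dialog_open", "click_ok"),
    ("dialog_open", "click_ok"),
    ("dialog_open", "press_escape"),
    ("editor_focused", "type_text"),
]

# Minimal behavior-cloning baseline: for each observed state, predict the
# action humans chose most often in the demonstrations.
policy = defaultdict(Counter)
for obs, action in demos:
    policy[obs][action] += 1

def act(obs):
    return policy[obs].most_common(1)[0][0]

print(act("dialog_open"))  # click_ok
```

Even this trivial baseline makes the data requirement concrete: training needs paired observations and actions, which is exactly what the processed event dataset provides.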

Training Architecture

flowchart TD
    A[Event Dataset] --> B[Vision Encoder]
    A --> C[Action Encoder]
    B --> D[Multimodal Fusion]
    C --> D
    D --> E[Policy Network]
    E --> F[Desktop Agent]

    style A fill:#e1f5fe
    style F fill:#e8f5e8


Environment Framework Integration

Real-time Agent Interactions

While recording and training, you can also use OWA's real-time environment framework for live agent interactions:

from owa.core import CALLABLES, LISTENERS

# Real-time screen capture
screen = CALLABLES["desktop/screen.capture"]()

# Monitor user interactions
def on_key(event):
    print(f"Key pressed: {event.vk}")

listener = LISTENERS["desktop/keyboard"]().configure(callback=on_key)

# Perform desktop actions
CALLABLES["desktop/mouse.click"]("left", 2)  # Double-click
CALLABLES["desktop/keyboard.type"]("Hello World!")
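The string-keyed CALLABLES and LISTENERS lookups rely on a plugin-registry pattern: plugins register named capabilities, and callers resolve them by key at runtime. A toy re-implementation of that pattern (the `demo/echo` entry is purely illustrative, not a real OWA plugin):

```python
class Registry:
    """Minimal string-keyed registry, sketching the pattern behind
    CALLABLES/LISTENERS. Not the actual OWA implementation."""

    def __init__(self):
        self._entries = {}

    def register(self, name):
        # Decorator that stores the object under the given name.
        def decorator(obj):
            self._entries[name] = obj
            return obj
        return decorator

    def __getitem__(self, name):
        return self._entries[name]

CALLABLES = Registry()

@CALLABLES.register("demo/echo")
def echo(text):
    return text

print(CALLABLES["demo/echo"]("Hello World!"))  # Hello World!
```

Registering by name is what lets third-party environment plugins add capabilities without callers importing them directly.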


Community Resources

Datasets & Tools

Next Steps

Your Journey Continues

  1. Explore Examples: Start with Agent Examples to see complete implementations

  2. Join the Community: Browse and contribute datasets

  3. Build Custom Plugins: Extend OWA with custom environment plugins

  4. Advanced Usage: Dive into technical documentation for advanced features
