Quick Start Guide
Complete step-by-step walkthrough for getting started with Open World Agents
3-Step Workflow
This guide covers the complete OWA workflow: Record → Process → Train
Overview
This guide provides detailed explanations, examples, and troubleshooting for the 3-step OWA workflow:
# 1. Record desktop interaction
$ ocap my-session.mcap
# 2. Process to training format
$ python scripts/01_raw_events_to_event_dataset.py --train-dir ./
# 3. Train your model
$ python train.py --dataset ./event-dataset
Prerequisites
Installation Required
Before starting, ensure you have OWA installed. See the Installation Guide for detailed setup instructions.
Step 1: Record Desktop Interaction
Record with ocap
Use ocap (Omnimodal CAPture) to record your desktop interactions with synchronized video, audio, and input events.
What this captures
- Screen video with hardware acceleration
- Keyboard events with nanosecond precision
- Mouse interactions with exact coordinates
- Audio recording synchronized with video
- Everything saved in the OWAMcap format
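To sanity-check a recording, you can open it with the generic `mcap` Python package, since OWAMcap is a profile of the standard MCAP container. A minimal sketch, assuming a file named `my-session.mcap`; the actual topic names depend on your capture configuration:

```python
from mcap.reader import make_reader

# Open the recording, list its channels (topics), and peek at a few messages.
with open("my-session.mcap", "rb") as f:
    reader = make_reader(f)
    summary = reader.get_summary()
    if summary is not None:
        for channel in summary.channels.values():
            print(f"topic: {channel.topic}")
    # log_time is in nanoseconds.
    for i, (schema, channel, message) in enumerate(reader.iter_messages()):
        print(channel.topic, message.log_time)
        if i >= 4:
            break
```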
Learn More
- Desktop Recording Guide - Complete setup and usage
- OWAMcap Format - Technical specification
- Recording Troubleshooting - Common issues and solutions
Step 2: Process to Training Format
Transform with Data Pipeline
Transform your recorded data into training-ready datasets using OWA's data pipeline.
Processing Pipeline
- Extracts events from the MCAP file
- Converts events into a standardized training structure
- Handles media references and synchronization
- Prepares data for ML frameworks
Advanced Processing
flowchart LR
A[MCAP File] --> B[Event Dataset]
B --> C[Binned Dataset]
C --> D[Training Ready]
style A fill:#e1f5fe
style D fill:#e8f5e8
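As a rough illustration of the Event Dataset → Binned Dataset step above, the sketch below groups timestamped events into fixed-width time bins, pairing the latest screen frame in each window with the input events that occurred alongside it. The event tuple layout, topic names, and bin width are illustrative assumptions, not the pipeline's actual schema:

```python
from collections import defaultdict

BIN_NS = 100_000_000  # assumed bin width: 100 ms, in nanoseconds

def bin_events(events):
    """Group (timestamp_ns, topic, payload) tuples into fixed-width bins.

    Illustrative only: the real pipeline scripts also handle media
    references and audio/video synchronization beyond this sketch.
    """
    bins = defaultdict(lambda: {"frames": [], "actions": []})
    for ts_ns, topic, payload in events:
        key = ts_ns // BIN_NS
        if topic == "screen":          # assumed topic name for video frames
            bins[key]["frames"].append(payload)
        else:                          # keyboard/mouse input events
            bins[key]["actions"].append((topic, payload))
    # Keep the last frame per bin as the observation for that step.
    return [
        {
            "t_ns": key * BIN_NS,
            "frame": b["frames"][-1] if b["frames"] else None,
            "actions": b["actions"],
        }
        for key, b in sorted(bins.items())
    ]
```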
Learn More
- Data Pipeline Guide - Complete processing workflow
- Data Explorer - Analyze and visualize your data
- CLI Tools - Command-line utilities for data management
Step 3: Train Your Model
TODO: Training Implementation
This section is under development. Training scripts and detailed examples are coming soon.
Train with Processed Data
Use the processed dataset to train your desktop agent model.
Training Capabilities
- Multimodal models trained on desktop interactions
- Learning from demonstrations of human behavior patterns
- Application-specific agents tailored to your use case
- Performance evaluation on real tasks
Training Architecture
flowchart TD
A[Event Dataset] --> B[Vision Encoder]
A --> C[Action Encoder]
B --> D[Multimodal Fusion]
C --> D
D --> E[Policy Network]
E --> F[Desktop Agent]
style A fill:#e1f5fe
style F fill:#e8f5e8
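Until the official training scripts ship, here is a minimal PyTorch sketch of the architecture in the diagram above: a vision encoder and an action encoder fused into a shared representation that feeds a policy head, trained behavior-cloning style on recorded demonstrations. All module sizes, the action vocabulary, and the loss setup are placeholder assumptions:

```python
import torch
import torch.nn as nn

class DesktopAgentPolicy(nn.Module):
    """Sketch: vision + action encoders -> multimodal fusion -> policy."""

    def __init__(self, num_actions: int = 256, d_model: int = 512):
        super().__init__()
        # Vision encoder: a small CNN over screen frames (placeholder).
        self.vision_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Action encoder: embeds the previous input event (placeholder vocab).
        self.action_encoder = nn.Embedding(num_actions, d_model)
        # Multimodal fusion + policy network.
        self.fusion = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU())
        self.policy = nn.Linear(d_model, num_actions)

    def forward(self, frames, prev_actions):
        v = self.vision_encoder(frames)        # (B, d_model)
        a = self.action_encoder(prev_actions)  # (B, d_model)
        fused = self.fusion(torch.cat([v, a], dim=-1))
        return self.policy(fused)              # (B, num_actions) logits

# Behavior-cloning loss on a dummy batch of demonstrations:
model = DesktopAgentPolicy()
frames = torch.randn(4, 3, 224, 224)
prev_actions = torch.randint(0, 256, (4,))
targets = torch.randint(0, 256, (4,))
loss = nn.functional.cross_entropy(model(frames, prev_actions), targets)
loss.backward()
```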
Learn More
- Agent Examples - Complete implementations and training pipelines
- Multimodal Game Agent - Vision-based game playing
- GUI Agent - General desktop automation
- Usage with LLMs - Integration patterns
Environment Framework Integration
Real-time Agent Interactions
While recording and training, you can also use OWA's real-time environment framework for live agent interactions:
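The framework's actual API is covered in the guides below; as a conceptual sketch only (the callables here are hypothetical stand-ins, not OWA functions), a real-time agent alternates between perceiving the desktop and injecting input on a fixed tick:

```python
import time

def run_agent(capture_screen, predict_action, execute_action, hz=10.0):
    """Conceptual perceive-act loop with hypothetical callables.

    capture_screen():        returns the current screen observation
    predict_action(obs):     returns the model's chosen action
    execute_action(action):  injects the keyboard/mouse event
    """
    period = 1.0 / hz
    while True:
        start = time.perf_counter()
        obs = capture_screen()
        execute_action(predict_action(obs))
        # Sleep off the remainder of the tick to hold a steady rate.
        time.sleep(max(0.0, period - (time.perf_counter() - start)))
```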
Learn More
- Environment Guide - Complete system overview
- Environment Framework - Core concepts and quick start
- Custom Plugins - Extend functionality
Community Resources
Datasets & Tools
- Browse Community Datasets - Hundreds of OWAMcap datasets
- Dataset Visualizer - Interactive preview tool
- FAQ - Common questions and troubleshooting
- Contributing Guide - Development setup and contribution guidelines
- Help with OWA - Community support resources
Next Steps
Your Journey Continues
- Explore Examples: Start with Agent Examples to see complete implementations
- Join the Community: Browse and contribute datasets
- Build Custom Plugins: Extend OWA with custom environment plugins
- Advanced Usage: Dive into technical documentation for advanced features
Quick Links
- Need help? → FAQ or Community Support
- Ready to build? → Agent Examples
- Want to contribute? → Contributing Guide