
Welcome to Open World Agents

Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.

Streamline your agent's lifecycle with Open World Agents. From data capture to model training and real-time evaluation, everything is designed for flexibility and performance.

Here's what we've got in store for you!


  • OWA's Env: The asynchronous, event-driven environment interface for real-time agents

    • Asynchronous, real-time event processing: Unlike existing LLM-agent frameworks and gymnasium.Env, OWA's Env uses an asynchronous processing design built on Callables, Listeners, and Runnables. Learn more...
    • Dynamic EnvPlugin Activation: Seamlessly register and activate EnvPlugins at runtime to customize and extend functionality, powered by a registry pattern. Learn more...
    • Extensible, Open-Source Design: Built for the community, by the community. Easily add custom plugins and extend the Env's functionality to suit your needs. Learn more...
  • Predefined EnvPlugins: We provide several EnvPlugins suited to building multimodal desktop agents.

    • owa-env-desktop: Provides basic Callables/Listeners for mouse/keyboard/window events.
    • owa-env-gst: Provides high-performance, efficient screen capture and recording, powered by Windows APIs (DXGI/WGC) and the robust GStreamer framework. owa-env-gst's screen capture is 6x faster than alternatives. Learn more...

  • OWA's Data: From a high-performance, robust, open-source-friendly data format to powerful, efficient Hugging Face integration.
    • OWAMcap file format: A high-performance, self-contained, flexible container file format for multimodal desktop log data, powered by the open-source container file format mcap. Learn more...
    • owl mcap record your-filename.mcap: A powerful, efficient, easy-to-use desktop recorder that captures keyboard/mouse events and high-frequency screen data.
    • 🤗 Hugging Face Integration: Upload datasets created with a simple owl mcap record to Hugging Face and share them with everyone! The era of effortless open-source desktop data is near. Preview the dataset at Hugging Face Spaces.

  • Comprehensive Examples: We provide various examples that demonstrate how to build a foundation multimodal desktop agent. Since they are only examples, you may customize anything you want. Examples are in progress; stay tuned!
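
To make the registry pattern and the Listener idea above more concrete, here is a minimal, self-contained sketch. It is not OWA's implementation: `Registry`, `DEMO_CALLABLES`, `DEMO_LISTENERS`, `TickListener`, and `demo` are hypothetical names invented for illustration, and the real `owa.core.registry` may be structured quite differently. The sketch only assumes that components are registered under string names and that a listener invokes a callback from a background thread.

```python
import threading
import time


class Registry(dict):
    """Hypothetical name -> object registry (illustration only, not OWA's class)."""

    def register(self, name):
        # Used as a decorator: stores the object under `name` and returns it unchanged.
        def decorator(obj):
            self[name] = obj
            return obj
        return decorator


# Illustrative registries that mirror the CALLABLES / LISTENERS idea.
DEMO_CALLABLES = Registry()
DEMO_LISTENERS = Registry()


@DEMO_CALLABLES.register("clock.time_ns")
def time_ns():
    """A callable: invoked synchronously, returns a value."""
    return time.time_ns()


@DEMO_LISTENERS.register("clock/tick")
class TickListener(threading.Thread):
    """A listener: calls `callback` every `interval` seconds until stopped."""

    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.callback = None
        self.interval = 1.0

    def configure(self, callback, interval):
        self.callback = callback
        self.interval = interval
        return self  # allow chaining, as in the Quick Start example

    def run(self):
        # Event.wait returns False on timeout and True once stop() has been called.
        while not self._stop_event.wait(self.interval):
            self.callback()

    def stop(self):
        self._stop_event.set()


def demo(interval=0.05, duration=0.3):
    """Look both components up by name and collect timestamps for `duration` seconds."""
    ticks = []
    listener = DEMO_LISTENERS["clock/tick"]().configure(
        callback=lambda: ticks.append(DEMO_CALLABLES["clock.time_ns"]()),
        interval=interval,
    )
    listener.start()
    time.sleep(duration)
    listener.stop()
    listener.join()
    return ticks
```

Calling `demo()` returns the list of timestamps gathered by the background thread. Because lookup is by name, a plugin could add new entries at runtime without the calling code changing, which is the property the registry pattern is meant to provide.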

Quick Start

  • Simple example of using Callables and Listeners. Learn more...

    import time
    
    from owa.core.registry import CALLABLES, LISTENERS, activate_module
    
    # Activate the standard environment module
    activate_module("owa.env.std")
    
    def callback():
        # Get current time in nanoseconds
        time_ns = CALLABLES["clock.time_ns"]()
        print(f"Current time in nanoseconds: {time_ns}")
    
    # Create a listener for the clock/tick event and configure it to fire every 1 second
    tick = LISTENERS["clock/tick"]().configure(callback=callback, interval=1)
    
    # Start the listener
    tick.start()
    
    # Allow the listener to run for 2 seconds
    time.sleep(2)
    
    # Stop the listener and wait for it to finish
    tick.stop()
    tick.join()
    

  • Record your own desktop usage data by just running owl mcap record your-filename.mcap. Learn more...

  • Curious about the OWAMcap format? See the following. (Note that the cat output is a constructed example.)

    $ owl mcap info example.mcap
    library:   mcap-owa-support 0.1.0; mcap 1.2.2
    profile:   owa
    messages:  2124
    duration:  17.6543448s
    start:     2025-03-11T02:46:39.0329786+09:00 (1741628799.032978600)
    end:       2025-03-11T02:46:56.6873234+09:00 (1741628816.687323400)
    compression:
            zstd: [1/1 chunks] [173.83 KiB/28.29 KiB (83.73%)] [1.60 KiB/sec]
    channels:
            (1) window            18 msgs (1.02 Hz)    : owa.env.desktop.msg.WindowInfo [jsonschema]
            (2) keyboard/state    18 msgs (1.02 Hz)    : owa.env.desktop.msg.KeyboardState [jsonschema]
            (3) mouse           1064 msgs (60.27 Hz)   : owa.env.desktop.msg.MouseEvent [jsonschema]
            (4) screen           978 msgs (55.40 Hz)   : owa.env.gst.msg.ScreenEmitted [jsonschema]
            (5) keyboard          46 msgs (2.61 Hz)    : owa.env.desktop.msg.KeyboardEvent [jsonschema]
    channels: 5
    attachments: 0
    metadata: 0
    
    $ owl mcap cat example.mcap --n 8 --no-pretty
    Topic: window, Timestamp: 1741628814049712700, Message: {'title': 'ZType – Typing Game - Type to Shoot - Chromium', 'rect': [389, 10, 955, 1022], 'hWnd': 7540094}
    Topic: keyboard/state, Timestamp: 1741628814049712700, Message: {'buttons': []}
    Topic: screen, Timestamp: 1741628814057575300, Message: {'path': 'example.mkv', 'pts': 14866666666,
    
    ... (additional lines omitted for brevity) ...
    
    Topic: keyboard, Timestamp: 1741628814978561600, Message: {'event_type': 'press', 'vk': 162}
    Topic: keyboard, Timestamp: 1741628815015522100, Message: {'event_type': 'release', 'vk': 162}
    Topic: window, Timestamp: 1741628815050666400, Message: {'title': 'data_format.md - open-world-agents - Visual Studio Code', 'rect': [-8, -8, 1928, 1040], 'hWnd': 133438}
    
    ... (additional lines omitted for brevity) ...
    
    Topic: mouse, Timestamp: 1741628816438561600, Message: {'event_type': 'move', 'x': 950, 'y': 891}
    Topic: mouse, Timestamp: 1741628816441655400, Message: {'event_type': 'click', 'x': 950, 'y': 891, 'button': 'left', 'pressed': true}
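
As a reading aid for the `owl mcap info` output above, the following short sketch recomputes the summary figures from the per-channel counts. The numbers are copied from that output. The reading of the zstd line as uncompressed size / compressed size is an assumption based on how the figures line up, not something the output states explicitly.

```python
# Figures copied from the `owl mcap info example.mcap` output above.
duration_s = 17.6543448
channel_msgs = {
    "window": 18,
    "keyboard/state": 18,
    "mouse": 1064,
    "screen": 978,
    "keyboard": 46,
}

# The per-channel counts should add up to the `messages:` line.
total = sum(channel_msgs.values())

# Each channel's rate is its message count divided by the recording duration.
rates_hz = {topic: round(n / duration_s, 2) for topic, n in channel_msgs.items()}

# Assumed reading of the zstd line: 173.83 KiB uncompressed, 28.29 KiB compressed.
saved_pct = round((1 - 28.29 / 173.83) * 100, 2)

print(total)
print(rates_hz)
print(saved_pct)
```

The total comes to 2124, the mouse and screen rates to 60.27 Hz and 55.4 Hz, and the space saving to 83.73%, all of which agree with the figures shown in the `info` output.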
    

Contributing

We welcome contributions! Please see our Contributing Guide for details on how to:

  • Set up your development environment.
  • Submit bug reports.
  • Propose new features.
  • Create pull requests.

License

This project is released under the MIT License. See the LICENSE file for details.