Desktop Environment¶
The Desktop Environment module (owa.env.desktop) extends Open World Agents with functionality for interacting with the operating system's desktop. It focuses on user interface interactions and input simulation.
Features¶
- Screen Capture: Capture the current screen using CALLABLES["desktop/screen.capture"].
- Window Management: Retrieve information about active windows and search for windows by title using functions like CALLABLES["desktop/window.get_active_window"] and CALLABLES["desktop/window.get_window_by_title"].
- Input Simulation: Simulate mouse actions (e.g., CALLABLES["desktop/mouse.click"]) and set up keyboard listeners to handle input events.
Usage¶
The Desktop Environment module is automatically available when you install owa-env-desktop. No manual activation is needed!
# Components automatically available after installation
from owa.core.registry import CALLABLES, LISTENERS
You can access desktop functionalities via the global registries using the unified namespace/name pattern:
print(CALLABLES["desktop/screen.capture"]().shape) # Capture and display screen dimensions
print(CALLABLES["desktop/window.get_active_window"]()) # Retrieve the active window
This module is essential for applications that require integration with desktop UI elements and user input simulation.
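The namespace/name pattern used for registry keys can be illustrated by splitting a key into its two parts. This is only a sketch: split_component is a hypothetical helper written for this page, not a function provided by owa.core.

```python
def split_component(key: str) -> tuple[str, str]:
    """Split a registry key of the form 'namespace/name' into its parts."""
    namespace, sep, name = key.partition("/")
    if not sep or not namespace or not name:
        raise ValueError(f"expected 'namespace/name', got {key!r}")
    return namespace, name


# 'desktop' is the plugin namespace, 'screen.capture' the component name.
print(split_component("desktop/screen.capture"))
```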
Implementation Details¶
For implementation details, browse the owa-env-desktop source. API documentation is currently being developed.
Available Functions¶
Mouse Functions¶
- desktop/mouse.click - Simulate a mouse click
- desktop/mouse.move - Move the mouse cursor to specified coordinates
- desktop/mouse.position - Get the current mouse position
- desktop/mouse.press - Simulate pressing a mouse button
- desktop/mouse.release - Simulate releasing a mouse button
- desktop/mouse.scroll - Simulate mouse wheel scrolling
Keyboard Functions¶
- desktop/keyboard.press - Simulate pressing a keyboard key
- desktop/keyboard.release - Simulate releasing a keyboard key
- desktop/keyboard.type - Type a string of characters
- desktop/keyboard.press_repeat - Simulate the auto-repeat that occurs when a key is held down
Screen Functions¶
- desktop/screen.capture - Capture the current screen (Note: This module utilizes bettercam. For better performance and extensibility, use owa-env-gst's functions instead)
Window Functions¶
- desktop/window.get_active_window - Get the currently active window
- desktop/window.get_window_by_title - Find a window by its title
- desktop/window.when_active - Decorator that runs a function only when a specific window is active
Available Listeners¶
- desktop/keyboard - Listen for keyboard events
- desktop/mouse - Listen for mouse events
Misc¶
Library Selection Rationale¶
This module utilizes pynput for input simulation after evaluating several alternatives:
- Why not PyAutoGUI? Though widely used, PyAutoGUI uses deprecated Windows APIs (keybd_event/mouse_event) rather than the modern SendInput method. These older APIs fail in DirectX applications and games. Additionally, PyAutoGUI has seen limited maintenance (last significant update was over 2 years ago).
- Alternative Solutions: Libraries like pydirectinput and pydirectinput_rgx address the Windows API issue by using SendInput, but they lack input capturing capabilities which are essential for our use case.
- Other Options: We also evaluated keyboard and mouse libraries but found them inadequately maintained with several unresolved bugs that could impact reliability.
Input Auto-Repeat Functionality¶
For simulating key auto-repeat behavior, use the dedicated function:
CALLABLES["desktop/keyboard.press_repeat"](key, press_time: float, initial_delay: float = 0.5, repeat_delay: float = 0.033)
This function handles the complexity of simulating hardware auto-repeat, with configurable initial delay before repeating starts and the interval between repeated keypresses.
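The relationship between press_time, initial_delay, and repeat_delay can be sketched as a schedule of press events. This is one plausible timing model, written as an assumption for illustration; repeat_schedule is a hypothetical helper and the plugin's actual event timing may differ.

```python
def repeat_schedule(
    press_time: float,
    initial_delay: float = 0.5,
    repeat_delay: float = 0.033,
) -> list[float]:
    """Return offsets (seconds after key-down) at which press events would fire.

    Assumed model: one press at t=0, then repeats starting at initial_delay
    and spaced repeat_delay apart, for as long as t < press_time.
    """
    if repeat_delay <= 0:
        raise ValueError("repeat_delay must be positive")
    offsets = [0.0]  # the initial key-down
    k = 0
    while True:
        # Compute from the index to avoid accumulating floating-point drift.
        t = initial_delay + k * repeat_delay
        if t >= press_time:
            break
        offsets.append(t)
        k += 1
    return offsets


print(repeat_schedule(1.0, initial_delay=0.5, repeat_delay=0.25))
```

Under this model, a press_time shorter than initial_delay produces a single press with no repeats.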
Auto-generated documentation¶
desktop plugin 0.3.9.post1 ¶
Desktop environment plugin with mouse, keyboard, and window control
Author: OWA Development Team
Callables ¶
Usage: To use callable components, import CALLABLES from owa.core and access them by their component name:
from owa.core import CALLABLES
# Access a callable component (replace 'component_name' with actual name)
callable_func = CALLABLES["desktop/component_name"]
result = callable_func(your_arguments)
screen.capture ¶
Capture the current screen as a numpy array.
Returns:
Type | Description |
---|---|
ndarray | numpy.ndarray: Screen capture as BGR image array with shape (height, width, 3). |
Examples:
>>> screen = capture_screen()
>>> print(f"Screen dimensions: {screen.shape}") # e.g., (1080, 1920, 3)
>>> # Save to file: cv2.imwrite('screenshot.png', screen)
Source code in projects/owa-env-desktop/owa/env/desktop/screen/callables.py
mouse.click ¶
Simulate a mouse click.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to click. Can be "left", "middle", "right" or a Button enum. | required |
count | int | Number of clicks to perform. | required |
Examples:
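A minimal sketch of a double click, using the (button, count) parameters listed above. The helper name double_click and the injected callables mapping are illustrative assumptions; in real use the mapping would be CALLABLES from owa.core.

```python
def double_click(callables, button: str = "left") -> None:
    """Perform a double click via the 'desktop/mouse.click' component."""
    callables["desktop/mouse.click"](button, 2)


# Real use (requires owa-env-desktop and a desktop session):
#   from owa.core import CALLABLES
#   double_click(CALLABLES)

# Stand-in registry so the sketch runs without a desktop.
calls = []
fake = {"desktop/mouse.click": lambda button, count: calls.append((button, count))}
double_click(fake, "right")
print(calls)
```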
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.move ¶
mouse.position ¶
mouse.press ¶
mouse_press(button: str | Button) -> None
Press and hold a mouse button.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to press. Can be "left", "middle", "right" or a Button enum. | required |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.release ¶
mouse_release(button: str | Button) -> None
Release a previously pressed mouse button.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to release. Can be "left", "middle", "right" or a Button enum. | required |
Examples:
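Because a pressed button stays held until it is released, it can help to pair the two calls so the release always happens. The sketch below assumes a hypothetical held_button helper and an injected callables mapping (CALLABLES from owa.core in real use); it is not part of the plugin.

```python
from contextlib import contextmanager


@contextmanager
def held_button(callables, button: str = "left"):
    """Press a mouse button, yield, and release it even if the body raises."""
    callables["desktop/mouse.press"](button)
    try:
        yield
    finally:
        callables["desktop/mouse.release"](button)


# Stand-in registry that records calls instead of touching the real mouse.
log = []
fake = {
    "desktop/mouse.press": lambda b: log.append(("press", b)),
    "desktop/mouse.release": lambda b: log.append(("release", b)),
}
with held_button(fake, "left"):
    log.append(("drag", None))  # e.g. move the cursor while the button is held
print(log)
```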
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.scroll ¶
Simulate mouse wheel scrolling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | int | X coordinate where scrolling occurs. | required |
y | int | Y coordinate where scrolling occurs. | required |
dx | int | Horizontal scroll amount. | required |
dy | int | Vertical scroll amount. | required |
Examples:
>>> mouse_scroll(100, 100, 0, 3) # Scroll up 3 units at position (100, 100)
>>> mouse_scroll(100, 100, 0, -3) # Scroll down 3 units
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.get_state ¶
Get the current mouse state including position and pressed buttons.
Returns:
Type | Description |
---|---|
MouseState | MouseState object containing current mouse position and pressed buttons. |
Examples:
>>> state = get_mouse_state()
>>> print(f"Mouse at ({state.x}, {state.y}), buttons: {state.buttons}")
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.press ¶
Press and hold a keyboard key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to press. Can be a string (e.g., 'a', 'enter') or virtual key code. | required |
Examples:
>>> press('a') # Press and hold the 'a' key
>>> press(65) # Press and hold the 'a' key using virtual key code
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.release ¶
Release a previously pressed keyboard key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to release. Can be a string (e.g., 'a', 'enter') or virtual key code. | required |
Examples:
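A sketch of pairing keyboard.press with keyboard.release to send a key chord: keys are pressed in order and released in reverse. The helper press_chord is hypothetical, the callables mapping stands in for CALLABLES from owa.core, and the key name 'ctrl' is an assumption (the documentation above only shows 'a' and 'enter' as string examples).

```python
def press_chord(callables, keys) -> None:
    """Press keys in order, then release them in reverse order."""
    press = callables["desktop/keyboard.press"]
    release = callables["desktop/keyboard.release"]
    pressed = []
    try:
        for key in keys:
            press(key)
            pressed.append(key)
    finally:
        # Release only what was actually pressed, last key first.
        for key in reversed(pressed):
            release(key)


# Stand-in registry that records events instead of sending real input.
events = []
fake = {
    "desktop/keyboard.press": lambda k: events.append(("press", k)),
    "desktop/keyboard.release": lambda k: events.append(("release", k)),
}
press_chord(fake, ["ctrl", "c"])  # 'ctrl' is an assumed key name
print(events)
```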
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.type ¶
keyboard_type(text: str) -> None
Type a string of characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text string to type. | required |
Examples:
>>> keyboard_type("Hello, World!") # Types the text
>>> keyboard_type("user@example.com") # Types an email address
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.get_state ¶
Get the current keyboard state including pressed keys.
Returns:
Type | Description |
---|---|
KeyboardState | KeyboardState object containing currently pressed keys. |
Examples:
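A minimal sketch of checking whether a key is currently held. It assumes the returned state exposes pressed keys through a buttons attribute (the name used in the keyboard_state listener example on this page) and that its elements are virtual key codes; both points are assumptions, as are the helper name is_key_down and the injected callables mapping.

```python
from types import SimpleNamespace


def is_key_down(callables, vk: int) -> bool:
    """Return True if the given virtual key code is reported as pressed."""
    state = callables["desktop/keyboard.get_state"]()
    return vk in state.buttons  # assumed: collection of virtual key codes


# Stand-in registry reporting that virtual key code 65 is held.
fake = {"desktop/keyboard.get_state": lambda: SimpleNamespace(buttons={65})}
print(is_key_down(fake, 65), is_key_down(fake, 66))
```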
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.press_repeat ¶
press_repeat_key(
    key: str | int,
    press_time: float,
    initial_delay: float = 0.5,
    repeat_delay: float = 0.033,
) -> None
Simulate the behavior of holding a key down with auto-repeat.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to press repeatedly. Can be a string or virtual key code. | required |
press_time | float | Total time to hold the key down in seconds. | required |
initial_delay | float | Initial delay before auto-repeat starts (default: 0.5s). | 0.5 |
repeat_delay | float | Delay between repeated key presses (default: 0.033s). | 0.033 |
Examples:
>>> press_repeat_key('a', 2.0) # Hold 'a' key for 2 seconds with auto-repeat
>>> press_repeat_key('space', 1.5, 0.3, 0.05) # Custom timing for space key
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.release_all_keys ¶
Release all currently pressed keys on the keyboard.
Examples:
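One way this function might be used is as a cleanup step, so that no simulated key stays held if an automation step fails. The sketch assumes the component takes no arguments, as the description above suggests; run_with_key_cleanup is a hypothetical helper and the callables mapping stands in for CALLABLES from owa.core.

```python
def run_with_key_cleanup(callables, action):
    """Run action() and release all keys afterwards, even on error."""
    try:
        return action()
    finally:
        callables["desktop/keyboard.release_all_keys"]()


# Stand-in registry that records the cleanup call.
released = []
fake = {"desktop/keyboard.release_all_keys": lambda: released.append(True)}
result = run_with_key_cleanup(fake, lambda: "done")
print(result, released)
```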
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
window.get_active_window ¶
Get information about the currently active window.
Returns:
Type | Description |
---|---|
WindowInfo \| None | WindowInfo object containing title, position, and handle of the active window, or None if no active window is found. |
Examples:
>>> window = get_active_window()
>>> if window:
...     print(f"Active window: {window.title}")
...     print(f"Position: {window.rect}")
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.get_window_by_title ¶
get_window_by_title(
    window_title_substring: str,
) -> WindowInfo
Find a window by searching for a substring in its title.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
WindowInfo | WindowInfo object for the first matching window. |
Raises:
Type | Description |
---|---|
ValueError | If no window with matching title is found. |
Examples:
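Since a missing window raises ValueError, callers that treat "not found" as a normal outcome may want to convert it to None. The sketch below does that; find_window_or_none is a hypothetical helper, the callables mapping stands in for CALLABLES from owa.core, and the dictionary returned by the stand-in is not the real WindowInfo type.

```python
def find_window_or_none(callables, title_substring: str):
    """Return the first matching window, or None if nothing matches."""
    try:
        return callables["desktop/window.get_window_by_title"](title_substring)
    except ValueError:
        return None


# Stand-in lookup that knows about a single window.
def fake_lookup(substring):
    if substring.lower() in "untitled - notepad":
        return {"title": "Untitled - Notepad"}  # placeholder, not a WindowInfo
    raise ValueError(f"no window matching {substring!r}")


fake = {"desktop/window.get_window_by_title": fake_lookup}
print(find_window_or_none(fake, "Notepad"))
print(find_window_or_none(fake, "Chrome"))
```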
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.get_pid_by_title ¶
Get the process ID (PID) of a window by its title.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
int | Process ID of the window. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.when_active ¶
Decorator to run a function only when a specific window is active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
Callable | Decorator function that conditionally executes the wrapped function. |
Examples:
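The conditional-execution pattern described above can be sketched as a small decorator factory. This is an illustration of the pattern, not the plugin's implementation: when_active_sketch and its injected is_active check are hypothetical, and returning None when the window is inactive is an assumption about behavior the documentation does not specify.

```python
import functools


def when_active_sketch(is_active, window_title_substring: str):
    """Build a decorator that runs the function only if the window is active."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if is_active(window_title_substring):
                return func(*args, **kwargs)
            return None  # assumed behavior: skip the call silently
        return wrapper
    return decorator


# Real use would look roughly like this (requires owa-env-desktop):
#   @CALLABLES["desktop/window.when_active"]("Notepad")
#   def save(): ...

# Stand-in check: pretend a Notepad window currently has focus.
active_title = "Untitled - Notepad"

def fake_is_active(substring):
    return substring in active_title

@when_active_sketch(fake_is_active, "Notepad")
def save():
    return "saved"

@when_active_sketch(fake_is_active, "Chrome")
def reload():
    return "reloaded"

print(save(), reload())
```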
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.is_active ¶
Check if a window with the specified title substring is currently active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
bool | True if the window is active, False otherwise. |
Examples:
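A boolean check like this is often polled until a window gains focus. The sketch below waits with a timeout; wait_until_active and its parameters are hypothetical, and the is_active argument stands in for CALLABLES["desktop/window.is_active"]. The sleep and clock parameters are injectable only so the sketch can run without real delays.

```python
import time


def wait_until_active(
    is_active,
    title_substring: str,
    timeout: float = 5.0,
    poll_interval: float = 0.1,
    sleep=time.sleep,
    clock=time.monotonic,
) -> bool:
    """Poll is_active until it returns True or the timeout elapses."""
    deadline = clock() + timeout
    while True:
        if is_active(title_substring):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Stand-in check that reports the window as active on the third poll.
answers = iter([False, False, True])
print(wait_until_active(lambda title: next(answers), "Notepad", sleep=lambda s: None))
```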
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.make_active ¶
make_active(window_title_substring: str) -> None
Bring a window to the foreground and make it active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Raises:
Type | Description |
---|---|
ValueError | If no window with matching title is found. |
NotImplementedError | If the operation is not supported on the current OS. |
Examples:
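The two documented exceptions can be handled separately, since one means the window is missing and the other means the platform cannot perform the operation. In this sketch, try_make_active and its string results are hypothetical, and the callables mapping stands in for CALLABLES from owa.core.

```python
def try_make_active(callables, title_substring: str) -> str:
    """Attempt to focus a window and report the outcome as a short string."""
    try:
        callables["desktop/window.make_active"](title_substring)
    except ValueError:
        return "not_found"  # no window title contains the substring
    except NotImplementedError:
        return "unsupported"  # the current OS does not support this operation
    return "activated"


# Stand-in component that only knows a window titled 'Notepad'.
def fake_make_active(substring):
    if substring != "Notepad":
        raise ValueError("no match")


fake = {"desktop/window.make_active": fake_make_active}
print(try_make_active(fake, "Notepad"), try_make_active(fake, "Chrome"))
```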
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
Listeners ¶
Usage: To use listener components, import LISTENERS from owa.core and call the configure() method with a callback function:
from owa.core import LISTENERS
# Configure a listener component (replace 'component_name' with actual name)
listener = LISTENERS["desktop/component_name"]
listener.configure(callback=my_callback, **your_other_arguments)
# Use the listener in a context manager
with listener.session as active_listener:
    # The listener is now running and will call my_callback when events occur
    pass  # Your main code here
Note: The callback argument is required. The on_configure() method shown in the documentation is an internal method called by configure().
keyboard ¶
Bases: Listener
Keyboard event listener that captures key press and release events.
This listener wraps pynput's KeyboardListener to provide keyboard event monitoring with OWA's listener interface.
Examples:
>>> def on_key_event(event):
...     print(f"Key {event.vk} was {event.event_type}")
>>> listener = KeyboardListenerWrapper().configure(callback=on_key_event)
>>> listener.start()
mouse ¶
Bases: Listener
Mouse event listener that captures mouse movement, clicks, and scroll events.
This listener wraps pynput's MouseListener to provide mouse event monitoring with OWA's listener interface.
Examples:
>>> def on_mouse_event(event):
...     print(f"Mouse {event.event_type} at ({event.x}, {event.y})")
>>> listener = MouseListenerWrapper().configure(callback=on_mouse_event)
>>> listener.start()
keyboard_state ¶
Bases: Listener
Periodically reports the current keyboard state.
This listener calls the callback function every second with the current keyboard state, including which keys are currently pressed.
Examples:
>>> def on_keyboard_state(state):
...     if state.buttons:
...         print(f"Keys pressed: {state.buttons}")
>>> listener = KeyboardStateListener().configure(callback=on_keyboard_state)
>>> listener.start()
mouse_state ¶
Bases: Listener
Periodically reports the current mouse state.
This listener calls the callback function every second with the current mouse state, including position and pressed buttons.
Examples:
>>> def on_mouse_state(state):
...     print(f"Mouse at ({state.x}, {state.y}), buttons: {state.buttons}")
>>> listener = MouseStateListener().configure(callback=on_mouse_state)
>>> listener.start()
window ¶
Bases: Listener
Periodically monitors and reports the currently active window.
This listener calls the callback function every second with information about the currently active window, including title, position, and handle.
Examples:
Monitor active window changes:
>>> def on_window_change(window):
...     if window:
...         print(f"Active window: {window.title}")
>>>
>>> listener = WindowListener().configure(callback=on_window_change)
>>> listener.start()
>>> # ... listener runs in background ...
>>> listener.stop()
>>> listener.join()
Track window focus for automation:
>>> def track_focus(window):
...     if window and "notepad" in window.title.lower():
...         print("Notepad is now active!")
>>>
>>> listener = WindowListener().configure(callback=track_focus)
>>> listener.start()