Desktop Environment¶
Mouse, keyboard, window control, and screen capture for desktop automation.
Components¶
Category | Component | Type | Description |
---|---|---|---|
Mouse | desktop/mouse.click | Callable | Simulate mouse clicks |
Mouse | desktop/mouse.move | Callable | Move cursor to coordinates |
Mouse | desktop/mouse.position | Callable | Get current mouse position |
Mouse | desktop/mouse.press | Callable | Press mouse button |
Mouse | desktop/mouse.release | Callable | Release mouse button |
Mouse | desktop/mouse.scroll | Callable | Simulate mouse wheel scrolling |
Mouse | desktop/mouse.get_state | Callable | Get current mouse position and buttons |
Mouse | desktop/mouse.get_pointer_ballistics_config | Callable | Get Windows pointer ballistics settings |
Mouse | desktop/mouse | Listener | Monitor mouse events |
Mouse | desktop/mouse_state | Listener | Monitor mouse state changes |
Mouse | desktop/raw_mouse | Listener | Raw mouse input (bypasses acceleration) |
Keyboard | desktop/keyboard.press | Callable | Press/release keys |
Keyboard | desktop/keyboard.type | Callable | Type text strings |
Keyboard | desktop/keyboard.press_repeat | Callable | Simulate key auto-repeat |
Keyboard | desktop/keyboard.get_keyboard_repeat_timing | Callable | Get Windows keyboard repeat timing |
Keyboard | desktop/keyboard | Listener | Monitor keyboard events |
Keyboard | desktop/keyboard_state | Listener | Monitor keyboard state changes |
Screen | desktop/screen.capture | Callable | Capture screen (basic) |
Window | desktop/window.get_active_window | Callable | Get active window info |
Window | desktop/window.get_window_by_title | Callable | Find window by title |
Window | desktop/window.get_pid_by_title | Callable | Get process ID by window title |
Window | desktop/window.when_active | Callable | Decorator: run a function only when the window is active |
Window | desktop/window.is_active | Callable | Check if window is active |
Window | desktop/window.make_active | Callable | Activate/focus window |
Window | desktop/window | Listener | Monitor window events |
Performance Note
For high-performance screen capture, use GStreamer Environment instead (6x faster).
Usage Examples¶
from owa.core import LISTENERS
from owa.msgs.desktop.keyboard import KeyboardEvent

def on_key(event: KeyboardEvent):
    print(f"Key {event.event_type}: {event.vk}")

def on_mouse(event):
    print(f"Mouse: {event.event_type} at {event.x}, {event.y}")

# Monitor events
with LISTENERS["desktop/keyboard"]().configure(callback=on_key).session:
    with LISTENERS["desktop/mouse"]().configure(callback=on_mouse).session:
        input("Press Enter to stop monitoring...")
Technical Details¶
Library Selection Rationale¶
This module utilizes pynput for input simulation after evaluating several alternatives:
- Why not PyAutoGUI? Though widely used, PyAutoGUI uses deprecated Windows APIs (keybd_event/mouse_event) rather than the modern SendInput method. These older APIs fail in DirectX applications and games. Additionally, PyAutoGUI has seen limited maintenance (last significant update was over 2 years ago).
- Alternative Solutions: Libraries like pydirectinput and pydirectinput_rgx address the Windows API issue by using SendInput, but they lack input capturing capabilities which are essential for our use case.
- Other Options: We also evaluated keyboard and mouse libraries but found them inadequately maintained with several unresolved bugs that could impact reliability.
Raw Mouse Input¶
Raw mouse input capture separates physical mouse movement from the on-screen cursor position, which games often manipulate (for example, by locking the cursor to the window center) and which other interactions can also change. It provides unfiltered movement data directly from the hardware, bypassing Windows pointer acceleration and game cursor manipulation.
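One way to consume raw deltas is to accumulate them into a relative displacement that does not depend on where the cursor is drawn. The sketch below assumes only what the raw_mouse listener example in the API reference shows, namely that each event exposes dx and dy. The RawDeltaAccumulator class is a hypothetical helper and not part of OWA.

```python
from types import SimpleNamespace


class RawDeltaAccumulator:
    """Hypothetical helper: sums the raw mouse deltas of the events it receives."""

    def __init__(self):
        self.total_dx = 0
        self.total_dy = 0

    def __call__(self, event):
        # Assumption: each raw event exposes numeric `dx` and `dy` attributes.
        self.total_dx += event.dx
        self.total_dy += event.dy


# Stand-in events so the sketch runs without a mouse or Windows.
acc = RawDeltaAccumulator()
for dx, dy in [(3, -1), (-5, 2), (10, 0)]:
    acc(SimpleNamespace(dx=dx, dy=dy))

print(acc.total_dx, acc.total_dy)  # 8 1
```

In real use, an instance could presumably be passed as the callback when configuring the desktop/raw_mouse listener.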
Key Auto-Repeat Functionality¶
Key auto-repeat is a Windows feature where holding down a key generates multiple key events after an initial delay. When a user presses and holds a key, Windows first waits for the repeat delay period, then generates repeated WM_KEYDOWN messages at intervals determined by the repeat rate.
How Windows Auto-Repeat Works¶
- Initial Key Press: The first WM_KEYDOWN message is sent immediately with repeat count = 1
- Repeat Delay: The system waits for the configured delay (typically 250-1000ms)
- Repeated Events: Additional WM_KEYDOWN messages are sent at the repeat rate interval (typically 30ms)
- Repeat Count: Each message carries a repeat count in its parameters; the count covers repeats since the previous message rather than a running total
System Configuration: Windows allows users to configure auto-repeat behavior through:
- Repeat Delay: Time before auto-repeat begins (0-3 scale, maps to approximately 250ms-1000ms in 250ms steps, default: 500ms)
- Repeat Rate: Frequency of repeated characters (0-31 scale, where 0 is slowest at about 2.5 repetitions per second and 31 is fastest at about 30 per second, i.e. intervals from ~400ms down to ~30ms; default: ~30ms)
These settings can be accessed programmatically via SystemParametersInfo with the SPI_GETKEYBOARDDELAY and SPI_GETKEYBOARDSPEED parameters.
References:
- Keyboard Repeat Delay and Repeat Rate - Microsoft documentation on keyboard repeat behavior
- SystemParametersInfo Function - Windows API for keyboard repeat parameters
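The raw scale values can be turned into seconds with a small conversion. The following is a minimal sketch, not OWA's implementation: the function names are hypothetical, and the linear interpolation of the repeat rate between about 2.5 and about 30 repetitions per second is an assumption based on the approximate endpoints Microsoft documents for SPI_GETKEYBOARDSPEED.

```python
def keyboard_delay_to_seconds(raw_delay: int) -> float:
    """Map a raw delay value (0-3) to seconds: 0 -> 0.25 s, ..., 3 -> 1.0 s."""
    if not 0 <= raw_delay <= 3:
        raise ValueError("raw_delay must be in 0..3")
    return (raw_delay + 1) * 0.25


def keyboard_speed_to_seconds(raw_speed: int) -> float:
    """Map a raw speed value (0-31) to an approximate repeat interval in seconds.

    Assumption: the rate grows linearly from ~2.5 repetitions/second (raw 0)
    to ~30 repetitions/second (raw 31); real hardware may differ slightly.
    """
    if not 0 <= raw_speed <= 31:
        raise ValueError("raw_speed must be in 0..31")
    repeats_per_second = 2.5 + raw_speed * (30.0 - 2.5) / 31
    return 1.0 / repeats_per_second


print(keyboard_delay_to_seconds(1), round(keyboard_speed_to_seconds(31), 3))  # 0.5 0.033
```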
Using OWA's press_repeat Function¶
For simulating key auto-repeat behavior, use the dedicated function:
CALLABLES["desktop/keyboard.press_repeat"](key, press_time: float, initial_delay: float = 0.5, repeat_delay: float = 0.033)
Parameters:
- key: The key to press and repeat
- press_time: Total duration to hold the key (seconds)
- initial_delay: Time before repeating starts (default: 0.5s, matches Windows default)
- repeat_delay: Interval between repeated keypresses (default: 0.033s ≈ 30ms, matches Windows default)
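To see how these parameters interact, it can help to compute when key-down events would occur. The sketch below is an illustration under assumptions, not the actual press_repeat implementation: repeat_schedule is a hypothetical helper, and it assumes repeats are emitted strictly before press_time elapses.

```python
def repeat_schedule(press_time, initial_delay=0.5, repeat_delay=0.033):
    """Return the offsets (in seconds) at which key-down events would be sent."""
    if repeat_delay <= 0:
        raise ValueError("repeat_delay must be positive")
    offsets = [0.0]  # the initial key-down happens immediately
    k = 0
    # Repeats start after initial_delay and continue every repeat_delay
    # until the total hold time is reached (assumed exclusive).
    while initial_delay + k * repeat_delay < press_time:
        offsets.append(initial_delay + k * repeat_delay)
        k += 1
    return offsets


print(repeat_schedule(1.0, initial_delay=0.5, repeat_delay=0.25))  # [0.0, 0.5, 0.75]
```

A hold shorter than initial_delay yields only the initial key-down, which matches the behavior described for Windows auto-repeat above.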
Differences from True Windows Auto-Repeat¶
The press_repeat function approximates Windows auto-repeat behavior but isn't identical:
OS Auto-Repeat vs OWA Implementation:
- OS Auto-Repeat: WM_KEYDOWN messages carry the previous-key-state flag (bit 30, set when the key was already down) and a repeat count
- OWA Implementation: Multiple WM_KEYDOWN messages without repeat flags (each appears as an individual key press)
The difference is small and commonly ignored by applications, making this approach effective for most automation scenarios.
Why the difference exists: Windows provides repeat detection through WM_KEYDOWN message parameters, but pynput does not expose these Windows-specific details. Since the primary use case is triggering repeat behavior rather than detecting it, this limitation doesn't affect the functionality.
Reference: WM_KEYDOWN Message - Official Windows documentation for key press events and message parameters
Technical Details: Windows Repeat Count Behavior¶
The WM_KEYDOWN repeat count (bits 0-15) behaves differently than many developers expect:
- Not cumulative: Each message contains the repeat count since the last processed WM_KEYDOWN, not a running total
- Usually 1: In typical applications with fast message processing, the repeat count is almost always 1
- Higher values possible: Only occurs when message processing is slow enough for multiple repeats to queue up
Example: If you hold a key and your message loop processes messages quickly, you'll receive multiple WM_KEYDOWN messages each with repeat count = 1. Only when processing is delayed (e.g., by adding Sleep(1000) in the handler) will you see higher repeat counts like 20-30.
This design allows responsive applications to process key events immediately rather than waiting for the key release.
Reference: WM_KEYDOWN repeat count behavior explained - Stack Overflow discussion with practical examples
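The fields discussed in this section live in the lParam of WM_KEYDOWN and can be extracted with plain bit operations. This is a standalone sketch for illustration; KeyDownInfo and decode_keydown_lparam are hypothetical names, and OWA itself does not expose lParam values.

```python
from typing import NamedTuple


class KeyDownInfo(NamedTuple):
    repeat_count: int      # bits 0-15
    scan_code: int         # bits 16-23
    is_extended: bool      # bit 24
    was_down_before: bool  # bit 30 (previous key state)


def decode_keydown_lparam(lparam: int) -> KeyDownInfo:
    """Split a WM_KEYDOWN lParam into the fields relevant to auto-repeat."""
    return KeyDownInfo(
        repeat_count=lparam & 0xFFFF,
        scan_code=(lparam >> 16) & 0xFF,
        is_extended=bool((lparam >> 24) & 1),
        was_down_before=bool((lparam >> 30) & 1),
    )


first = decode_keydown_lparam(0x001E0001)   # initial press
repeat = decode_keydown_lparam(0x401E0001)  # auto-repeated message
print(first.was_down_before, repeat.was_down_before)  # False True
```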
Implementation
See owa-env-desktop source for detailed implementation.
API Reference¶
desktop plugin 0.5.7 ¶
Desktop environment plugin with mouse, keyboard, and window control
Author: OWA Development Team
Callables ¶
Usage: To use callable components, import CALLABLES from owa.core and access them by their component name:
from owa.core import CALLABLES
# Access a callable component (replace 'component_name' with actual name)
callable_func = CALLABLES["desktop/component_name"]
result = callable_func(your_arguments)
screen.capture ¶
Capture the current screen as a numpy array.
Returns:
Type | Description |
---|---|
ndarray | numpy.ndarray: Screen capture as BGR image array with shape (height, width, 3). |
Examples:
>>> screen = capture_screen()
>>> print(f"Screen dimensions: {screen.shape}") # e.g., (1080, 1920, 3)
>>> # Save to file: cv2.imwrite('screenshot.png', screen)
Source code in projects/owa-env-desktop/owa/env/desktop/screen/callables.py
mouse.click ¶
Simulate a mouse click.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to click. Can be "left", "middle", "right" or a Button enum. | required |
count | int | Number of clicks to perform. | required |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.move ¶
mouse.position ¶
mouse.press ¶
mouse_press(button: str | Button) -> None
Press and hold a mouse button.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to press. Can be "left", "middle", "right" or a Button enum. | required |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.release ¶
mouse_release(button: str | Button) -> None
Release a previously pressed mouse button.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
button | str \| Button | Mouse button to release. Can be "left", "middle", "right" or a Button enum. | required |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.scroll ¶
Simulate mouse wheel scrolling.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
x | int | X coordinate where scrolling occurs. | required |
y | int | Y coordinate where scrolling occurs. | required |
dx | int | Horizontal scroll amount. | required |
dy | int | Vertical scroll amount. | required |
Examples:
>>> mouse_scroll(100, 100, 0, 3) # Scroll up 3 units at position (100, 100)
>>> mouse_scroll(100, 100, 0, -3) # Scroll down 3 units
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.get_state ¶
Get the current mouse state including position and pressed buttons.
Returns:
Type | Description |
---|---|
MouseState | MouseState object containing current mouse position and pressed buttons. |
Examples:
>>> state = get_mouse_state()
>>> print(f"Mouse at ({state.x}, {state.y}), buttons: {state.buttons}")
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
mouse.get_pointer_ballistics_config ¶
Get Windows pointer ballistics configuration for WM_MOUSEMOVE reconstruction.
Examples:
Check whether Enhance pointer precision is enabled¶
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.press ¶
Press and hold a keyboard key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to press. Can be a string (e.g., 'a', 'enter') or virtual key code. | required |
Examples:
>>> press('a') # Press and hold the 'a' key
>>> press(65) # Press and hold the 'a' key using virtual key code
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.release ¶
Release a previously pressed keyboard key.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to release. Can be a string (e.g., 'a', 'enter') or virtual key code. | required |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
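Because keyboard.press holds a key until keyboard.release is called, pairing the two in a context manager avoids leaving a key stuck down if an error occurs in between. The sketch below is an assumption-labeled illustration: held_key is a hypothetical helper, and the press and release callables are passed in so the example runs with stand-ins. In real use they would presumably be CALLABLES["desktop/keyboard.press"] and CALLABLES["desktop/keyboard.release"].

```python
from contextlib import contextmanager


@contextmanager
def held_key(press, release, key):
    """Hold `key` for the duration of the with-block; release it even on error."""
    press(key)
    try:
        yield
    finally:
        release(key)


# Stand-in callables that record calls, so the sketch runs on any platform.
log = []
with held_key(lambda k: log.append(("press", k)),
              lambda k: log.append(("release", k)),
              "a"):
    log.append(("work", None))

print(log)  # [('press', 'a'), ('work', None), ('release', 'a')]
```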
keyboard.type ¶
keyboard_type(text: str) -> None
Type a string of characters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text string to type. | required |
Examples:
>>> keyboard_type("Hello, World!") # Types the text
>>> keyboard_type("user@example.com") # Types an email address
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.get_state ¶
Get the current keyboard state including pressed keys.
Returns:
Type | Description |
---|---|
KeyboardState | KeyboardState object containing currently pressed keys. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.press_repeat ¶
press_repeat_key(
    key: str | int,
    press_time: float,
    initial_delay: float = 0.5,
    repeat_delay: float = 0.033,
) -> None
Simulate the behavior of holding a key down with auto-repeat.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
key | str \| int | Key to press repeatedly. Can be a string or virtual key code. | required |
press_time | float | Total time to hold the key down in seconds. | required |
initial_delay | float | Initial delay before auto-repeat starts (default: 0.5s). | 0.5 |
repeat_delay | float | Delay between repeated key presses (default: 0.033s). | 0.033 |
Examples:
>>> press_repeat_key('a', 2.0) # Hold 'a' key for 2 seconds with auto-repeat
>>> press_repeat_key('space', 1.5, 0.3, 0.05) # Custom timing for space key
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.release_all_keys ¶
Release all currently pressed keys on the keyboard.
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
keyboard.get_keyboard_repeat_timing ¶
Get Windows keyboard repeat delay and repeat rate settings.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
return_seconds | bool | If True (default), return timing values in seconds. If False, return raw Windows API values. | True |
Returns:
Type | Description |
---|---|
Dict[str, float] | When return_seconds=True: dictionary with timing in seconds. keyboard_delay_seconds: initial delay before auto-repeat starts. keyboard_rate_seconds: interval between repeated keystrokes. |
Dict[str, int] | When return_seconds=False: dictionary with raw Windows API values. keyboard_delay: raw delay value (0-3 scale). keyboard_speed: raw speed value (0-31 scale). |
Raises:
Type | Description |
---|---|
OSError | If not running on Windows platform |
RuntimeError | If Windows API call fails |
Examples:
>>> # Get timing in seconds (default)
>>> timing = get_keyboard_repeat_timing()
>>> print(f"Delay: {timing['keyboard_delay_seconds']:.3f}s, Rate: {timing['keyboard_rate_seconds']:.3f}s")
>>> # Get raw Windows API values
>>> raw_timing = get_keyboard_repeat_timing(return_seconds=False)
>>> print(f"Raw delay: {raw_timing['keyboard_delay']}, Raw speed: {raw_timing['keyboard_speed']}")
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/callables.py
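The seconds-based result lines up with the timing parameters of keyboard.press_repeat, so the two can be combined to mimic the user's own repeat settings. The helper below is a hypothetical sketch that only reshapes the documented dictionary keys into the documented parameter names; it uses a sample dictionary so it runs without Windows.

```python
def press_repeat_kwargs(timing: dict) -> dict:
    """Translate get_keyboard_repeat_timing() output into press_repeat arguments."""
    return {
        "initial_delay": timing["keyboard_delay_seconds"],
        "repeat_delay": timing["keyboard_rate_seconds"],
    }


# Sample dictionary in the documented return_seconds=True shape.
sample = {"keyboard_delay_seconds": 0.5, "keyboard_rate_seconds": 0.033}
print(press_repeat_kwargs(sample))  # {'initial_delay': 0.5, 'repeat_delay': 0.033}
```

On Windows, the resulting dictionary could then be unpacked into the press_repeat callable alongside key and press_time.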
window.get_active_window ¶
Get information about the currently active window.
Returns:
Type | Description |
---|---|
WindowInfo \| None | WindowInfo object containing title, position, and handle of the active window, or None if no active window is found. |
Examples:
>>> window = get_active_window()
>>> if window:
...     print(f"Active window: {window.title}")
...     print(f"Position: {window.rect}")
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.get_window_by_title ¶
get_window_by_title(
    window_title_substring: str,
) -> WindowInfo
Find a window by searching for a substring in its title.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
WindowInfo | WindowInfo object for the first matching window. |
Raises:
Type | Description |
---|---|
ValueError | If no window with matching title is found. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.get_pid_by_title ¶
Get the process ID (PID) of a window by its title.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
int | Process ID of the window. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.when_active ¶
Decorator to run a function only when a specific window is active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
Callable | Decorator function that conditionally executes the wrapped function. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
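To illustrate the decorator pattern described here, the following sketch re-creates the idea with an injected activity check. It is not the OWA implementation: when_active_sketch and its is_active parameter are hypothetical, and returning None when the window is inactive is an assumption, since the documentation does not say what the real decorator returns in that case.

```python
import functools


def when_active_sketch(window_title_substring, is_active):
    """Hypothetical re-creation: run the wrapped function only if the window is active."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if is_active(window_title_substring):
                return func(*args, **kwargs)
            return None  # assumption: the call is skipped silently
        return wrapper
    return decorator


def fake_is_active(title):
    # Stand-in predicate: pretend only a "Notepad" window is active.
    return title == "Notepad"


@when_active_sketch("Notepad", fake_is_active)
def type_greeting():
    return "typed"


@when_active_sketch("Calculator", fake_is_active)
def press_equals():
    return "pressed"


print(type_greeting(), press_equals())  # typed None
```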
window.is_active ¶
Check if a window with the specified title substring is currently active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Returns:
Type | Description |
---|---|
bool | True if the window is active, False otherwise. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
window.make_active ¶
make_active(window_title_substring: str) -> None
Bring a window to the foreground and make it active.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
window_title_substring | str | Substring to search for in window titles. | required |
Raises:
Type | Description |
---|---|
ValueError | If no window with matching title is found. |
NotImplementedError | If the operation is not supported on the current OS. |
Examples:
Source code in projects/owa-env-desktop/owa/env/desktop/window/callables.py
Listeners ¶
Usage: To use listener components, import LISTENERS from owa.core and call the configure() method with a callback function:
from owa.core import LISTENERS

def my_callback(event):
    print(event)

# Instantiate and configure a listener component (replace 'component_name' with actual name).
# Component-specific options, if any, go in as extra keyword arguments.
listener = LISTENERS["desktop/component_name"]().configure(callback=my_callback)

# Use the listener in a context manager
with listener.session as active_listener:
    # The listener is now running and will call my_callback when events occur
    pass  # Your main code here
Note: The callback argument is required. The on_configure() method shown in the documentation is an internal method called by configure().
keyboard ¶
Bases: Listener
Keyboard event listener that captures key press and release events.
This listener wraps pynput's KeyboardListener to provide keyboard event monitoring with OWA's listener interface.
Examples:
>>> def on_key_event(event):
...     print(f"Key {event.vk} was {event.event_type}")
>>> listener = KeyboardListenerWrapper().configure(callback=on_key_event)
>>> listener.start()
mouse ¶
Bases: Listener
Mouse event listener that captures mouse movement, clicks, and scroll events.
This listener wraps pynput's MouseListener to provide mouse event monitoring with OWA's listener interface.
Examples:
>>> def on_mouse_event(event):
...     print(f"Mouse {event.event_type} at ({event.x}, {event.y})")
>>> listener = MouseListenerWrapper().configure(callback=on_mouse_event)
>>> listener.start()
raw_mouse ¶
Bases: Listener
Raw mouse input listener using Windows WM_INPUT messages.
This listener captures high-definition mouse movement data directly from the HID stack, bypassing Windows pointer acceleration and screen resolution limits. Provides sub-pixel precision and unfiltered input data essential for gaming and precision applications.
Examples:
>>> def on_raw_mouse_event(event):
...     print(f"Raw mouse: dx={event.dx}, dy={event.dy}, flags={event.button_flags}")
>>> listener = RawMouseListener().configure(callback=on_raw_mouse_event)
>>> listener.start()
on_configure ¶
Initialize the raw input capture system.
loop ¶
Start the raw input capture loop.
Source code in projects/owa-env-desktop/owa/env/desktop/keyboard_mouse/listeners.py
keyboard_state ¶
Bases: Listener
Periodically reports the current keyboard state.
This listener calls the callback function every second with the current keyboard state, including which keys are currently pressed.
Examples:
>>> def on_keyboard_state(state):
...     if state.buttons:
...         print(f"Keys pressed: {state.buttons}")
>>> listener = KeyboardStateListener().configure(callback=on_keyboard_state)
>>> listener.start()
mouse_state ¶
Bases: Listener
Periodically reports the current mouse state.
This listener calls the callback function every second with the current mouse state, including position and pressed buttons.
Examples:
>>> def on_mouse_state(state):
...     print(f"Mouse at ({state.x}, {state.y}), buttons: {state.buttons}")
>>> listener = MouseStateListener().configure(callback=on_mouse_state)
>>> listener.start()
window ¶
Bases: Listener
Periodically monitors and reports the currently active window.
This listener calls the callback function every second with information about the currently active window, including title, position, and handle.
Examples:
Monitor active window changes:
>>> def on_window_change(window):
...     if window:
...         print(f"Active window: {window.title}")
>>>
>>> listener = WindowListener().configure(callback=on_window_change)
>>> listener.start()
>>> # ... listener runs in background ...
>>> listener.stop()
>>> listener.join()
Track window focus for automation:
>>> def track_focus(window):
...     if window and "notepad" in window.title.lower():
...         print("Notepad is now active!")
>>>
>>> listener = WindowListener().configure(callback=track_focus)
>>> listener.start()
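Since this listener reports the active window periodically rather than only on change, a callback that should react to focus changes needs to filter out repeated reports itself. The sketch below shows one way to do that. WindowChangeFilter is a hypothetical helper, and it assumes only that the reported object exposes a title attribute (or is None), as in the examples above.

```python
from types import SimpleNamespace


class WindowChangeFilter:
    """Hypothetical helper: forwards a window only when its title differs from the last report."""

    def __init__(self, on_change):
        self._on_change = on_change
        self._last_title = None

    def __call__(self, window):
        title = window.title if window else None
        if title != self._last_title:
            self._last_title = title
            self._on_change(window)


# Stand-in reports so the sketch runs without a desktop session.
changes = []
callback = WindowChangeFilter(lambda w: changes.append(w.title if w else None))
for title in ["Notepad", "Notepad", "Terminal", "Terminal", "Notepad"]:
    callback(SimpleNamespace(title=title))

print(changes)  # ['Notepad', 'Terminal', 'Notepad']
```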