Page Agent - Control your browser from CLI and AI assistants. YAML-based UI testing with Playwright and MCP support.

PAGENT

Page AGENT - An agent that acts on real web pages. Control your browser from CLI and AI assistants with YAML-based UI testing (Playwright) and Chrome extension bridge for real browser automation.

License: MIT

Features

  • YAML DSL: Write test scenarios in simple, readable YAML
  • Playwright-powered: Fast, reliable browser testing (Chromium only; see Assumptions & Defaults)
  • Rich artifacts: Collect HTML, screenshots, console logs, network requests, HAR files, computed styles, and traces
  • Parallel execution: Run multiple scenarios concurrently
  • Retry support: Automatically retry failed tests
  • CI-friendly: JSON output and proper exit codes

Installation

# Install globally from npm
npm install -g @devload/pagent

# Or install locally in your project
npm install @devload/pagent

From Source

# Clone the repository
git clone https://github.com/devload/pagent.git
cd pagent

# Install dependencies and build
npm install
npm run build

Quick Start

# Initialize with example scenarios
pagent init

# Validate scenarios
pagent validate "scenarios/*.yaml"

# Run all scenarios
pagent run "scenarios/*.yaml"

# Run with visible browser
pagent run scenarios/smoke-example.yaml --headed

CLI Commands

pagent init

Initialize PAGENT in the current directory with example scenarios.

pagent init [options]

Options:
  --json     Output as JSON
  --force    Overwrite existing files

Creates:

  • scenarios/ directory with example YAML files
  • artifacts/ directory with .gitignore
  • Example scenarios: smoke-example.yaml, hackernews.yaml

pagent run <pattern>

Run YAML test scenarios.

pagent run <pattern> [options]

Arguments:
  pattern              Glob pattern for scenario files (e.g., "scenarios/*.yaml")

Options:
  --headed             Run in headed mode (visible browser)
  --workers <number>   Number of parallel workers (default: 1)
  --base-url <url>     Override base URL for all scenarios
  --timeout <ms>       Default timeout in milliseconds
  --retries <number>   Number of retries for failed tests (default: 0)
  --artifact-dir <dir> Directory for artifacts (default: ./artifacts)
  --json               Output as JSON only

Examples:

# Run single scenario
pagent run scenarios/login.yaml

# Run all scenarios in parallel
pagent run "scenarios/*.yaml" --workers 4

# Run with retries and custom timeout
pagent run "scenarios/*.yaml" --retries 2 --timeout 60000

# CI mode with JSON output
pagent run "scenarios/*.yaml" --json

pagent validate <pattern>

Validate YAML scenario files without running them.

pagent validate <pattern> [options]

Arguments:
  pattern    Glob pattern for scenario files

Options:
  --json     Output as JSON

pagent list [dir]

List YAML scenarios in a directory.

pagent list [dir] [options]

Arguments:
  dir        Directory to search (default: "scenarios")

Options:
  --json     Output as JSON

YAML DSL Specification

Schema (Version 1)

version: 1                          # Required: Schema version
name: my-test-scenario              # Required: Unique scenario name
baseURL: https://example.com        # Optional: Base URL for relative paths

use:                                # Optional: Browser configuration
  headless: true                    # Default: true
  viewport:                         # Default: { width: 1280, height: 720 }
    width: 1280
    height: 720
  timeoutMs: 30000                  # Default: 30000 (30 seconds)
  locale: en-US                     # Optional: Browser locale
  timezoneId: America/New_York      # Optional: Timezone
  userAgent: "..."                  # Optional: Custom user agent

artifacts:                          # Optional: Artifact collection settings
  html: true                        # Capture final page HTML
  screenshot: true                  # Capture final screenshot
  console: true                     # Capture console logs
  network: true                     # Capture network requests
  har: false                        # Save HAR file (default: false)
  trace: on-first-retry             # Trace recording: on|off|on-first-retry|retain-on-failure
  styles:                           # Compute styles for specific selectors
    - selector: "#myButton"
      computed: ["display", "color", "font-size"]
  networkBodyCapture:               # Capture response bodies (default: disabled)
    enabled: false
    maxSizeBytes: 1048576           # Max 1MB per response
    contentTypes: ["text/*", "application/json"]

steps:                              # Required: List of test steps
  - goto: "/login"
  - fill: { selector: "#email", text: "user@example.com" }
  - click: { selector: "#submit" }
  - expect: { urlContains: "/dashboard" }
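The required fields of the schema above can be mirrored in a few lines of TypeScript. This is only a sketch for readers who build scenarios programmatically: `ScenarioV1`, `CheckResult`, and `checkScenario` are hypothetical names, not exports of @devload/pagent, and the check covers just the three fields marked "Required". Use `pagent validate` for the real validation.

```typescript
// Hypothetical types mirroring the Version 1 schema; the package's own types may differ.
type Step = Record<string, unknown>;

interface ScenarioV1 {
  version: 1;
  name: string;
  baseURL?: string;
  steps: Step[];
}

interface CheckResult {
  scenario?: ScenarioV1;
  errors: string[];
}

// Checks only the fields the schema marks as required: version, name, steps.
function checkScenario(input: unknown): CheckResult {
  if (typeof input !== "object" || input === null) {
    return { errors: ["scenario must be a mapping"] };
  }
  const raw = input as Record<string, unknown>;
  const errors: string[] = [];
  if (raw["version"] !== 1) errors.push("version must be 1");
  const name = raw["name"];
  if (typeof name !== "string" || name.length === 0) errors.push("name is required");
  if (!Array.isArray(raw["steps"])) errors.push("steps must be a list");
  if (errors.length > 0) return { errors };
  return { scenario: raw as unknown as ScenarioV1, errors };
}
```

Pass it the object produced by any YAML parser; an empty `errors` list means the required fields are present.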

Available Steps

Navigation

# Navigate to URL (absolute or relative to baseURL)
- goto: "/login"
- goto: "https://example.com/page"

Interactions

# Click element
- click: { selector: "#button" }
- click: { selector: "button", button: "right", clickCount: 2 }

# Fill input (clears existing content)
- fill: { selector: "#email", text: "user@example.com" }

# Type text (character by character with optional delay)
- type: { selector: "#search", text: "query", delayMs: 50 }

# Press keyboard key
- press: { key: "Enter" }
- press: { key: "Tab", selector: "#input" }

Waiting

# Wait for element
- waitFor: { selector: ".loaded" }
- waitFor: { selector: "#content", timeoutMs: 10000 }

# Wait for page state
- waitFor: { state: "load" }
- waitFor: { state: "domcontentloaded" }
- waitFor: { state: "networkidle", timeoutMs: 15000 }

Assertions

# Check element visibility
- expect: { visible: "#welcome-message" }
- expect: { hidden: ".loading-spinner" }

# Check page content
- expect: { textContains: "Welcome back" }

# Check URL
- expect: { urlContains: "/dashboard" }

Snapshots

# Take screenshot
- snapshot: { name: "after-login" }
- snapshot: { name: "full-page", fullPage: true }

Debug

# Execute JavaScript
- evaluate: { js: "console.log('Debug:', document.title)" }

Output Structure

artifacts/
└── 20251220-143052-abc123/          # Run ID
    ├── summary.json                  # Overall run summary
    └── my-scenario/                  # Scenario folder
        ├── summary.json              # Scenario summary
        ├── page.html                 # Final page HTML
        ├── screenshot.png            # Final screenshot
        ├── console.jsonl             # Console logs (one JSON per line)
        ├── network.jsonl             # Network requests (one JSON per line)
        ├── network.har               # HAR file (if enabled)
        ├── styles.json               # Computed styles
        ├── trace.zip                 # Playwright trace
        └── after-login.png           # Named snapshots
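The layout above can be turned into file paths with a small helper. This is a sketch under two assumptions: the scenario folder is named after the scenario's `name`, and paths use forward slashes. `artifactPaths` and `snapshotPath` are hypothetical helpers, not part of the package.

```typescript
// Hypothetical helper: builds the paths shown in the artifacts tree above.
interface ScenarioArtifactPaths {
  runSummary: string;
  scenarioSummary: string;
  html: string;
  screenshot: string;
  consoleLog: string;
  networkLog: string;
  har: string;
  styles: string;
  trace: string;
}

function artifactPaths(artifactDir: string, runId: string, scenario: string): ScenarioArtifactPaths {
  const runDir = `${artifactDir}/${runId}`;
  // Assumption: the scenario folder name equals the scenario name.
  const scenarioDir = `${runDir}/${scenario}`;
  return {
    runSummary: `${runDir}/summary.json`,
    scenarioSummary: `${scenarioDir}/summary.json`,
    html: `${scenarioDir}/page.html`,
    screenshot: `${scenarioDir}/screenshot.png`,
    consoleLog: `${scenarioDir}/console.jsonl`,
    networkLog: `${scenarioDir}/network.jsonl`,
    har: `${scenarioDir}/network.har`,
    styles: `${scenarioDir}/styles.json`,
    trace: `${scenarioDir}/trace.zip`,
  };
}

// Named snapshots are stored as <name>.png next to the other scenario files.
function snapshotPath(artifactDir: string, runId: string, scenario: string, snapshotName: string): string {
  return `${artifactDir}/${runId}/${scenario}/${snapshotName}.png`;
}
```

Files such as network.har or trace.zip exist only when the corresponding artifact is enabled, so check for their presence before reading them.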

summary.json Format

{
  "ok": true,
  "runId": "20251220-143052-abc123",
  "scenario": "my-scenario",
  "filePath": "/path/to/scenario.yaml",
  "startedAt": "2025-12-20T14:30:52.000Z",
  "endedAt": "2025-12-20T14:31:04.000Z",
  "durationMs": 12345,
  "steps": [
    { "i": 1, "type": "goto", "ok": true, "durationMs": 1200 },
    { "i": 2, "type": "fill", "ok": true, "durationMs": 50 },
    { "i": 3, "type": "click", "ok": true, "durationMs": 100 },
    { "i": 4, "type": "expect", "ok": true, "durationMs": 30 }
  ],
  "artifacts": {
    "html": "page.html",
    "screenshot": "screenshot.png",
    "console": "console.jsonl",
    "network": "network.jsonl",
    "snapshots": ["after-login.png"]
  }
}
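A script that consumes summary.json usually wants to know which step failed or which one was slowest. The sketch below types only the fields visible in the sample above; `ScenarioSummary`, `StepResult`, `firstFailedStep`, and `slowestStep` are illustrative names, and a real summary may carry additional fields (for example, error details on failed steps) that are not shown here.

```typescript
// Shapes inferred from the sample summary.json; fields not shown there are omitted.
interface StepResult {
  i: number;
  type: string;
  ok: boolean;
  durationMs: number;
}

interface ScenarioSummary {
  ok: boolean;
  scenario: string;
  durationMs: number;
  steps: StepResult[];
}

// Returns the first step whose "ok" flag is false, or undefined if all passed.
function firstFailedStep(summary: ScenarioSummary): StepResult | undefined {
  return summary.steps.find((step) => !step.ok);
}

// Returns the step with the largest durationMs, or undefined for an empty step list.
function slowestStep(summary: ScenarioSummary): StepResult | undefined {
  let slowest: StepResult | undefined;
  for (const step of summary.steps) {
    if (slowest === undefined || step.durationMs > slowest.durationMs) {
      slowest = step;
    }
  }
  return slowest;
}
```

Read the file, pass the text through `JSON.parse`, and cast the result to `ScenarioSummary` before calling these helpers.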

console.jsonl Format

{"timestamp":"2025-12-20T14:30:53.000Z","type":"log","text":"Page loaded"}
{"timestamp":"2025-12-20T14:30:54.000Z","type":"error","text":"API error","location":{"url":"app.js","lineNumber":42}}

network.jsonl Format

{"timestamp":"2025-12-20T14:30:52.500Z","method":"GET","url":"https://api.example.com/user","resourceType":"fetch","status":200,"statusText":"OK","requestHeaders":{"authorization":"[MASKED]"},"responseHeaders":{"content-type":"application/json"},"timing":{"startTime":1703082652500,"responseEnd":1703082652700,"durationMs":200}}

Exit Codes

  • 0: All scenarios passed
  • 1: One or more scenarios failed, or a validation error occurred
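The rule above is easy to reproduce when you aggregate results yourself, for example after calling `runScenario` for several files. A minimal sketch follows; `exitCodeFor` is a hypothetical helper, and treating an empty result list as success is an assumption rather than documented PAGENT behavior.

```typescript
// Maps scenario results to a process exit code following the two rules listed above.
// Assumption: an empty result list counts as success.
function exitCodeFor(results: { ok: boolean }[], validationFailed: boolean = false): 0 | 1 {
  if (validationFailed) return 1;
  return results.every((result) => result.ok) ? 0 : 1;
}
```

In a Node script the returned value can be assigned to `process.exitCode` so that CI picks up the failure.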

Security

Sensitive headers are automatically masked in network logs:

  • authorization
  • cookie
  • set-cookie
  • x-api-key
  • x-auth-token
  • x-access-token

Response body capture is disabled by default. When enabled, bodies are limited by size and content type.
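To make the two rules concrete, here is a sketch of how header masking and body-capture filtering could work. It is not PAGENT's implementation: `maskHeaders`, `matchesContentType`, and `shouldCaptureBody` are hypothetical, case-insensitive header matching is an assumption, and the wildcard handling for patterns such as "text/*" is one plausible reading of the `contentTypes` setting.

```typescript
// Header names from the list above, stored in lowercase.
const SENSITIVE_HEADERS = new Set<string>([
  "authorization", "cookie", "set-cookie", "x-api-key", "x-auth-token", "x-access-token",
]);

// Replaces the values of sensitive headers with "[MASKED]" (assumed case-insensitive).
function maskHeaders(headers: Record<string, string>): Record<string, string> {
  const masked: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    const value = headers[name] ?? "";
    masked[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? "[MASKED]" : value;
  }
  return masked;
}

// Matches a Content-Type value against patterns; "text/*" matches any text subtype.
function matchesContentType(contentType: string, patterns: string[]): boolean {
  const base = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  return patterns.some((pattern) => {
    const p = pattern.toLowerCase();
    return p.endsWith("/*") ? base.startsWith(p.slice(0, -1)) : base === p;
  });
}

interface BodyCaptureOptions {
  enabled: boolean;
  maxSizeBytes: number;
  contentTypes: string[];
}

// A body is captured only when capture is enabled, the size fits, and the type matches.
function shouldCaptureBody(contentType: string, sizeBytes: number, options: BodyCaptureOptions): boolean {
  return (
    options.enabled &&
    sizeBytes <= options.maxSizeBytes &&
    matchesContentType(contentType, options.contentTypes)
  );
}
```

With the `networkBodyCapture` values from the schema (1048576 bytes, "text/*" and "application/json"), an HTML response of a few kilobytes would qualify, while an image or a 2 MB JSON payload would not.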

Programmatic Usage

import { runScenario, loadScenario } from '@devload/pagent';

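// Load the scenario and stop early if it does not validate
const loadResult = await loadScenario('./scenarios/test.yaml');
if (!loadResult.success) {
  throw new Error(`Invalid scenario: ${JSON.stringify(loadResult.errors)}`);
}

// Run the scenario and report the outcome
const result = await runScenario(loadResult.scenario!, './scenarios/test.yaml', {
  headless: true,
  artifactDir: './artifacts',
});

console.log(result.summary.ok ? 'Passed' : 'Failed');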

MCP Integration

The runScenario function can be wrapped as an MCP tool:

import { runScenario, loadScenario } from '@devload/pagent';

// In your MCP server tool handler
async function runUITest(scenarioPath: string) {
  const loadResult = await loadScenario(scenarioPath);
  if (!loadResult.success) {
    return { error: loadResult.errors };
  }

  const result = await runScenario(loadResult.scenario!, scenarioPath);
  return result.summary;
}

Assumptions & Defaults

  • Browser: Chromium only (for speed and consistency)
  • Timeout: 30 seconds default for all operations
  • Viewport: 1280x720 default
  • Artifacts: HTML, screenshot, console, network enabled by default
  • HAR: Disabled by default (large files)
  • Trace: Recorded on first retry by default
  • Network body: Not captured by default (security)
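The browser-related defaults above can be pictured as a base object that a scenario's `use:` block overrides field by field. The sketch below shows that merge; `DEFAULT_USE` and `resolveUse` are hypothetical, and merging a partial viewport (for example, only `width`) with the default is an assumption about behavior the README does not specify.

```typescript
interface Viewport {
  width: number;
  height: number;
}

interface UseOptions {
  headless: boolean;
  viewport: Viewport;
  timeoutMs: number;
}

interface UseOverrides {
  headless?: boolean;
  viewport?: Partial<Viewport>;
  timeoutMs?: number;
}

// Defaults taken from the schema comments and the list above.
const DEFAULT_USE: UseOptions = {
  headless: true,
  viewport: { width: 1280, height: 720 },
  timeoutMs: 30000,
};

// Applies overrides on top of the defaults, field by field.
function resolveUse(overrides: UseOverrides = {}): UseOptions {
  return {
    headless: overrides.headless ?? DEFAULT_USE.headless,
    viewport: {
      width: overrides.viewport?.width ?? DEFAULT_USE.viewport.width,
      height: overrides.viewport?.height ?? DEFAULT_USE.viewport.height,
    },
    timeoutMs: overrides.timeoutMs ?? DEFAULT_USE.timeoutMs,
  };
}
```

For example, a scenario that sets only `timeoutMs: 60000` would still run headless at 1280x720 under this model.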

Troubleshooting

"Browser executable not found"

npx playwright install chromium

"Timeout waiting for selector"

Increase the timeout in your scenario:

use:
  timeoutMs: 60000

Or for specific steps:

- waitFor: { selector: ".slow-element", timeoutMs: 30000 }

Viewing Traces

npx playwright show-trace artifacts/*/trace.zip

Chrome Extension Bridge

PAGENT includes a Chrome extension bridge for controlling a real browser instance.

Chrome Extension Installation

Option 1: From GitHub (Recommended)

# Clone the repository
git clone https://github.com/devload/pagent.git

# The extension is in the chrome-extension/ folder

Option 2: From npm package

# Install the package
npm install @devload/pagent

# Extension is at: node_modules/@devload/pagent/chrome-extension/

Load the Extension in Chrome:

  1. Open Chrome and go to chrome://extensions/
  2. Enable "Developer mode" (top right toggle)
  3. Click "Load unpacked"
  4. Select the chrome-extension/ folder

Start Using

  1. Start the Bridge Server:

    pagent bridge start

  2. Connect the Extension:

    • Click the PAGENT extension icon in Chrome
    • Click "Connect"

Bridge Commands

# Start bridge server (default port 9222)
pagent bridge start
pagent bridge start --port 9000

# Get current page info
pagent bridge exec getPageInfo

# Capture screenshot
pagent bridge exec screenshot ./page.png

# Get page HTML
pagent bridge exec getDOM
pagent bridge exec getDOM "#main-content"

# Execute JavaScript
pagent bridge exec execute "document.title"
pagent bridge exec execute "document.querySelectorAll('a').length"

# Interact with elements
pagent bridge exec click "#submit-button"
pagent bridge exec fill "#email" "test@example.com"
pagent bridge exec navigate "https://example.com"

# Tab management
pagent bridge exec newTab "https://google.com"
pagent bridge exec listTabs
pagent bridge exec switchTab 123456789
pagent bridge exec closeTab 123456789

# Execute on specific tab (without switching)
pagent bridge exec getPageInfo --tab 123456789
pagent bridge exec screenshot --tab 123456789

# Get captured logs
pagent bridge exec consoleLogs
pagent bridge exec networkLogs

Bridge Architecture

┌──────────────────────────────────────┐
│          Chrome Browser              │
│  ┌────────────────────────────────┐  │
│  │     PAGENT Extension           │  │
│  │    (WebSocket Client)          │  │
│  └─────────────┬──────────────────┘  │
└────────────────┼─────────────────────┘
                 │ ws://localhost:9222
                 ▼
┌──────────────────────────────────────┐
│   pagent bridge start                │
│   (WebSocket Server)                 │
└──────────────────────────────────────┘

Use Cases

  • Real Browser Testing: Test in actual Chrome with real extensions
  • DevTools Integration: Access console logs, network requests in real-time
  • Manual + Automated: Combine manual browsing with CLI automation
  • AI Integration: Ask AI assistants to control your browser via MCP

MCP Integration (Model Context Protocol)

PAGENT can be used as an MCP server, allowing AI assistants like Claude to control your browser.

Setup for Claude Code (CLI) - Recommended

Option 1: Using the claude mcp add command

# Add PAGENT as MCP server (one command!)
claude mcp add pagent -s user -- npx -y @devload/pagent

# Or with environment variable
claude mcp add pagent -s user -e BROWSER_BRIDGE_URL=ws://localhost:9222 -- npx -y @devload/pagent

# Verify installation
claude mcp list

Option 2: Manual configuration

Add to your project's .mcp.json:

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}

Or add to global config (~/.claude/settings.json):

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}

Setup for Claude Desktop

Add to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "pagent": {
      "command": "npx",
      "args": ["-y", "@devload/pagent"],
      "env": {
        "BROWSER_BRIDGE_URL": "ws://localhost:9222"
      }
    }
  }
}
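All three configuration files above share the same server entry, and the only value likely to vary is the bridge URL. If the bridge is started on a non-default port (for example, `pagent bridge start --port 9000`), `BROWSER_BRIDGE_URL` presumably has to point at the same port; that link is an inference from the commands in this README, not something it states. The hypothetical `pagentMcpConfig` helper below generates the entry for a given port.

```typescript
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: { pagent: McpServerEntry };
}

// Builds the same JSON structure as the config files above for a given bridge port.
// Assumption: the URL must match the port passed to "pagent bridge start --port".
function pagentMcpConfig(port: number = 9222): McpConfig {
  return {
    mcpServers: {
      pagent: {
        command: "npx",
        args: ["-y", "@devload/pagent"],
        env: { BROWSER_BRIDGE_URL: `ws://localhost:${port}` },
      },
    },
  };
}
```

Serializing the result with `JSON.stringify(pagentMcpConfig(9000), null, 2)` yields text that can be pasted into any of the config files mentioned above.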

Prerequisites

Before using the MCP tools, complete these steps:

  1. Install the Chrome Extension:

    # Load unpacked extension from chrome-extension/ folder
    # in chrome://extensions/

  2. Start the Bridge Server:

    pagent bridge start

  3. Connect the Extension:

    • Click the PAGENT extension icon
    • Click "Connect"

Available MCP Tools

Tool                    Description
browser_list_tabs       List all open browser tabs
browser_get_page_info   Get URL, title, and state of a tab
browser_navigate        Navigate to a URL
browser_new_tab         Open a new tab
browser_close_tab       Close a tab
browser_click           Click an element by CSS selector
browser_fill            Fill an input field
browser_screenshot      Capture a screenshot
browser_get_dom         Get page HTML content
browser_console_logs    Get browser console logs
browser_network_logs    Get network request logs

Example Usage with Claude

Once connected, you can ask Claude to:

  • "Open google.com and search for 'Anthropic Claude'"
  • "Take a screenshot of the current page"
  • "Fill in the login form with my email"
  • "Click the submit button"
  • "Get all the links on this page"

License

MIT
