sentinel
Manual v4.1.0

Field manual.

Complete reference for @isoldex/sentinel — up and running in 5 minutes.

Installation

terminal
npm install @isoldex/sentinel playwright
npx playwright install chromium

Create a .env file with your API key:

.env
GEMINI_API_KEY=your_key_here
GEMINI_VERSION=gemini-3-flash-preview   # optional

Get a free key at aistudio.google.com. The free tier covers thousands of runs. For OpenAI, Claude, or Ollama, see Providers.

Quickstart

act() performs natural-language actions. extract() returns typed structured data.

index.ts
import { Sentinel, z } from '@isoldex/sentinel';

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
await sentinel.init();
await sentinel.goto('https://news.ycombinator.com');

// Extract structured data
const data = await sentinel.extract('Get the top 3 stories', z.object({
  stories: z.array(z.object({
    title: z.string(),
    points: z.number(),
  }))
}));

// Natural language actions
await sentinel.act('Click on the "new" link in the header');
await sentinel.act('Fill "hello@example.com" into the email field');

await sentinel.close();

act(instruction, options?)

Performs a natural language action on the current page. Sentinel automatically verifies the action and retries on weak confidence.

actions.ts
// Basic click / fill / hover
await sentinel.act('Click the login button');
await sentinel.act('Fill "user@example.com" into the email field');
await sentinel.act('Hover over the profile menu');

// All supported action types
await sentinel.act('Select "Germany" from the country dropdown');
await sentinel.act('Press Enter');
await sentinel.act('Double-click the product image');
await sentinel.act('Right-click the file');
await sentinel.act('Scroll down');
await sentinel.act('Scroll up');
await sentinel.act('Scroll to the footer');
await sentinel.act('Append " (urgent)" to the subject line');

// Variable interpolation
await sentinel.act('Fill %email% into the email field', {
  variables: { email: 'user@example.com' },
});

// Custom retry count
await sentinel.act('Click the submit button', { retries: 5 });

ActOptions

  • variables — Record<string, string>
  • retries — number (default: 2)
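The variables option keeps values out of the instruction text itself. As a rough mental model (not Sentinel's actual code), the substitution can be pictured as replacing each %name% placeholder with the matching entry. The interpolate helper below is hypothetical, and leaving unknown placeholders untouched is an assumption.

```typescript
// Hypothetical helper: replaces %key% placeholders with values from `variables`.
// Assumption: placeholders without a matching variable are left as-is.
function interpolate(instruction: string, variables: Record<string, string>): string {
  return instruction.replace(/%(\w+)%/g, (match: string, key: string) => {
    const value = Object.prototype.hasOwnProperty.call(variables, key)
      ? variables[key]
      : undefined;
    return value ?? match;
  });
}

console.log(interpolate('Fill %email% into the email field', { email: 'user@example.com' }));
// prints: Fill user@example.com into the email field
```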

Supported actions

click · fill · append · hover · press · select · double-click · right-click · scroll-down · scroll-up · scroll-to

ActionResult — returned by every act() call, never throws:

act-result.ts
const result = await sentinel.act('Click the checkout button');

console.log(result.success);   // boolean
console.log(result.message);   // "Clicked Checkout button"
console.log(result.action);    // "click"
console.log(result.selector);  // '[data-testid="checkout-btn"]'

// On failure — full diagnostic
if (!result.success) {
  console.log(result.message);
  // Action failed: "Click checkout button" on "Checkout"
  // 3 paths tried:
  //   • coordinate-click: Element outside viewport at (640, 950)
  //   • vision-grounding: Element not found in screenshot
  //   • locator-fallback: strict mode violation: 3 elements matched
  // Tip: element may be off-screen. Try: sentinel.act('scroll to "Checkout"')

  console.log(result.attempts);
  // [{ path: 'coordinate-click', error: '...' }, ...]
}
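The failure diagnostic above reflects a fallback chain: several resolution paths are tried in order and every failure is recorded. The sketch below shows that pattern on its own, returning a result object instead of throwing. ActResultSketch, Strategy, and tryPaths are hypothetical stand-ins, not Sentinel's exported types.

```typescript
// Hypothetical shapes loosely modeled on ActionResult; not Sentinel's real types.
interface Attempt {
  path: string;
  error: string;
}

interface ActResultSketch {
  success: boolean;
  message: string;
  attempts: Attempt[];
}

type Strategy = { path: string; run: () => Promise<string> };

// Try each strategy in order, stop at the first success, and never throw.
async function tryPaths(strategies: Strategy[]): Promise<ActResultSketch> {
  const attempts: Attempt[] = [];
  for (const { path, run } of strategies) {
    try {
      const message = await run();
      return { success: true, message, attempts };
    } catch (err) {
      attempts.push({ path, error: err instanceof Error ? err.message : String(err) });
    }
  }
  const summary = attempts.map((a) => `  • ${a.path}: ${a.error}`).join('\n');
  return {
    success: false,
    message: `Action failed: ${attempts.length} paths tried:\n${summary}`,
    attempts,
  };
}

// Usage with stand-in strategies: the first path fails, the second succeeds.
tryPaths([
  { path: 'coordinate-click', run: async () => { throw new Error('Element outside viewport'); } },
  { path: 'locator-fallback', run: async () => 'Clicked Checkout button' },
]).then((result) => console.log(result.success, result.attempts.length)); // true 1
```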

extract<T>(instruction, schema)

Extracts structured data from the current page. Accepts a Zod schema or a raw JSON Schema. With a Zod schema, the TypeScript return type is inferred automatically.

extract.ts
import { Sentinel, z } from '@isoldex/sentinel';

// Zod schema — TypeScript type is inferred automatically
const result = await sentinel.extract(
  'Get all product names and prices',
  z.object({
    products: z.array(z.object({
      name:  z.string(),
      price: z.number(),
    }))
  })
);
// result.products is typed as { name: string; price: number }[]

// Raw JSON Schema also works
const result2 = await sentinel.extract('Get the page title', {
  type: 'object',
  properties: { title: { type: 'string' } },
});

observe(instruction?)

Returns interactive elements visible on the page, optionally filtered by a natural language hint. Useful for debugging or building dynamic workflows.

observe.ts
// All interactive elements on the page
const elements = await sentinel.observe();

// Filtered by natural language hint
const loginElements = await sentinel.observe('Find login-related elements');

// Returns ObserveResult[]
// [{ description: 'Login button', role: 'button', ... }, ...]

run(goal, options?) — Agent Loop

Runs a fully autonomous multi-step agent in a Plan → Execute → Verify → Reflect cycle until the goal is met, the step limit is reached, or an abort condition triggers.

agent.ts
const result = await sentinel.run(
  'Go to amazon.de, search for "mechanical keyboard under 100 euros", extract top 5',
  {
    maxSteps: 20,
    onStep: (event) => {
      console.log(`Step ${event.stepNumber} [${event.type}]: ${event.instruction}`);
      console.log(`  Reasoning: ${event.reasoning}`);
    },
  }
);

console.log(result.success);       // boolean
console.log(result.goalAchieved);  // boolean — final LLM reflection check
console.log(result.totalSteps);    // number of steps executed
console.log(result.message);       // human-readable summary
console.log(result.data);          // structured data extracted during the run
console.log(result.selectors);     // { searchField: '#twotabsearchtextbox', ... }
console.log(result.history);       // AgentStepEvent[] — full step-by-step log

AgentRunOptions

  • maxSteps — number (default: 15)
  • onStep — (event: AgentStepEvent) => void

Abort conditions

  • 3 consecutive step failures
  • Same instruction repeated 3× without progress
  • maxSteps reached
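Taken together, the abort conditions amount to a check over the step history before each new step. This sketch assumes a simplified StepRecord (instruction plus success flag) and reads "repeated without progress" as "the last three instructions are identical"; Sentinel's own bookkeeping is likely richer.

```typescript
// Hypothetical, simplified step record; not Sentinel's AgentStepEvent.
interface StepRecord {
  instruction: string;
  success: boolean;
}

// Returns the reason to stop, or null if the agent loop may continue.
function abortReason(history: StepRecord[], maxSteps = 15): string | null {
  if (history.length >= maxSteps) return 'maxSteps reached';

  const lastThree = history.slice(-3);
  if (lastThree.length === 3) {
    if (lastThree.every((step) => !step.success)) return '3 consecutive step failures';
    // Simplification: identical instructions in a row count as "no progress".
    if (new Set(lastThree.map((step) => step.instruction)).size === 1) {
      return 'same instruction repeated 3x';
    }
  }
  return null;
}
```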

sentinel.fillForm(data, options?)

Fill a form declaratively with a JSON object. Sentinel maps keys to form fields via LLM — no step-by-step instructions needed. Works across languages (e.g. brand maps to Marke on German sites).

fill-form.ts
await sentinel.goto('https://insurance-site.com');

// One JSON — all fields filled automatically
await sentinel.fillForm({
  brand: 'BMW',
  model: '4er',
  year: 2020,
  fuel: 'Benzin',
  postalCode: '1010',
  name: 'Max Mustermann',
});
// Sentinel maps: brand → Marke, model → Modell, year → Baujahr
// Fills top-to-bottom, clicks submit when done.

sentinel.intercept(urlPattern, trigger)

Capture raw API responses during a browser action. Instead of scraping the rendered DOM, read the structured JSON that the website receives from its own backend — more reliable, complete, and precise.

intercept.ts
// Capture raw API data instead of scraping the DOM
const hotels = await sentinel.intercept('graphql', async () => {
  await sentinel.act('Click the search button');
});

console.log(hotels);
// [{ data: { searchResults: [{ name: "Hotel A", price: 89 }, ...] } }]
// Structured JSON — no CSS parsing, no "price: null" issues.

TOTP / MFA Automation

Automatically generate 2FA codes during login flows. Pass the TOTP secret (the same base32 key you scan with Google Authenticator) and Sentinel fills verification code fields automatically.

mfa.ts
import { Sentinel, generateTOTP } from '@isoldex/sentinel';

// Auto-generate 2FA codes during login flows
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  mfa: { type: 'totp', secret: 'JBSWY3DPEHPK3PXP' },
});

await sentinel.run('Login to my banking portal');
// Agent sees TOTP field → generates code automatically → continues

// Or generate a code manually
const code = generateTOTP('JBSWY3DPEHPK3PXP');
console.log(code); // a 6-digit code such as "492039"; the value changes over time
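For context, TOTP codes are defined by RFC 6238: an HMAC-SHA1 over the count of 30-second steps since the Unix epoch, truncated to a short decimal code as in RFC 4226. The sketch below implements that standard algorithm with node:crypto to illustrate what a function like generateTOTP computes. It is not Sentinel's source, and the 30-second period and 6-digit length are the common defaults rather than confirmed Sentinel settings.

```typescript
import { createHmac } from 'node:crypto';
import { Buffer } from 'node:buffer';

// Decode an RFC 4648 base32 secret (the format authenticator apps use).
function base32Decode(input: string): Buffer {
  const alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ234567';
  const clean = input.replace(/\s+/g, '').replace(/=+$/, '').toUpperCase();
  let bits = 0;
  let value = 0;
  const bytes: number[] = [];
  for (const char of clean) {
    const index = alphabet.indexOf(char);
    if (index === -1) throw new Error(`Invalid base32 character: ${char}`);
    value = (value << 5) | index;
    bits += 5;
    if (bits >= 8) {
      bytes.push((value >>> (bits - 8)) & 0xff);
      bits -= 8;
    }
  }
  return Buffer.from(bytes);
}

// RFC 4226 HOTP: HMAC-SHA1 of an 8-byte big-endian counter, then dynamic truncation.
function hotp(key: Buffer, counter: number, digits = 6): string {
  const message = Buffer.alloc(8);
  message.writeUInt32BE(Math.floor(counter / 0x100000000), 0); // high 32 bits
  message.writeUInt32BE(counter >>> 0, 4); // low 32 bits
  const mac = createHmac('sha1', key).update(message).digest();
  const offset = mac.readUInt8(mac.length - 1) & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;
  return String(binary % 10 ** digits).padStart(digits, '0');
}

// RFC 6238 TOTP: HOTP with counter = floor(unixSeconds / period).
function totp(secretBase32: string, unixSeconds = Date.now() / 1000, period = 30, digits = 6): string {
  return hotp(base32Decode(secretBase32), Math.floor(unixSeconds / period), digits);
}

console.log(totp('JBSWY3DPEHPK3PXP')); // current 6-digit code for the example secret
```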

Planner Model Split

Use a stronger model for planning decisions and a cheaper model for action execution. The plannerModel option creates a separate LLM instance for the planner, while act/extract/observe keep using the default Flash model. Choose a detection mode with the mode option: aom (fast), hybrid (reliable), or vision (CUA-style).

planner-model.ts
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  // Smart model for planning, cheap model for execution
  plannerModel: 'gemini-3.1-pro-preview',
  // Detection mode: aom (fast), hybrid (reliable), vision (CUA-style)
  mode: 'hybrid',
});

Sentinel.parallel(tasks, options)

Runs multiple independent tasks in parallel. Each task gets its own browser session, and a worker pool caps the number of simultaneous sessions at the concurrency value. An error in one task never affects the others.

parallel.ts
const results = await Sentinel.parallel(
  [
    { url: 'https://amazon.de', goal: 'Find cheapest laptop' },
    { url: 'https://ebay.de',   goal: 'Find cheapest laptop' },
    { url: 'https://otto.de',   goal: 'Find cheapest laptop' },
  ],
  {
    apiKey: process.env.GEMINI_API_KEY,
    concurrency: 3,
    onProgress: (done, total, result) => {
      console.log(`${done}/${total}: ${result.url} — ${result.message}`);
    },
  }
);

// Results in input order regardless of completion order
// Error in one task never affects the others
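The pool semantics described above (bounded concurrency, results in input order, isolated failures) can be sketched without any browser. runPool and Settled are hypothetical names; Sentinel.parallel may differ in details such as the result shape.

```typescript
// Hypothetical per-task outcome; Sentinel's own result type may differ.
type Settled<R> = { ok: true; value: R } | { ok: false; error: string };

// Run `worker` over `items` with at most `concurrency` tasks in flight.
// Results are written by input index, so output order matches input order.
async function runPool<T, R>(
  items: T[],
  concurrency: number,
  worker: (item: T) => Promise<R>,
): Promise<Settled<R>[]> {
  const results: Settled<R>[] = new Array(items.length);
  let next = 0;

  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claiming a task is synchronous, so lanes never collide
      try {
        results[index] = { ok: true, value: await worker(items[index] as T) };
      } catch (err) {
        // A failing task is recorded, not rethrown, so other lanes keep running.
        results[index] = { ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }
  }

  const lanes = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}
```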

Tab management

Open, switch, and close browser tabs programmatically. AOM-based state parsing requires Chromium (CDP). Firefox and WebKit fall back to DOM parsing.

tabs.ts
// Open a new tab
const tabIndex = await sentinel.newTab('https://google.com');

// Switch the active tab
await sentinel.switchTab(0);
await sentinel.switchTab(tabIndex);

// Close a tab
await sentinel.closeTab(tabIndex);

// Number of open tabs
console.log(sentinel.tabCount);

Session persistence

Save and restore authenticated sessions across runs — cookies and localStorage included. For apps that use IndexedDB (WhatsApp Web, PWAs), use userDataDir instead.

session.ts
import { Sentinel } from '@isoldex/sentinel';

// First run: log in, then save the session
const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
await sentinel.init();
await sentinel.goto('https://github.com/login');
await sentinel.act('Fill "myuser" into the username field');
await sentinel.act('Fill "mypassword" into the password field');
await sentinel.act('Click the sign in button');
await sentinel.saveSession('./sessions/github.json');
await sentinel.close();

// Subsequent runs (e.g. a separate script): the session is restored automatically
const restored = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  sessionPath: './sessions/github.json', // loaded on init()
});
await restored.init();
await restored.goto('https://github.com'); // already authenticated

userDataDir persists the full browser profile including IndexedDB and ServiceWorkers:

persistent-profile.ts
// Persists the full browser profile — including IndexedDB.
// Required for services like WhatsApp Web, PWAs, and SPA-based apps.
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  userDataDir: './profiles/whatsapp',  // created automatically if missing
});
await sentinel.init();
// First run: complete login (scan QR code).
// All subsequent runs: session restored automatically — no re-auth needed.

Record & Replay

Capture any automation session as a replayable workflow. Export as TypeScript source or JSON for storage and version control.

record-replay.ts
// Start recording
sentinel.startRecording('checkout-flow');

await sentinel.goto('https://shop.example.com');
await sentinel.act('Click the first product');
await sentinel.act('Click Add to Cart');
await sentinel.act('Proceed to checkout');

// Stop and get the workflow
const workflow = sentinel.stopRecording();

// Export as TypeScript source code
const code = sentinel.exportWorkflowAsCode(workflow);
console.log(code); // ready-to-run TypeScript

// Export as JSON
const json = sentinel.exportWorkflowAsJSON(workflow);

// Replay the recorded workflow
await sentinel.replay(workflow);

Vision grounding

Vision-model fallback for canvas elements, shadow DOMs, and custom components that aren't exposed through the accessibility tree. Supported by all four built-in providers (Gemini, OpenAI, Claude, Ollama vision models).

vision.ts
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,
  mode: 'hybrid', // AOM + vision fallback (replaces the deprecated visionFallback: true)
});

// Takes a PNG screenshot → Buffer
const png = await sentinel.screenshot();

// Natural language description of the current page
const description = await sentinel.describeScreen();
console.log(description);
// "The page shows an Amazon product listing with a laptop card..."

// Vision grounding also activates automatically inside act()
// when AOM cannot locate the target element — no extra code needed.

Self-healing & caching

Three independent caching layers dramatically reduce LLM usage on repeated runs. They stack on top of Gemini's already 40× cheaper baseline — enable all three in production, especially when the same widget shapes recur across pages or sites.

caching.ts
const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,

  // Self-healing locators — cache successful element → selector mappings (URL-scoped)
  locatorCache: './sentinel-locators.json', // file-persisted (or: true for in-memory)

  // Prompt cache — cache LLM responses by prompt hash
  promptCache: './sentinel-prompts.json',   // file-persisted (or: true for in-memory)

  // Pattern cache — fingerprint widgets by ARIA / library shape, reuse cross-site
  patternCache: './sentinel-patterns.json', // file-persisted; defaults to true (in-memory)
});

// Flush the prompt cache programmatically (e.g. between test runs)
sentinel.clearPromptCache();

// Custom cache backends (e.g. Redis for distributed test runs)
import type { ILocatorCache, CachedLocator } from '@isoldex/sentinel';

class RedisLocatorCache implements ILocatorCache {
  get(url: string, instruction: string): CachedLocator | undefined { /* ... */ }
  set(url: string, instruction: string, entry: CachedLocator): void { /* ... */ }
  invalidate(url: string, instruction: string): void { /* ... */ }
}

locatorCache

Caches successful element → selector mappings, scoped by URL + instruction. On repeat calls, the cached Playwright locator is tried first, and the LLM is called only if that locator breaks. Supports custom backends via ILocatorCache.
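A custom backend only needs the three methods shown in the ILocatorCache example. As an assumption-laden sketch, here is a Map-backed version. LocatorCacheShape and CachedLocatorSketch are local stand-ins (the real CachedLocator likely carries more fields), so import the actual types from @isoldex/sentinel when writing a real backend.

```typescript
// Assumed entry shape; the real CachedLocator type may have different fields.
interface CachedLocatorSketch {
  selector: string;
}

// Mirrors the method signatures shown in the ILocatorCache example.
interface LocatorCacheShape {
  get(url: string, instruction: string): CachedLocatorSketch | undefined;
  set(url: string, instruction: string, entry: CachedLocatorSketch): void;
  invalidate(url: string, instruction: string): void;
}

class MemoryLocatorCache implements LocatorCacheShape {
  private readonly entries = new Map<string, CachedLocatorSketch>();

  // Scope entries by URL + instruction; JSON keeps the two parts unambiguous.
  private key(url: string, instruction: string): string {
    return JSON.stringify([url, instruction]);
  }

  get(url: string, instruction: string): CachedLocatorSketch | undefined {
    return this.entries.get(this.key(url, instruction));
  }

  set(url: string, instruction: string, entry: CachedLocatorSketch): void {
    this.entries.set(this.key(url, instruction), entry);
  }

  // Called when a cached selector stops working, forcing a fresh LLM lookup.
  invalidate(url: string, instruction: string): void {
    this.entries.delete(this.key(url, instruction));
  }
}
```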

promptCache

Caches LLM responses by a hash of prompt + schema. Identical (prompt, schema) pairs return instantly at zero token cost. URL and page title are part of the hash — cache misses automatically on DOM changes.
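A cache key with the properties described here can be built by hashing the prompt, schema, URL, and page title together. The exact fields and hash Sentinel uses are not documented in this section, so promptCacheKey below is a hypothetical illustration using SHA-256 from node:crypto.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical cache-key builder: changing any input yields a different key.
function promptCacheKey(parts: {
  prompt: string;
  schema: unknown;
  url: string;
  title: string;
}): string {
  // Serializing an array keeps field boundaries unambiguous,
  // so ("ab", "c") and ("a", "bc") never collide.
  const material = JSON.stringify([parts.prompt, parts.schema, parts.url, parts.title]);
  return createHash('sha256').update(material).digest('hex');
}
```

JSON.stringify is key-order sensitive, so in this sketch two schemas with the same fields listed in a different order would produce different keys.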

patternCache

Fingerprints interactive widgets by ARIA / library-class / DOM topology and reuses successful interaction sequences across any site that renders the same widget shape. A date-picker learned on site A works on site B. Sensitive-value roles (password, tel) are redacted before persist. Enabled by default (in-memory); pass a file path to persist across runs.

Stealth & proxy

Human-like delays, User-Agent rotation (automatic), and proxy support for bot-detection evasion and geo-restricted content.

stealth.ts
import { Sentinel, RoundRobinProxyProvider, WebshareProxyProvider } from '@isoldex/sentinel';

const sentinel = new Sentinel({
  apiKey: process.env.GEMINI_API_KEY,

  // Bézier mouse curves + per-action delays (80–200 ms) + human keystroke timing
  humanLike: true,

  // Pick ONE proxy option: an object literal cannot repeat the `proxy` key.
  // Option 1: static proxy
  proxy: { server: 'http://proxy.example.com:8080', username: 'u', password: 'p' },

  // Option 2: round-robin through a list
  // proxy: new RoundRobinProxyProvider([
  //   { server: 'http://p1:8080' },
  //   { server: 'http://p2:8080' },
  // ]),

  // Option 3: Webshare API with automatic rotation
  // proxy: new WebshareProxyProvider({ apiKey: process.env.WEBSHARE_KEY! }),
});
// User-Agent rotation is automatic — no config needed.
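The humanLike comment mentions per-action delays of 80–200 ms and Bézier mouse curves. The sketch below shows both ideas in isolation: a uniformly random delay in that range and points sampled along a cubic Bézier curve whose control points are randomly offset. The function names and the ±50 px offset are assumptions, not Sentinel's implementation.

```typescript
type Point = { x: number; y: number };

// Uniform random integer delay in [minMs, maxMs], inclusive.
function humanDelayMs(minMs = 80, maxMs = 200, rand: () => number = Math.random): number {
  return minMs + Math.floor(rand() * (maxMs - minMs + 1));
}

// Point on a cubic Bézier curve at parameter t in [0, 1].
function cubicBezier(p0: Point, p1: Point, p2: Point, p3: Point, t: number): Point {
  const u = 1 - t;
  const a = u * u * u;
  const b = 3 * u * u * t;
  const c = 3 * u * t * t;
  const d = t * t * t;
  return {
    x: a * p0.x + b * p1.x + c * p2.x + d * p3.x,
    y: a * p0.y + b * p1.y + c * p2.y + d * p3.y,
  };
}

// Sample steps + 1 points from start to end, bent by two randomly offset control points.
function mousePath(start: Point, end: Point, steps = 20, rand: () => number = Math.random): Point[] {
  const jitter = () => (rand() - 0.5) * 100; // assumed ±50 px bend
  const at = (f: number): Point => ({
    x: start.x + (end.x - start.x) * f + jitter(),
    y: start.y + (end.y - start.y) * f + jitter(),
  });
  const c1 = at(1 / 3);
  const c2 = at(2 / 3);
  return Array.from({ length: steps + 1 }, (_, i) => cubicBezier(start, c1, c2, end, i / steps));
}
```

Each sampled point could then be passed to Playwright's page.mouse.move(x, y), waiting briefly between moves.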

Page extension — sentinel.extend(page)

Attaches act(), extract(), and observe() directly to any existing Playwright Page object. Drop-in for existing Playwright projects — no test restructuring needed.

extend.ts
import { chromium } from 'playwright';
import { Sentinel, z } from '@isoldex/sentinel';

const browser = await chromium.launch();
const page    = await browser.newPage();

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });

// Attach Sentinel capabilities to any existing Playwright Page
await sentinel.extend(page);

// Now use act/extract/observe directly on the page object
await page.goto('https://example.com');
await page.act('Click the login button');

const data = await page.extract('Get the page title', z.object({
  title: z.string(),
}));

await browser.close();

Selector export

After every run() call, result.selectors contains stable CSS selectors for every element that was acted on — ready to paste into Playwright tests. Priority: data-testid → #id → [name] → [placeholder] → [aria-label] → role:has-text.

selectors.ts
// After sentinel.run() — selectors for all act() steps
const result = await sentinel.run('Login with test@example.com');

console.log(result.selectors);
// {
//   clickLoginButton:  '[data-testid="login-btn"]',
//   fillEmailField:    '#email',
//   fillPasswordField: '[name="password"]',
// }

// Copy directly into a Playwright test file (e.g. login.spec.ts), no DevTools digging
import { test } from '@playwright/test';
test('login', async ({ page }) => {
  await page.click('[data-testid="login-btn"]');
});

// Single act() also exposes the selector
const r = await sentinel.act('Click the search field');
console.log(r.selector); // 'input[aria-label="Search"]'
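The priority order listed above is a first-match rule over an element's attributes. This sketch works on a hypothetical ElementInfo snapshot instead of a live DOM node, quotes values with JSON.stringify (fine for simple values; CSS.escape is more robust in a browser), and emits the fallback in the role:has-text form named above. It is not Sentinel's selector generator.

```typescript
// Hypothetical element snapshot; a real implementation would read these from the DOM.
interface ElementInfo {
  role: string;
  text?: string;
  testId?: string; // value of data-testid
  id?: string;
  name?: string;
  placeholder?: string;
  ariaLabel?: string;
}

const quote = (value: string): string => JSON.stringify(value);

// Return the first selector available in the documented priority order.
function stableSelector(el: ElementInfo): string {
  if (el.testId) return `[data-testid=${quote(el.testId)}]`;
  // Only use #id when the id is a plain identifier that needs no CSS escaping.
  if (el.id && /^[A-Za-z][\w-]*$/.test(el.id)) return `#${el.id}`;
  if (el.name) return `[name=${quote(el.name)}]`;
  if (el.placeholder) return `[placeholder=${quote(el.placeholder)}]`;
  if (el.ariaLabel) return `[aria-label=${quote(el.ariaLabel)}]`;
  return `${el.role}:has-text(${quote(el.text ?? '')})`;
}
```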

Events & token tracking

Sentinel extends Node.js EventEmitter. Use events for logging, dashboards, or integration with external monitoring tools.

events.ts
// Event system (Sentinel extends EventEmitter)
sentinel.on('action', (event) => {
  console.log('Action:', event.instruction, event.result);
});

sentinel.on('navigate', (event) => {
  console.log('Navigated to:', event.url);
});

sentinel.on('close', () => {
  console.log('Browser closed');
});

// Direct Playwright access
const page    = sentinel.page;
const context = sentinel.context;

// Token tracking
const usage = sentinel.getTokenUsage();
console.log(usage);
// {
//   totalInputTokens: 9800,
//   totalOutputTokens: 2600,
//   totalTokens: 12400,
//   estimatedCostUsd: 0.00093,
//   entries: [...],
// }

// Export full log as JSON
sentinel.exportLogs('./logs/session.json');

OpenTelemetry

Every call emits traces and metrics automatically. Zero overhead when no OTel SDK is configured (no-op API). Drop into Datadog, Grafana, Jaeger, or any OTLP backend.

instrumentation.ts
import { NodeSDK } from '@opentelemetry/sdk-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
import { Sentinel } from '@isoldex/sentinel';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({ url: 'http://localhost:4318/v1/traces' }),
  metricReader: new PrometheusExporter({ port: 9464 }),
});
sdk.start(); // must be called BEFORE new Sentinel(...)

const sentinel = new Sentinel({ apiKey: process.env.GEMINI_API_KEY });
// All act() / extract() / run() calls now emit spans and metrics automatically

Emitted spans

sentinel.agent
└─ sentinel.agent.step
   └─ sentinel.act / sentinel.extract / sentinel.observe
      └─ sentinel.llm

Emitted metrics

  • sentinel.act.requests · sentinel.act.duration_ms
  • sentinel.llm.requests · sentinel.llm.tokens · sentinel.llm.duration_ms
  • sentinel.agent.steps

Playwright Test integration

Drop-in integration for existing Playwright Test suites. The ai fixture auto-initializes before each test and auto-closes after, regardless of outcome.

checkout.spec.ts
import { test, expect } from '@isoldex/sentinel/test';
import { z } from 'zod';

test('completes checkout flow', async ({ ai, page }) => {
  await ai.goto('https://shop.example.com');
  await ai.act('Click the first product');
  await ai.act('Click Add to Cart');
  await ai.act('Proceed to checkout');

  const order = await ai.extract<{ total: string; items: number }>(
    'Get the order total and item count',
    z.object({ total: z.string(), items: z.number() })
  );

  expect(order.items).toBeGreaterThan(0);
  console.log('Token cost:', ai.getTokenUsage().estimatedCostUsd);
});

Configure Sentinel options globally in playwright.config.ts:

playwright.config.ts
// playwright.config.ts
import { defineConfig } from '@playwright/test';

export default defineConfig({
  use: {
    sentinelOptions: {
      headless: false,
      verbose: 1,
      locatorCache: '.sentinel-cache.json',
    },
  },
});

CLI

Run browser automation without writing any code — paste a URL and a goal, get results. The API key is read from GEMINI_API_KEY in the environment or passed via --api-key.

terminal
# Run an autonomous agent
npx @isoldex/sentinel run "Search for mechanical keyboards" \
  --url https://amazon.de \
  --output result.json

# Perform a single action
npx @isoldex/sentinel act "Click the login button" \
  --url https://example.com

# Extract structured data
npx @isoldex/sentinel extract "Get all product names and prices" \
  --url https://shop.example.com \
  --schema '{"type":"object","properties":{"products":{"type":"array"}}}'

# Take a screenshot
npx @isoldex/sentinel screenshot \
  --url https://example.com \
  --output page.png

| Command | Description | Key flags |
|---|---|---|
| run | Autonomous agent: achieves a natural-language goal | --url, --output, --max-steps |
| act | Single natural-language action on the page | --url, --headless |
| extract | Extract structured data from the page as JSON | --url, --schema, --output |
| observe | List interactive elements on the page | --url, --output |
| screenshot | Take a PNG screenshot of the page | --url, --output |

| Flag | Default | Description |
|---|---|---|
| --url | required | URL to navigate to before running the command |
| --api-key | GEMINI_API_KEY env | Gemini API key |
| --model | gemini-3-flash-preview | Gemini model (the GEMINI_VERSION env variable also works) |
| --headless | false | Run the browser headlessly (no visible window) |
| --output | stdout | Write the JSON / PNG result to a file path |
| --max-steps | 15 | Maximum agent steps (run command only) |
| --schema | | JSON Schema string for the extract command |
| --verbose | 1 | Log verbosity 0–3 |

Error handling

All Sentinel errors extend SentinelError, which carries a code string and optional context. Most workflows prefer the non-throwing pattern via result.success.

errors.ts
import {
  SentinelError,
  ActionError,
  ExtractionError,
  NavigationError,
  AgentError,
  NotInitializedError,
} from '@isoldex/sentinel';

try {
  await sentinel.act('Click the submit button');
} catch (err) {
  if (err instanceof ActionError) {
    console.error(err.message, err.code, err.context);
    // code: "ACTION_FAILED"
  }
}

// Non-throwing alternative — check result.success
const result = await sentinel.act('Click checkout');
if (!result.success) {
  // result.message has the full diagnostic
  // result.attempts has each tried path
}

| Class | Code | When thrown |
|---|---|---|
| ActionError | ACTION_FAILED | act() fails after all retries |
| ExtractionError | EXTRACTION_FAILED | extract() fails |
| NavigationError | NAVIGATION_FAILED | goto() fails |
| AgentError | AGENT_ERROR | run() exceeds maxSteps or gets stuck |
| NotInitializedError | NOT_INITIALIZED | Any method called before init() |
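Because every class carries a stable code, callers can branch on err.code in one place. The sketch below models the documented hierarchy with local stand-in classes (SentinelErrorSketch, ActionErrorSketch, and describeFailure are hypothetical); in real code, import the classes from @isoldex/sentinel as shown above.

```typescript
// Local stand-ins modeling the documented pattern: a base error with a code and optional context.
class SentinelErrorSketch extends Error {
  constructor(
    message: string,
    readonly code: string,
    readonly context?: Record<string, unknown>,
  ) {
    super(message);
    this.name = new.target.name;
  }
}

class ActionErrorSketch extends SentinelErrorSketch {
  constructor(message: string, context?: Record<string, unknown>) {
    super(message, 'ACTION_FAILED', context);
  }
}

// Map any caught value to a short hint, using the codes from the table above.
function describeFailure(err: unknown): string {
  if (!(err instanceof SentinelErrorSketch)) return 'Unexpected error';
  switch (err.code) {
    case 'ACTION_FAILED': return 'The action failed after all retries';
    case 'EXTRACTION_FAILED': return 'Extraction failed';
    case 'NAVIGATION_FAILED': return 'Navigation failed';
    case 'AGENT_ERROR': return 'The agent hit maxSteps or got stuck';
    case 'NOT_INITIALIZED': return 'Call init() first';
    default: return `Sentinel error: ${err.code}`;
  }
}
```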

SentinelOptions

All options passed to new Sentinel(options).

| Option | Type | Default | Description |
|---|---|---|---|
| apiKey | string | | Gemini API key. Pass '' when using a custom provider. |
| headless | boolean | false | Run the browser in headless mode. |
| browser | 'chromium' / 'firefox' / 'webkit' | 'chromium' | Browser engine. CDP/AOM requires Chromium. |
| viewport | { width, height } | 1920×1080 | Viewport dimensions. |
| verbose | 0 / 1 / 2 / 3 | 1 | Log verbosity: 0 = silent, 3 = full debug with LLM JSON. |
| enableCaching | boolean | true | Cache AOM state between calls (2000 ms TTL). |
| mode | 'aom' / 'hybrid' / 'vision' | 'aom' | Element detection: aom (fast), hybrid (+ vision fallback), vision (CUA). |
| plannerModel | string | | Gemini model for the planner (e.g. 'gemini-3.1-pro-preview'). |
| plannerProvider | LLMProvider | | Custom LLM provider for the planner. |
| mfa | { type, secret } | | TOTP/MFA config. Auto-generates 2FA codes during login. |
| visionFallback | boolean | false | Enable vision-model fallback in act(). Deprecated: use mode. |
| provider | LLMProvider | Gemini | Custom LLM provider (OpenAI, Claude, Ollama…). |
| sessionPath | string | | Path to a session file. Loaded on init() if it exists. |
| userDataDir | string | | Persistent browser profile (IndexedDB, ServiceWorkers). |
| proxy | ProxyOptions | | Proxy server config ({ server, username, password }). |
| humanLike | boolean | false | Add random human-like delays between actions. |
| domSettleTimeoutMs | number | 3000 | Max ms to wait for the DOM to settle after an action. |
| locatorCache | boolean / string | false | Cache successful selectors (URL-scoped). String = file path. |
| promptCache | boolean / string | false | Cache LLM responses by prompt hash. String = file path. |
| patternCache | boolean / string | true | Fingerprint widgets and reuse interactions cross-site. String = file path. |
| maxElements | number | 50 | Max elements sent to the LLM per act() call (chunk-processing). |
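The enableCaching row describes a time-to-live cache: parsed AOM state is reused for 2000 ms. A generic version of that pattern is sketched below; TtlCache is a hypothetical name, and the injectable clock exists only to make the expiry behavior testable.

```typescript
// Generic time-to-live cache: a stored value is reused until ttlMs have elapsed.
class TtlCache<T> {
  private entry: { value: T; storedAt: number } | undefined;

  constructor(
    private readonly ttlMs = 2000,
    private readonly now: () => number = Date.now,
  ) {}

  // Return the cached value while it is fresh; otherwise load, store, and return a new one.
  async get(load: () => Promise<T>): Promise<T> {
    if (this.entry && this.now() - this.entry.storedAt < this.ttlMs) {
      return this.entry.value;
    }
    const value = await load();
    this.entry = { value, storedAt: this.now() };
    return value;
  }

  // Drop the cached value, e.g. after an action that changes the page.
  invalidate(): void {
    this.entry = undefined;
  }
}

// Usage: cache an expensive page snapshot for two seconds.
const snapshotCache = new TtlCache<string>(2000);
snapshotCache.get(async () => 'parsed page state').then((state) => console.log(state));
```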