Changelog.
All notable changes to @isoldex/sentinel. Latest version: v4.1.0
Three killer features: fillForm(json), intercept(), TOTP/MFA. Plus widget detection, click-target verification, and form intelligence.
- Added: sentinel.fillForm(json) — declarative form filling with a single JSON object. Sentinel maps keys to form fields via LLM and fills them automatically.
- Added: sentinel.intercept(urlPattern, trigger) — network interception: capture raw API responses during browser actions instead of scraping DOM.
- Added: TOTP/MFA automation — mfa: { type: 'totp', secret: '...' } auto-generates 2FA codes during login flows. generateTOTP() also exported standalone.
- Added: plannerModel / plannerProvider — use a stronger model for planning (e.g. Gemini 3.1 Pro) while a cheap model (Flash) handles execution.
- Added: mode: 'aom' | 'hybrid' | 'vision' — configurable element detection strategy with vision fallback on coordinate mismatch.
- Added: Click-target verification — verifies the element at click coordinates matches the intended target, with Playwright locator fallback on mismatch.
- Added: Widget pattern detection — 9 patterns for custom dropdowns, datepickers, sliders, and CSS-library components (React Select, Ant Design, MUI, etc.).
- Added: Universal slider-fill — 3-strategy cascade: native range input, sibling text-input (Amazon-style price filters), or keyboard simulation via aria-valuemin/max.
- Added: Validation error detection — reads form error messages via aria-invalid, role='alert', class*='error' and passes them to the planner.
- Added: Form field/button separation — planner prompt structurally separates form fields from buttons with filled/unfilled status indicators.
- Added: Proactive blocker dismissal — cookie banners and modals are dismissed at the start of each step, not just on failure.
- Changed: State verification uses a compact fingerprint (role+name+region+value+error+state). Catches dropdown openings, focus shifts, and value changes.
- Changed: Unicode regex (\p{L}\p{N}) for text normalization — supports all Latin-script languages.
- Changed: Action-level retry with exponential backoff for transient failures (timeout, detached, disposed).
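The TOTP/MFA entry above depends on time-based one-time passwords. As a rough illustration of what such a generator computes, here is a minimal RFC 6238 sketch (HMAC-SHA1, 30-second step). The name totpSketch is hypothetical and it takes raw key bytes; the signature of the exported generateTOTP() and the encoding it expects for the secret are not stated in this changelog and may differ.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper, NOT the library's generateTOTP(): RFC 6238 TOTP over HMAC-SHA1.
function totpSketch(key: Uint8Array, unixMs: number, digits = 6, stepSeconds = 30): string {
  // Number of whole time steps since the Unix epoch.
  const counter = Math.floor(unixMs / 1000 / stepSeconds);

  // The counter is hashed as an 8-byte big-endian integer.
  const msg = Buffer.alloc(8);
  msg.writeUInt32BE(Math.floor(counter / 0x100000000), 0);
  msg.writeUInt32BE(counter >>> 0, 4);

  const mac = createHmac("sha1", key).update(msg).digest();

  // Dynamic truncation (RFC 4226): the low nibble of the last byte picks a 4-byte window.
  const offset = mac.readUInt8(mac.length - 1) & 0x0f;
  const binary = mac.readUInt32BE(offset) & 0x7fffffff;

  return (binary % Math.pow(10, digits)).toString().padStart(digits, "0");
}

// RFC 6238 Appendix B test vector (SHA-1, 8 digits, T = 59 s).
const rfcKey = Buffer.from("12345678901234567890", "ascii");
console.log(totpSketch(rfcKey, 59000, 8)); // "94287082"
```

Real authenticator secrets are usually shared as Base32 strings, so a caller would decode the secret to bytes before using a helper like this one.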
Top-3 candidate ranking, pre-action validation, cookie auto-recovery, spatial region tags, contenteditable support.
- Breaking: Default viewport changed from 1280×720 to 1920×1080.
- Breaking: LLM action schema uses candidates[] array instead of single elementId.
- Breaking: Prompt format changed from JSON to pipe-delimited (id | role | name | region).
- Added: Top-3 candidate ranking — LLM returns up to 3 element candidates per action; next candidate is tried instantly without a new LLM call on failure.
- Added: Pre-action validation — validateTarget() checks disabled, hidden, or overlay-blocked elements before clicking.
- Added: Cookie/overlay auto-recovery — automatically dismisses cookie banners and closes modals before retrying.
- Added: Spatial region tags — every element gets a region field (header/nav/sidebar/main/footer/modal/popup).
- Added: contenteditable support — rich-text editors are detected and handled correctly (fill uses Ctrl+A).
- Added: Scroll discovery — batch-scrolls to find elements in virtual-scrolling containers.
- Added: Vision-augmented planning — planner receives a visual page description on complex pages (>100 elements).
- Added: DOM fallback for off-screen AOM — triggers full DOM parse when all AOM elements are outside the viewport.
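To make the pipe-delimited prompt format in this release concrete, the sketch below renders a list of elements as one "id | role | name | region" line each. The ElementSketch shape and the toPipeLines name are assumptions for illustration; the changelog only specifies the column order, not how Sentinel escapes or truncates values.

```typescript
// Hypothetical element shape; the columns follow the changelog's "id | role | name | region".
interface ElementSketch {
  id: number;
  role: string;
  label: string; // rendered in the "name" column
  region: string;
}

// Render one element per line. Pipes and line breaks inside a label are replaced
// with a space so that a label cannot break the column layout (an assumed rule).
function toPipeLines(elements: ElementSketch[]): string {
  return elements
    .map((e) => {
      const safeLabel = e.label.replace(/[|\r\n]+/g, " ").trim();
      return [String(e.id), e.role, safeLabel, e.region].join(" | ");
    })
    .join("\n");
}

const promptBody = toPipeLines([
  { id: 1, role: "button", label: "Sign in", region: "header" },
  { id: 2, role: "textbox", label: "Email", region: "main" },
]);
console.log(promptBody);
```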
OpenTelemetry traces and metrics — every LLM call, act(), extract(), and agent step is instrumented.
- Added: OpenTelemetry support — 6 span types (sentinel.agent → sentinel.agent.step → sentinel.act → sentinel.llm, plus sentinel.extract, sentinel.observe) and 6 metrics (act.requests, act.duration_ms, llm.requests, llm.tokens, llm.duration_ms, agent.steps). Zero overhead when no OTel SDK is configured.
- Fixed: ActionResult.selector was discarded by the outer Sentinel.act() wrapper — now correctly forwarded from ActionEngine.
Stable CSS selector export after every run() — paste directly into Playwright tests.
- Added: AgentResult.selectors — camelCase slug of each instruction maps to the most stable CSS selector found. Priority: data-testid → #id → [name] → [placeholder] → [aria-label] → role:has-text.
- Added: ActionResult.selector — single act() calls now also expose the selector for the acted-on element.
- Added: slugifyInstruction() exported from @isoldex/sentinel — converts a natural-language instruction to a camelCase key.
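The two ideas in this release, a camelCase key per instruction and a fixed selector priority, can be sketched as follows. Both functions are hypothetical re-implementations written for illustration: the exported slugifyInstruction() may treat punctuation, stop words, or non-ASCII text differently, and the exact selector strings Sentinel emits (in particular for the role:has-text case) are assumed here.

```typescript
// Hypothetical stand-in for slugifyInstruction(): lowercase ASCII words joined in camelCase.
function slugSketch(instruction: string): string {
  const words = instruction.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return words
    .map((w, i) => (i === 0 ? w : w.charAt(0).toUpperCase() + w.slice(1)))
    .join("");
}

// Attribute bag for one element; every field name is illustrative.
interface AttrsSketch {
  testId?: string;
  id?: string;
  nameAttr?: string;
  placeholder?: string;
  ariaLabel?: string;
  role?: string;
  text?: string;
}

// Walk the priority order listed in the changelog and return the first selector available.
function pickSelector(a: AttrsSketch): string | null {
  if (a.testId) return `[data-testid="${a.testId}"]`;
  if (a.id) return `#${a.id}`;
  if (a.nameAttr) return `[name="${a.nameAttr}"]`;
  if (a.placeholder) return `[placeholder="${a.placeholder}"]`;
  if (a.ariaLabel) return `[aria-label="${a.ariaLabel}"]`;
  if (a.role && a.text) return `${a.role}:has-text("${a.text}")`;
  return null;
}

// Build a selectors map shaped like the one AgentResult.selectors is described as.
const selectorsSketch: Record<string, string | null> = {
  [slugSketch("Click the login button")]: pickSelector({ id: "login", ariaLabel: "Log in" }),
};
console.log(selectorsSketch);
```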
Prompt caching — identical (prompt, schema) pairs return instantly at zero token cost.
- Added: promptCache option (false | true | string) — in-memory LRU (200 entries) or file-persisted cache keyed by djb2 hash of prompt + schema. Covers act(), extract(), observe(), and the agent loop.
- Added: sentinel.clearPromptCache() — flush the prompt cache programmatically.
- Added: IPromptCache interface exported for custom backends (Redis, SQLite, etc.).
- Fixed: Sentinel.parallel() factory errors are now isolated per task — a browser launch failure no longer aborts remaining tasks.
- Fixed: extend() CDP session leak — calling extend() on the same page multiple times now detaches the previous session first.
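A prompt cache of the kind described in this release combines two standard pieces: the djb2 string hash and a bounded least-recently-used map. The sketch below shows one way they could fit together. The class and function names, the key separator, and the hex encoding of the key are assumptions; the actual IPromptCache interface is not reproduced in this changelog.

```typescript
// djb2 string hash (hash * 33 + charCode), kept in the unsigned 32-bit range.
function djb2(input: string): number {
  let hash = 5381;
  for (let i = 0; i < input.length; i++) {
    hash = ((hash << 5) + hash + input.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Minimal LRU built on Map, which iterates keys in insertion order.
class LruSketch<V> {
  private readonly entries = new Map<string, V>();
  private readonly maxEntries: number;

  constructor(maxEntries = 200) {
    this.maxEntries = maxEntries;
  }

  get(key: string): V | undefined {
    if (!this.entries.has(key)) return undefined;
    const value = this.entries.get(key) as V;
    this.entries.delete(key);
    this.entries.set(key, value); // re-insert to mark as most recently used
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  clear(): void {
    this.entries.clear();
  }
}

// Cache key from prompt plus serialized schema; separator and encoding are assumed.
function cacheKeySketch(prompt: string, schema: unknown): string {
  return djb2(prompt + "\u0000" + JSON.stringify(schema ?? null)).toString(16);
}
```

Because djb2 yields only 32 bits, two different prompts can collide; a production cache would typically store the original prompt alongside the value and compare it on lookup.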
Sentinel.parallel() — concurrent browser sessions with a worker pool, error isolation, and progress callbacks.
- Added: Sentinel.parallel(tasks, options) — runs N independent agent tasks in parallel, each in its own browser session. concurrency option limits simultaneous sessions (default: 3).
- Added: onProgress callback — fires after each task with (completed, total, result).
- Added: ParallelTask, ParallelResult, ParallelOptions types exported.
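The worker-pool behaviour described for Sentinel.parallel() (bounded concurrency, per-task error isolation, a progress callback) follows a common pattern that can be sketched without any browser at all. The names runPool, TaskSketch, and OutcomeSketch are hypothetical; the real ParallelTask, ParallelResult, and ParallelOptions types are not shown in this changelog and will differ.

```typescript
type TaskSketch<T> = () => Promise<T>;
type OutcomeSketch<T> = { ok: true; value: T } | { ok: false; error: unknown };

async function runPool<T>(
  tasks: TaskSketch<T>[],
  concurrency = 3,
  onProgress?: (completed: number, total: number, result: OutcomeSketch<T>) => void
): Promise<OutcomeSketch<T>[]> {
  const results: OutcomeSketch<T>[] = new Array(tasks.length);
  let next = 0;
  let completed = 0;

  // Each worker keeps claiming the next unclaimed task until none are left.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++; // claimed synchronously, so two workers never share a task
      let outcome: OutcomeSketch<T>;
      try {
        outcome = { ok: true, value: await tasks[index]() };
      } catch (error) {
        outcome = { ok: false, error }; // a failure stays local to this task
      }
      results[index] = outcome;
      completed += 1;
      if (onProgress) onProgress(completed, tasks.length, outcome);
    }
  }

  const workerCount = Math.min(Math.max(1, concurrency), tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Results are stored by task index, so the returned array keeps the order of the input tasks even when they finish out of order.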
sentinel.extend(page) — add AI capabilities to any existing Playwright Page object.
- Added: sentinel.extend(page) — attaches act(), extract(), and observe() directly to any Playwright Page. Drop-in for existing Playwright projects.
- Added: verbose: 3 — new debug level exposing chunk-processing stats and full LLM decision JSON per act() call.
- Changed: verbose: 1 now logs action summaries only. Reasoning moved to verbose: 2. Minor breaking change for consumers relying on reasoning at level 1.
Chunk-processing, Shadow DOM, and iframe support.
- Added: filterRelevantElements() — keyword-overlap scoring reduces elements sent to LLM on pages with 200+ interactive elements. maxElements option (default: 50).
- Added: Full Shadow DOM support — parseDOMSnapshot() and parseFormElements() recursively pierce all shadow roots via queryShadowAll(). Covers Salesforce, ServiceNow, Lit, Polymer, Stencil.
- Added: iframe support — parseFrameElements() collects interactive elements from same-origin frames with coordinate offsets.
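Keyword-overlap scoring, as named in the filterRelevantElements() entry, can be illustrated with a small ranking function: count how many instruction words appear in each element's role and label, then keep the top maxElements. This is a simplified sketch under assumptions (ASCII-only tokenization, one point per shared word, ties broken by original order); the library's actual scoring is not documented here.

```typescript
// Hypothetical element shape for the sketch.
interface LabeledSketch {
  id: number;
  role: string;
  label: string;
}

// Lowercase ASCII word set; a simplification of whatever tokenizer the library uses.
function tokenizeSketch(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

function filterRelevantSketch(
  instruction: string,
  elements: LabeledSketch[],
  maxElements = 50
): LabeledSketch[] {
  const wanted = tokenizeSketch(instruction);
  return elements
    .map((el, index) => {
      let score = 0;
      tokenizeSketch(`${el.role} ${el.label}`).forEach((token) => {
        if (wanted.has(token)) score += 1; // one point per shared word
      });
      return { el, index, score };
    })
    .sort((a, b) => b.score - a.score || a.index - b.index) // ties keep page order
    .slice(0, maxElements)
    .map((entry) => entry.el);
}
```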
Intelligent error messages with structured diagnostic output and actionable tips.
- Added: ActionResult.attempts — structured array of every tried path (coordinate-click, vision-grounding, locator-fallback) with specific errors.
- Added: Contextual tip in result.message — outside viewport → scroll suggestion, timeout → overlay hint, all paths exhausted → rephrase or enable visionFallback.
Self-healing locators — cache successful element lookups, skip the LLM on repeated calls.
- Added: locatorCache option (false | true | string) — in-memory or file-persisted cache. On repeated act() calls with the same URL + instruction, Playwright uses the cached selector directly.
- Added: Automatic cache invalidation — if the cached element is gone, the entry is removed and the LLM path takes over.
- Added: ILocatorCache interface exported for custom backends (Redis, etc.).
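The lookup-then-invalidate flow described for the locator cache can be sketched as a small in-memory class. Everything here is illustrative: LocatorCacheSketch is not the exported ILocatorCache interface, the key format is assumed, and the exists callback stands in for whatever check Sentinel performs against the live page.

```typescript
class LocatorCacheSketch {
  private readonly store = new Map<string, string>();

  // Key on URL plus instruction, matching the changelog's "same URL + instruction".
  private keyFor(url: string, instruction: string): string {
    return `${url}\n${instruction}`;
  }

  remember(url: string, instruction: string, selector: string): void {
    this.store.set(this.keyFor(url, instruction), selector);
  }

  // Return the cached selector only if it still matches something on the page.
  // A stale entry is removed so that the caller falls back to the LLM path.
  lookup(
    url: string,
    instruction: string,
    exists: (selector: string) => boolean
  ): string | null {
    const k = this.keyFor(url, instruction);
    const selector = this.store.get(k);
    if (selector === undefined) return null;
    if (exists(selector)) return selector;
    this.store.delete(k);
    return null;
  }
}
```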
Six bug fixes: stale DOM on retries, tab index corruption, elementCounter race, wrong Gemini model, token callback leak, MCP crash on browser failure.
- Fixed: stateParser.invalidateCache() now called at the start of every retry attempt — not just once before the loop.
- Fixed: closeTab() now correctly decrements activePageIndex when a lower-index tab is closed.
- Fixed: elementCounter race condition in StateParser — now a local variable threaded through parallel parse() calls.
- Fixed: GeminiProvider.generateText() with systemInstruction now uses the constructor model, not process.env.GEMINI_VERSION.
- Fixed: onTokenUsage callback nulled out on close() to prevent TokenTracker from being held in memory.
- Fixed: All 7 MCP tool handlers now wrapped in try-catch — browser failures return isError: true instead of crashing the server.
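The closeTab() fix in this list is an index-bookkeeping problem: when a tab to the left of the active one is removed, every later tab shifts down by one. The function below is a hypothetical model of that rule, not Sentinel's code. What the library does when the active tab itself is closed is not stated, so the choice made here (keep the same position, clamped to the last remaining tab) is an assumption.

```typescript
// Return the active index after removing the tab at `closed` from a list of `count` tabs.
// Returns -1 when the only tab was closed.
function activeIndexAfterClose(active: number, closed: number, count: number): number {
  if (closed < 0 || closed >= count) throw new RangeError("closed index out of range");
  if (closed < active) return active - 1; // tabs after the closed one shift left
  if (closed === active) return Math.min(active, count - 2); // assumed policy, see lead-in
  return active; // closing a tab to the right leaves the active index untouched
}
```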
CLI tool, MCP server, and Playwright Test integration — all in one release.
- Added: CLI — sentinel binary with 4 subcommands: run, act, extract, screenshot. Accepts --api-key, --headless, --model, --output, --url.
- Added: MCP server — 8 tools exposed via stdio transport: sentinel_goto, _act, _extract, _observe, _run, _screenshot, _close, _token_usage.
- Added: Playwright Test fixture — @isoldex/sentinel/test exports test with ai fixture. sentinelOptions configurable globally in playwright.config.ts.
extract step type in AgentLoop, append action, token tracking, withRetry utility.
- Breaking: AgentLoop constructor now requires extractionEngine as second parameter. Only affects consumers constructing AgentLoop directly — sentinel.run() is unaffected.
- Added: AgentResult.data — the planner can now issue extract steps mid-run. Structured data is returned in the final AgentResult.
- Added: append action type — appends text to an input without clearing existing content.
- Added: Token usage tracking via onTokenUsage callback — all four providers fire it after every LLM call.
- Added: withRetry() utility — unified exponential backoff extracted from all four providers.
- Fixed: domSettleTimeoutMs not forwarded to ActionEngine — now passed to all three waitForPageSettle call sites.
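Exponential backoff of the kind withRetry() is described as providing can be sketched in two parts: a delay schedule and a wrapper that sleeps between attempts. The function names, default values, delay cap, and the shouldRetry predicate below are all assumptions for illustration; the changelog does not give the real utility's signature or defaults.

```typescript
// Delay before each retry: base, base*factor, base*factor^2, ... capped at capMs.
// `attempts` counts total tries, so there are attempts - 1 delays.
function backoffDelays(attempts: number, baseMs = 250, factor = 2, capMs = 8000): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts - 1; i++) {
    delays.push(Math.min(capMs, baseMs * Math.pow(factor, i)));
  }
  return delays;
}

// Hypothetical stand-in for withRetry(): retry `fn` while `shouldRetry` accepts the error.
async function withRetrySketch<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 250,
  shouldRetry: (error: unknown) => boolean = () => true
): Promise<T> {
  const delays = backoffDelays(attempts, baseMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      // Give up when the attempts are used up or the error is not retryable.
      if (attempt >= delays.length || !shouldRetry(error)) throw error;
      const waitMs = delays[attempt];
      await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
    }
  }
}
```

Production implementations often add random jitter to each delay so that many clients do not retry in lockstep; that is omitted here for brevity.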
userDataDir — persistent browser profiles including IndexedDB (WhatsApp Web, PWAs).
- Added: userDataDir option — persists the full Chromium profile including IndexedDB and ServiceWorkers. Required for apps that use IndexedDB for auth (WhatsApp Web, etc.).
Native vision support for all LLM providers via analyzeImage().
- Added: analyzeImage() method added to all four providers (Gemini, OpenAI, Claude, Ollama). visionFallback: true now works with any vision-capable provider.
- Changed: Default Gemini model updated to gemini-3-flash-preview.
Contextual button naming, off-screen enrichment, withTimeout on all actions, 4-strategy locator chain.
- Added: Contextual button naming — StateParser walks AOM ancestors to enrich generic labels like 'Select plan' with card context ('Kelag | Fixtarif | 17,40 cent/kWh: Select plan').
- Added: withTimeout wrapper on all actions — 10-second timeout prevents indefinite hangs.
- Added: Viewport bounds check before click + scrollIntoViewIfNeeded fallback.
- Added: Radio/checkbox JS click fallback — handles inputs hidden via CSS by traversing to the closest label.
- Added: 4-strategy locator chain: exact role+name → inexact role+name → CSS :has-text → plain text.
- Changed: MutationObserver DOM settle replaces networkidle — resolves after 300ms of DOM silence (cap: 3s).
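A timeout wrapper like the one listed above is usually a Promise.race between the real work and a timer. The sketch below is one plausible shape; withTimeoutSketch and its parameters are hypothetical, and only the 10-second default comes from the changelog.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeoutSketch<T>(work: Promise<T>, ms = 10000, label = "action"): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is cleared either way so it cannot keep
  // the process alive after the work has finished.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Note that racing does not cancel the underlying work; it only stops waiting for it. An action that must be aborted needs its own cancellation mechanism.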
Major release — Sentinel becomes a full AI agent framework.
- Added: Autonomous Agent Loop — sentinel.run(goal) with Plan → Execute → Verify → Reflect cycle.
- Added: Vision Grounding — Gemini Vision fallback in act() via visionFallback option.
- Added: Multi-LLM Provider System — GeminiProvider, OpenAIProvider, ClaudeProvider, OllamaProvider.
- Added: Multi-Tab and Multi-Browser support (Chromium, Firefox, WebKit).
- Added: Session Persistence — saveSession(), sessionPath option.
- Added: Record and Replay — startRecording(), stopRecording(), exportWorkflowAsCode(), replay().
- Added: Proxy and Stealth Mode — proxy option, humanLike delays, User-Agent rotation.
- Added: Event System — Sentinel extends EventEmitter, emits action, navigate, close.
- Added: Token Tracking — getTokenUsage(), exportLogs().
- Added: Structured Error Classes — SentinelError, ActionError, ExtractionError, NavigationError, AgentError, NotInitializedError.
Initial release — AOM-based browser automation with natural language actions.
- Added: Playwright-based browser automation (Chromium). AOM via CDP.
- Added: sentinel.act(instruction) — click, fill, hover.
- Added: sentinel.extract(instruction, schema) — Zod-typed structured extraction.
- Added: sentinel.observe() — page observation via AOM.
- Added: Semantic verification loop with automatic retry.
- Added: Gemini Flash / Pro integration, verbose logging levels.