Field Test · 04.2026 · 5 sites

Sentinel vs Stagehand.

Head-to-head comparison on 5 real websites. Same model (Gemini Flash), same instructions, same machine. Default settings for both frameworks. April 2026.

Sites passed: 5/5 (vs. Stagehand 2/5)
Token efficiency: 6–10× (2–5k vs 29–51k tokens/action)
Cost per run: $0.003 (vs. $0.03+ for Stagehand)

Real-world E2E results

Both frameworks ran the same 5 tasks with identical instructions. Stagehand used its default DOM mode (v3.2).

Website | Sentinel | Stagehand | Cost (Sentinel) | Cost (Stagehand)
Amazon.de | ✓ 3 steps | ✓ 6 actions | $0.003 | ~$0.03
npmjs.com | ✓ 5 steps | ✓ 10 actions | $0.003 | ~$0.03
Booking.com | ✓ 6 steps | ✗ 13 actions | $0.003 | failed
Wikipedia DE | ✓ 4 steps | ✗ timeout | $0.003 | failed
Durchblicker.at | ✓ 17 steps | ✗ 26 actions | $0.018 | failed

Stagehand failed on Booking.com (couldn't extract prices), Wikipedia (cookie redirect crash), and Durchblicker (stopped at engine power selection after 26 actions).

Metric | Sentinel | Stagehand
Default model | Gemini 3 Flash | Gemini / GPT-4o
Tokens per action | 2–5k | 29–51k
Cost per run | ~$0.002 | ~$0.03–0.08
Cost / 1k runs | ~$2 | ~$30–80
E2E success rate | 5/5 sites | 2/5 sites
fillForm(json)
Network intercept
TOTP / MFA
Planner model split
Click verification
Form intelligence
Self-healing
Parallel sessions | ✓ built-in | Manual
MCP Server
CLI
Custom LLM | ✓ (+Ollama) | Partial
License | MIT | MIT

Extended comparison

Sentinel compared with Stagehand, BrowserUse (Python), and plain Playwright across the metrics that matter for production use.

Metric | Sentinel | Stagehand | BrowserUse | Playwright
Language | TypeScript | TypeScript | Python | TS / Python / Java
AI actions: ✗ (manual only)
Cost / run | ~$0.002 | ~$0.08 | ~$0.05 | $0
Self-healing: Partial
Parallel built-in: Manual, Manual
Shadow DOM: Partial
OpenTelemetry
MCP Server
CLI
Playwright compat: ✓ drop-in, Partial, ✓ native
License | MIT | MIT | MIT | Apache-2.0

BrowserUse costs estimated from GPT-4o at default settings. Plain Playwright has no AI cost but requires manual selector maintenance.

Methodology

  • Sites: Amazon.de, npmjs.com, Booking.com, Wikipedia DE, Durchblicker.at
  • Model: gemini-3-flash-preview (same for both)
  • Date: April 11, 2026
  • Sentinel: v4.1.0, default settings
  • Stagehand: v3.2, DOM mode (default), same API key
  • Instructions: identical goal text for both frameworks
  • Machine: same machine, sequential execution
  • Cost: token usage × official Gemini API pricing
benchmark-task.ts (TypeScript)
// Same instructions, same model (Gemini Flash), same machine.
// Both frameworks tested on 5 real websites, April 11, 2026.
// The Sentinel import is omitted here; import it from the Sentinel package you installed.
import { Stagehand } from '@browserbasehq/stagehand';

// Both frameworks share the same Gemini API key.
const apiKey = process.env.GEMINI_API_KEY;

// Sentinel
const sentinel = new Sentinel({ apiKey });
await sentinel.init();
await sentinel.goto('https://www.amazon.de');
const sentinelResult = await sentinel.run(
  'Search for "mechanical keyboard", extract the first 3 product names and prices'
);
// 3 steps, $0.003, extracted: [{name, price}, ...]

// Stagehand (same model: Gemini Flash)
const stagehand = new Stagehand({
  env: 'LOCAL',
  model: { modelName: 'google/gemini-3-flash-preview', apiKey },
});
await stagehand.init();
// Named separately from sentinelResult so both results can live in one module.
const stagehandResult = await stagehand.agent().execute({
  instruction: 'Search for "mechanical keyboard", extract first 3 products',
  maxSteps: 10,
});
// 6 actions, ~$0.03 (6-10x more tokens per action)

Prices are based on official Gemini API pricing as of April 2026. Both frameworks were tested with the same model (gemini-3-flash-preview) and default configurations. Stagehand v3.2 used DOM mode (its current default). Test code is available on GitHub.
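The cost figures follow the rule stated in the methodology: token usage multiplied by the API price. A minimal sketch of that arithmetic is shown here. The type names, the `runCostUsd` helper, and above all the prices and token counts are hypothetical placeholders, not the official Gemini rates or measured benchmark values; substitute the real numbers before drawing conclusions.

```typescript
// Hypothetical pricing shape: USD per one million tokens.
interface TokenPricing {
  inputUsdPerMillion: number;
  outputUsdPerMillion: number;
}

// Token usage recorded for a single agent action (one LLM call).
interface ActionUsage {
  inputTokens: number;
  outputTokens: number;
}

// Cost of one run = sum over all actions of (tokens × price per token).
function runCostUsd(actions: ActionUsage[], pricing: TokenPricing): number {
  return actions.reduce((total, action) => {
    const inputCost = (action.inputTokens / 1_000_000) * pricing.inputUsdPerMillion;
    const outputCost = (action.outputTokens / 1_000_000) * pricing.outputUsdPerMillion;
    return total + inputCost + outputCost;
  }, 0);
}

// Placeholder values for illustration only (NOT real Gemini prices or measurements).
const examplePricing: TokenPricing = { inputUsdPerMillion: 1, outputUsdPerMillion: 2 };
const exampleRun: ActionUsage[] = [
  { inputTokens: 3_000, outputTokens: 200 },
  { inputTokens: 4_000, outputTokens: 300 },
];

const exampleCost = runCostUsd(exampleRun, examplePricing);
console.log(`Estimated cost per run: $${exampleCost.toFixed(4)}`);
```

Because the per-action token count enters the sum linearly, a framework that sends several times more tokens per action pays roughly that multiple per action at the same price.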

Real-world costs are even lower

The $0.002/run baseline assumes cold LLM calls. With locatorCache and promptCache enabled, repeated runs skip the LLM entirely, which reduces effective cost by 60–90% on stable pages. In a typical test suite where most pages don't change between runs, the real cost is closer to $0.0002–$0.0008/run.
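The $0.0002–$0.0008 range can be checked by applying the 60–90% reduction to the $0.002 baseline. The sketch below does only that arithmetic; `effectiveCostUsd` is a hypothetical helper written for illustration and is not part of Sentinel's API.

```typescript
// Effective cost after caching removes a fraction of the baseline cost.
// `reduction` is a fraction between 0 and 1 (0.6 means a 60% reduction).
function effectiveCostUsd(baseCostUsd: number, reduction: number): number {
  if (reduction < 0 || reduction > 1) {
    throw new RangeError('reduction must be between 0 and 1');
  }
  return baseCostUsd * (1 - reduction);
}

const baselineUsd = 0.002; // cold-call baseline quoted in the text

// 90% reduction gives the low end, 60% reduction gives the high end.
const lowEndUsd = effectiveCostUsd(baselineUsd, 0.9);
const highEndUsd = effectiveCostUsd(baselineUsd, 0.6);

console.log(`Effective cost range: $${lowEndUsd.toFixed(4)} to $${highEndUsd.toFixed(4)} per run`);
```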

Cost / Operations Calculator

Run the numbers.

Adjust runs and per-run costs to match your setup.

Example at 10,000 runs per month (adjustable from 100 to 500k):

Cost per run (adjust to your pricing)
Sentinel | $0.0020/run
BrowserUse | $0.050/run
Stagehand | $0.080/run

Monthly cost comparison
Sentinel | $0.0020 × 10,000 | $20
BrowserUse | $0.050 × 10,000 | $500
Stagehand | $0.080 × 10,000 | $800

Saved vs Stagehand: $780 / mo

Sentinel is 40× cheaper than Stagehand at these settings. Defaults: $0.0020/run for Sentinel (Gemini Flash), $0.080/run for Stagehand (GPT-4o), and $0.050/run for BrowserUse. With locatorCache and promptCache enabled, Sentinel's effective cost drops further on repeated tasks.
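The calculator's numbers reduce to a multiplication and a subtraction, which can be reproduced offline. The following is a rough sketch using the default per-run costs quoted above; the function and variable names are illustrative assumptions, and the per-run costs should be replaced with your own measurements.

```typescript
// Monthly cost = number of runs × cost per run.
function monthlyCostUsd(runs: number, costPerRunUsd: number): number {
  return runs * costPerRunUsd;
}

const runsPerMonth = 10_000;

// Default per-run costs quoted on this page.
const costPerRunUsd = {
  sentinel: 0.002,
  browserUse: 0.05,
  stagehand: 0.08,
};

const sentinelMonthly = monthlyCostUsd(runsPerMonth, costPerRunUsd.sentinel);
const browserUseMonthly = monthlyCostUsd(runsPerMonth, costPerRunUsd.browserUse);
const stagehandMonthly = monthlyCostUsd(runsPerMonth, costPerRunUsd.stagehand);

// Savings and cost ratio relative to Stagehand.
const savedVsStagehand = stagehandMonthly - sentinelMonthly;
const ratioVsStagehand = stagehandMonthly / sentinelMonthly;

console.log(`Sentinel:   $${sentinelMonthly.toFixed(0)} / mo`);
console.log(`BrowserUse: $${browserUseMonthly.toFixed(0)} / mo`);
console.log(`Stagehand:  $${stagehandMonthly.toFixed(0)} / mo`);
console.log(`Saved vs Stagehand: $${savedVsStagehand.toFixed(0)} / mo (${ratioVsStagehand.toFixed(0)}× cheaper)`);
```

The ratio does not depend on the run count, so the 40× figure holds at any volume as long as the per-run costs stay fixed.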