AgentLeash

The control plane for your autonomous AI shopping agents — set budget caps, approval thresholds, and blocked categories once, then watch every agent purchase get approved, held, or blocked against your policy in real time.

Date: 2026-05-31
Form factor: Web app (single-page; mobile-friendly)
Status: Prototype

What it is

AgentLeash sits between your AI shopping agents and your card as a policy gate. You connect the agents already buying on your behalf in 2026 — ChatGPT Instant Checkout, Perplexity Comet, Amazon Rufus, Google's UCP checkout — set a handful of household rules, and AgentLeash evaluates every purchase attempt the instant it happens: approve it, hold it for your one-tap sign-off, or block it outright. The result is a single dashboard where you can finally see, cap, and audit what your agents are spending.

The prototype demonstrates the end-to-end flow on one household's day: 17 realistic sample purchase attempts stream in across four agents and are scored live by an in-browser policy engine. Re-tune the policy with the sliders and every decision recomputes.

Who it serves

Households and individuals who have started letting AI agents buy things for them — a fast-growing group in 2026, when 45% of consumers already use AI for part of the buying journey and 70% are at least somewhat comfortable with an agent purchasing on their behalf (IBM IBV, NRF 2026). The specific pain: handing an agent a real card number means the number lives in the agent's config with no built-in spending cap — if it makes a mistake, gets prompt-injected, or enters a processing loop, there is no automatic safeguard. Concrete personas:

  • The busy parent who let ChatGPT and Amazon Rufus handle grocery and household refills and now has no single place to see the total or stop a runaway reorder.
  • The early adopter running three or four agents at once who wants electronics and travel purchases over $75 to pause for a human glance, while $40 grocery baskets just clear.
  • The security-conscious user who has read about agent prompt-injection and wants cash-equivalent buys (gift cards), gambling, and brand-new subscriptions blocked or held by default.

Why it could be profitable

Monetization is freemium consumer SaaS with a family tier and a B2B card-issuer API:

  • Free: One connected agent, a monthly cap, and the decision feed. No account required for the demo.
  • Pro ($8/mo): Unlimited agents, per-category and per-merchant rules, the approval queue with push/SMS alerts, velocity limits, new-subscription guard, and audit-log export.
  • Family ($14/mo): Per-member sub-budgets, kid/teen agent scopes, shared approval inbox.
  • B2B issuer API ($0.02–0.05 per evaluated transaction, or rev-share): Banks, neobanks, and card issuers embed AgentLeash as the "agent spend control" layer their customers will demand the moment agentic checkout becomes the default. This is the larger pool: every issuer needs a guardrail story for delegated agent payments, and most have none.

The timing is the whole thesis. Google announced the Universal Commerce Protocol at NRF on January 11, 2026, standardizing AI-completed checkout across merchants; Juniper Research (April 2026) projects ~$8B in agentic commerce spend in 2026, scaling toward $1.5T by 2030. The agents shipped first; the guardrails did not. Privacy.com and Oracle both flagged in 2026 that agents handed a raw card have no spend ceiling and no automatic stop. AgentLeash is the missing control plane for the exact moment delegated spending goes mainstream — and consumer trust is the bottleneck (only 24% trust AI recommendations outright), which is precisely what a visible, enforceable guardrail unlocks.

Form factor & scope

Single-page web app, sized for mobile and desktop. Scope-locked to the policy + audit layer — AgentLeash does not itself broker payments, issue cards, or run the agents; it is the decision gate and ledger that sits in front of them. The minimum viable scope demonstrated here:

  1. See your connected agents; pause any one (or freeze all) with a toggle.
  2. Set the policy — monthly cap, ask-first threshold, per-hour velocity limit, blocked categories, blocked merchants, new-subscription guard.
  3. Watch the live decision feed score all 17 purchase attempts in chronological order, each tagged Approved / Held / Blocked with the exact rule that fired.
  4. Work the pending-approval queue — approve or deny held items and watch the budget, KPIs, and feed update.
  5. See spend-by-agent and export a plain-text audit log.

How to run it

  1. Open index.html in any modern browser (Chrome, Firefox, Edge, Safari).
  2. Drag the Monthly cap, Ask-first, and Velocity sliders, toggle the blocked-category chips, pause an agent, or hit Freeze all — every decision in the feed and queue re-evaluates instantly.
  3. In the Pending approval queue, click Approve or Deny on a held purchase and watch the approved-spend total and KPIs move.
  4. Use Copy audit log / Download .txt to export the full policy + decision log.
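The export in step 4 can be pictured as a pure function that turns the active policy and the decision list into text. This is a minimal sketch, not the prototype's actual code: the log layout, the function name buildAuditLog, and the decision fields (time, verdict, amount, agent, merchant, reason) are all assumptions made for illustration.

```javascript
// Sketch only: the log layout and every field name here are hypothetical.
function buildAuditLog(policy, decisions) {
  const lines = ["AgentLeash audit log", "", "Policy"];

  // One line per policy setting; arrays (e.g. blocked categories) are joined.
  for (const [key, value] of Object.entries(policy)) {
    lines.push(`  ${key}: ${Array.isArray(value) ? value.join(", ") : value}`);
  }

  lines.push("", "Decisions");
  for (const d of decisions) {
    lines.push(
      `  ${d.time}  ${d.verdict.toUpperCase().padEnd(8)}  ` +
      `$${d.amount.toFixed(2).padStart(8)}  ` +
      `${d.agent} @ ${d.merchant}  (${d.reason})`
    );
  }
  return lines.join("\n");
}
```

In a browser, the returned string could be passed to navigator.clipboard.writeText for the copy button, or wrapped in a Blob for the .txt download.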

No build step, no API keys, no accounts. Sample data is embedded inside index.html as a <script type="application/json"> block so the page works directly from file:// with no local server. A standalone copy of the same data also lives at sample-data.json in this folder.
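A minimal sketch of how a page might read such an embedded JSON block and fall back to the standalone file follows. The element id "sample-data" and the function names parseInline and loadData are assumptions, not taken from the prototype; the document and fetch objects are passed in as parameters so the logic can be exercised outside a browser.

```javascript
// Sketch only: the element id and function names are hypothetical.
function parseInline(doc, id) {
  const el = doc.getElementById(id);
  if (!el) return null;
  try {
    return JSON.parse(el.textContent);
  } catch (err) {
    return null; // malformed inline JSON: let the caller fall back
  }
}

async function loadData(doc, fetchFn, id = "sample-data") {
  const inline = parseInline(doc, id);
  if (inline) return inline;

  // Fallback path. Browsers commonly refuse fetch() on file:// pages,
  // so this mainly helps when the folder is served from a static host.
  const res = await fetchFn("sample-data.json");
  if (!res.ok) throw new Error("Could not load sample-data.json");
  return res.json();
}
```

In the page itself this would be called as loadData(document, fetch), with the caller showing a visible error if the promise rejects.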

What's in this prototype

  • A live policy engine (script.js) that evaluates each purchase in chronological order against: emergency freeze, paused-agent, blocked-merchant, blocked-category, per-agent velocity limit, new-subscription guard, ask-first threshold, and running monthly-cap — first rule to fire wins, with a human-readable reason.
  • Four modeled agents — ChatGPT Instant Checkout, Perplexity Comet, Amazon Rufus, and Google AI Mode (UCP, shipped paused) — each with its own scope note and a pause toggle.
  • 17 purchase attempts crafted to exercise every rule: routine groceries that clear, a $329 headphone buy held at the threshold, a DraftKings top-up blocked on category, a GiftCardVault buy blocked on merchant (the classic agent-exploit pattern), a fourth consumable reorder in 30 minutes tripping the velocity limit, and a brand-new Spotify subscription held by the subscription guard — netting 10 approved, 4 held, 2 blocked at the default policy.
  • Editable policy controls — sliders for cap / threshold / velocity, toggle chips for blocked categories, and switches for the subscription guard and emergency freeze, all wired to instant re-evaluation.
  • A pending-approval queue with working Approve / Deny overrides that feed back into the budget math.
  • KPIs + budget bar (approved MTD, remaining, held, blocked-and-prevented) and a spend-by-agent breakdown.
  • A plain-text audit export (copy or download) capturing the active policy and every decision — the artifact you'd hand a bank or keep for your own records.
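The rule chain described in the first bullet can be sketched as an ordered list of checks where the first one that fires decides the verdict. This is an illustration under stated assumptions rather than the contents of script.js: the transaction fields (ts, agentId, merchant, category, amount, isNewSubscription), the policy keys, and the choice to count only approved purchases toward velocity are all hypothetical.

```javascript
// Sketch only: field names and policy keys are hypothetical; the rule
// order mirrors the list above (first rule to fire wins).
const HOUR_MS = 60 * 60 * 1000;

function evaluateAll(transactions, policy, agents) {
  let approvedTotal = 0;             // running approved spend for the cap rule
  const approvedTimes = new Map();   // agentId -> timestamps of approved buys
  const sorted = [...transactions].sort((a, b) => a.ts - b.ts);

  return sorted.map((tx) => {
    const agent = agents.find((a) => a.id === tx.agentId);
    // Approved purchases by this agent within the last hour.
    const recent = (approvedTimes.get(tx.agentId) || [])
      .filter((t) => tx.ts - t < HOUR_MS);

    const rules = [
      [policy.frozen, "blocked", "Emergency freeze is on"],
      [!agent || agent.paused, "blocked", "Agent is paused or not connected"],
      [policy.blockedMerchants.includes(tx.merchant), "blocked",
        `Merchant ${tx.merchant} is blocked`],
      [policy.blockedCategories.includes(tx.category), "blocked",
        `Category ${tx.category} is blocked`],
      [recent.length >= policy.velocityPerHour, "held",
        `Velocity limit of ${policy.velocityPerHour} purchases per hour reached`],
      [policy.subscriptionGuard && tx.isNewSubscription, "held",
        "New subscription needs sign-off"],
      [tx.amount > policy.askFirstOver, "held",
        `Amount is over the $${policy.askFirstOver} ask-first threshold`],
      [approvedTotal + tx.amount > policy.monthlyCap, "held",
        "Would exceed the monthly cap"],
    ];

    const hit = rules.find(([fires]) => fires);
    const verdict = hit ? hit[1] : "approved";
    const reason = hit ? hit[2] : "Within policy";

    // Assumption: only approved purchases feed the running totals.
    if (verdict === "approved") {
      approvedTotal += tx.amount;
      approvedTimes.set(tx.agentId, [...recent, tx.ts]);
    }
    return { ...tx, verdict, reason };
  });
}
```

Because the function is pure, re-running it with a changed policy object is all that a slider or toggle handler would need to do before re-rendering.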

Roadmap

  • Real agent connections via the Universal Commerce Protocol and the major agent SDKs, replacing the sample feed with live purchase webhooks.
  • Push / SMS approval so a held purchase pings your phone and clears with one tap.
  • Per-member and per-agent sub-budgets, plus kid/teen scoped agents for the Family tier.
  • Anomaly detection beyond static rules — flag spend that deviates from each agent's learned pattern (merchant, time-of-day, amount).
  • A B2B issuer SDK that drops the same policy engine into a bank app as the "agent spend control" panel, billed per evaluated transaction.
  • Prompt-injection-aware blocking that cross-references known agent-exploit merchant and category patterns (cash-equivalents, freshly registered sellers).

AgentLeash — Requirements

Goals

  • Give a household a single, trustworthy place to see and control everything its AI shopping agents are buying.
  • Turn an abstract worry ("my agent has my card and no spending limit") into a concrete, enforceable, auditable policy.
  • Make the policy legible: every decision shows the exact rule that produced it, in plain language.
  • Demonstrate the full approve / hold / block lifecycle on realistic data, with live re-evaluation as the policy changes.
  • Run with zero setup — open the file, see the system work.

Primary user

A household "agent administrator" — typically the person who connected the AI shopping agents in the first place. They are comfortable delegating routine purchases but anxious about runaway spend, exploited agents, and surprise subscriptions. Context of use: a quick daily glance on a phone, plus an occasional sit-down to tune rules or clear an approval queue. Job to be done: "Let my agents handle the small, routine stuff automatically, but make sure nothing big, weird, or risky clears without me."

Functional requirements

  • FR1: Load household, policy, agents, and transactions from embedded JSON, with a fetch fallback to sample-data.json.
  • FR2: Display each connected agent with platform, scope note, and a pause toggle; agents flagged not-connected start paused.
  • FR3: Evaluate every transaction in chronological order through an ordered rule chain and assign exactly one verdict: approved, held, or blocked.
  • FR4: Enforce hard blocks (non-overridable): emergency freeze, paused agent, blocked merchant, blocked category.
  • FR5: Enforce soft holds (queue-overridable): per-agent hourly velocity limit, new-subscription guard, ask-first amount threshold, monthly-cap ceiling.
  • FR6: Track a running approved-spend total and per-agent purchase velocity during evaluation so cap and velocity rules reflect prior decisions in the same day.
  • FR7: Expose editable policy controls — sliders for monthly cap, ask-first threshold, and velocity; toggle chips for blocked categories; switches for the subscription guard and emergency freeze — each triggering instant re-evaluation.
  • FR8: Provide a pending-approval queue listing every held item with its reason, plus working Approve / Deny overrides that feed back into budget and KPIs.
  • FR9: Render a live decision feed (newest first) with agent, merchant, item, amount, verdict badge, and the rule that fired; blocked items shown struck through.
  • FR10: Show KPIs — approved spend MTD, budget bar with warn/over states, remaining budget, held count + amount waiting, blocked count + amount prevented.
  • FR11: Show a spend-by-agent breakdown (approved purchases only) with proportional bars.
  • FR12: Generate a plain-text audit log of the active policy and every decision, with copy-to-clipboard and download-as-.txt.
  • FR13: Run entirely client-side from file:// with no build step, no API keys, and no network calls beyond the optional local sample-data.json fallback in FR1.
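One way to read FR8 and FR10 together is as two small pure functions: one applies queue overrides to held items only, the other derives the KPIs from the resulting decision list. The sketch below makes several assumptions that the requirements do not state: the decision shape ({ id, amount, verdict }), the 80% level for the budget bar's warn state, and recording a denied item as "blocked".

```javascript
// Sketch only: decision shape, the 80% warn level, and mapping "Deny"
// to a blocked verdict are assumptions made for illustration.
function applyOverrides(decisions, overrides) {
  // overrides: Map of decision id -> "approved" or "blocked" (denied).
  // Hard blocks are never touched; only held items can be overridden.
  return decisions.map((d) =>
    d.verdict === "held" && overrides.has(d.id)
      ? { ...d, verdict: overrides.get(d.id) }
      : d
  );
}

function computeKpis(decisions, monthlyCap) {
  const withVerdict = (v) => decisions.filter((d) => d.verdict === v);
  const total = (list) => list.reduce((sum, d) => sum + d.amount, 0);

  const approved = total(withVerdict("approved"));
  const ratio = monthlyCap > 0 ? approved / monthlyCap : 1;

  return {
    approved,                                        // approved spend MTD
    remaining: Math.max(0, monthlyCap - approved),   // remaining budget
    heldCount: withVerdict("held").length,
    heldAmount: total(withVerdict("held")),          // amount waiting
    blockedCount: withVerdict("blocked").length,
    blockedAmount: total(withVerdict("blocked")),    // amount prevented
    barState: ratio > 1 ? "over" : ratio >= 0.8 ? "warn" : "ok",
  };
}
```

Keeping overrides in a separate map, rather than mutating the decisions, means a policy change can re-run the evaluation and then re-apply the same approvals and denials.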

User stories

  • As a household agent admin, I want to set one monthly cap across all my agents, so that a single runaway agent can't drain my budget.
  • As a parent, I want purchases over a threshold to pause for my approval, so that big or unusual buys never clear silently.
  • As a security-conscious user, I want gift-card and gambling buys blocked by category, so that the classic agent-exploit patterns are stopped by default.
  • As an admin, I want a brand-new subscription to be held, so that agents can't quietly sign me up for recurring charges.
  • As an admin, I want a per-hour velocity limit, so that a prompt-injected or looping agent can't fire off dozens of orders.
  • As a user, I want an emergency "freeze all" switch, so that I can stop everything instantly if something looks wrong.
  • As a user, I want each decision to tell me why, so that I trust the system and can tune it.
  • As an admin, I want to clear the approval queue in one place, so that held purchases don't get lost.
  • As a record-keeper, I want to export an audit log, so that I can keep proof of what my agents did or share it with my bank.

Non-functional requirements

  • Performance: full re-evaluation and re-render on every control change completes in well under one frame (about 16 ms at 60 Hz) for this data size, with no perceptible lag.
  • Accessibility: semantic HTML, labeled controls, aria-pressed toggles, live region for copy/download status, keyboard-operable buttons and sliders.
  • Privacy: no data leaves the browser; no analytics, no third-party scripts, no external fetches.
  • Portability: must run from file:// and from a static host identically; assets are local only.
  • Resilience: malformed inline JSON falls back to sample-data.json; a load failure shows a visible error rather than a blank page.

Out of scope (for the prototype)

  • Real payment-network integration, card issuance, or actual purchase interception.
  • Real agent connections / UCP webhooks / agent SDK auth.
  • Accounts, multi-device sync, or persistence across reloads.
  • Push / SMS notifications.
  • Machine-learned anomaly detection (only static rules here).

Open questions

  • Where does the gate physically live in production — a virtual card with a programmable authorization webhook, or inside each agent's checkout SDK?
  • How are held purchases reconciled if an agent times out waiting for approval — auto-deny after N minutes, or retry?
  • For the issuer B2B model, is pricing per-evaluated-transaction, per-active-agent, or rev-share on prevented overspend?
  • How should "known recurring" be established on day one before AgentLeash has seen a billing cycle?
