IAM Ideas 2026-05-11

IIQ Provisioning Recovery Console

An operator console that turns SailPoint IdentityIQ's scattered failed‑provisioning state into a single triage surface — classified, age‑ranked, recoverable, and audit‑logged.

| Field | Value |
| --- | --- |
| Type | App |
| Theme | Recover (primary, NIST CSF) + Automation (secondary) |
| Platform | SailPoint IdentityIQ 8.4 |
| Date | 2026-05-11 |
| Status | Concept (browser-runnable mock, synthetic data) |

The problem (in IIQ-shaped form)

When provisioning fails in IIQ, the evidence lands in three places that do not talk to each other:

  1. ProvisioningTransaction (PTO) objects — one per attempted operation per target system, with status = Success | Failed | Retry | Pending and a free-text statusMessage. Configured via enableProvisioningTransactionLog and provisioningTransactionLogLevel = Failure (see WEB-INF/config/init.xml, lines 2109–2111).
  2. IdentityRequest rows — the user-facing access request, with executionStatus = Verifying | Executing | Terminated | Completed. Failures roll up as messages, but the linkage from a specific Failed PTO back to the IdentityRequest is a click-through chain, not a query.
  3. Workflow case — the BPM case for LCM Provisioning carries isProvisioningFailed and the retry loop controlled by enableRetryRequest (see WEB-INF/config/lcmworkflows.xml, lines 255–257). A stuck case sits in the Process Monitor with no aggregated view of what failed across the fleet.

Out of the box (OOB), IIQ surfaces these in three different grids: the Provisioning Transactions debug page, the Identity Request list, and the Process Monitor. There is no single answer to the question an IAM operator actually asks at 9 AM:

"What's broken in provisioning right now, why, who's affected, and what's the fastest recovery?"

This console answers that question.


What the console shows

Top KPI bar (5 tiles):

  • Open: total non-Success ProvisioningTransactions
  • Auto‑retry eligible: status = Retry, ready for the retry loop
  • Past SLA: more than 24h old and still unresolved (configurable)
  • Hot app: application carrying the largest share of open failures
  • Terminated identities affected: failures where Identity.inactive = true (the highest‑risk subset, because it likely means a Leaver event didn't fully execute)

Filter rail (left):

  • Application multi-select (AD, Okta, Workday, ServiceNow, Salesforce, GitHub Enterprise, SAP HR, …)
  • Status: Failed / Retry / Pending / Manual
  • Failure category — derived from statusMessage by a heuristic classifier:
    • Connector unreachable (Connection timed out, Host unreachable)
    • Auth failure (401, Invalid credentials, expired token)
    • Schema / attribute validation (Invalid attribute, required field, enum mismatch)
    • Role / policy precondition (SoD violation, role requires)
    • Form pending (Provisioning form awaiting input)
    • Manual work item (Routed to manual fulfillment)
    • Duplicate (already exists, 409)
    • Other
  • Age bucket: < 1h / 1–4h / 4–24h / > 24h
  • Identity type: Active / Terminated

Failure grid (center):

| Time | Identity | App | Operation | Native identity | Category | Status | Retries | Message |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2026-05-11 08:42 | jdoe | AD - NA | Modify | CN=jdoe,OU=… | Connector unreachable | Retry (3/5) | 3 | Connection timed out after 30s |

Click a row → right-side drawer opens with:

  • Full statusMessage
  • The compiled provisioning plan diff (what was supposed to change)
  • The parent IdentityRequest ID and executionStatus
  • The parent workflow case ID
  • Recovery actions: Retry now, Mark abandoned (reason required), Escalate to manual, Send to retry queue with backoff, Open in IIQ

Bulk select supports the same actions across many PTOs.

Recovery Audit Log (bottom):

Every action the operator takes is recorded in a chronological panel:

2026-05-11 09:14:02  mike.s   retry-now      PTO-8842  (Connector unreachable)  reason: connector restored 08:58
2026-05-11 09:14:05  mike.s   retry-now      PTO-8841  (Connector unreachable)  reason: connector restored 08:58
2026-05-11 09:18:30  mike.s   mark-abandoned PTO-8801  (Schema validation)      reason: app retired in CMDB

This is the Recover artifact — the defensible trail of what was triaged, restored, abandoned, and learned during an incident.
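
The sample log lines in this section follow a fixed column layout. As a rough sketch of how such a line could be produced, the formatter below pads each field to a width inferred from those samples. The entry field names (`timestamp`, `operator`, `action`, `ptoId`, `category`, `reason`) and the widths are assumptions for illustration and are not taken from script.js.

```javascript
// Pad a value to a column width, always leaving at least one trailing space
// so that an over-long value cannot run into the next column.
const col = (value, width) => String(value).padEnd(width - 1) + " ";

// Hypothetical entry shape:
// { timestamp, operator, action, ptoId, category, reason }
// Column widths (21, 9, 15, 10, 25) are inferred from the sample log lines.
function formatAuditLine(entry) {
  return (
    col(entry.timestamp, 21) +
    col(entry.operator, 9) +
    col(entry.action, 15) +
    col(entry.ptoId, 10) +
    col(`(${entry.category})`, 25) +
    `reason: ${entry.reason}`
  );
}
```

Keeping the reason as the last, unpadded column means free-text reasons of any length do not disturb the alignment of the columns before them.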


Why this matters

| Pain | Today | With this console |
| --- | --- | --- |
| 9 AM stand-up: "What's broken in provisioning?" | Three browser tabs, manual cross-reference, gut-feel triage | One screen, KPI tiles, category heatmap |
| A connector outage drops 400 PTOs into Retry | Operator clicks each row in the debug page | One filter, multi-select, one "Retry now" |
| Auditor asks for evidence of how a Q1 outage was handled | Mailbox archaeology + screenshots | Recovery Audit Log export, signed by timestamp + operator |
| Failed Leaver leaves a terminated user with live access | Discovered weeks later, if at all | KPI tile: Terminated identities affected, visible from minute one |
| Same statusMessage recurs across applications, signaling a config drift | Hard to spot in raw log | Failure-category heatmap: drift becomes a contiguous column of red |

NIST CSF — Recover (primary)

  • RC.RP-1 Recovery plan is executed during or after an event — the audit log is the executed plan.
  • RC.IM-1 Recovery plans incorporate lessons learned — categorized failure data lets the team update connectors, retry policies, and role definitions.
  • RC.IM-2 Recovery strategies are updated — the heatmap surfaces systemic vs. one-off failures.
  • RC.CO-3 Recovery activities are communicated — bulk reason fields and the export feed comms templates.

Automation (secondary)

  • Heuristic classifier reduces categorization clicks.
  • Bulk actions reduce per-row clicks (400 PTOs → 1 action).
  • Retry queue with backoff means the operator doesn't babysit transient outages.

What's in this folder

| File | Purpose |
| --- | --- |
| index.html | Single-page console: open in any modern browser, no build, no network. |
| style.css | Operator-console aesthetic: zinc-950 background, emerald accents, dense typography. |
| script.js | Renders the grid, KPIs, filters, drawer, and audit log from sample-data.json. Includes the failure-category classifier. |
| sample-data.json | 42 synthetic ProvisioningTransactions across 7 IIQ applications and 18 identities (3 terminated, 15 active). |
| requirements.md | Functional requirements, IIQ object model, out-of-scope. |
| metadata.md | Provenance, model, IIQ pain points targeted, NIST CSF mapping. |
| cover-image.png | 16:10 concept art (nanobanana). |

How to run

Double-click index.html. That's it. No server, no API key, no IIQ instance required — the data is synthetic and lives in sample-data.json.

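If this were wired into a real IIQ 8.4 deployment, the data layer might look like the following (the endpoint path and method names are illustrative and would need checking against the 8.4 API):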

  • Read ProvisioningTransaction objects via the SailPoint API:
    • GET /identityiq/rest/provisioningTransactions?filter=status.in("Failed","Retry","Pending")
    • Optionally use the iiq search index for free-text search on statusMessage.
  • Read parent IdentityRequest via IdentityRequest.getId() from the PTO.
  • Write recovery actions back as:
    • Retry → call Provisioner.retry(pto.getId()) from a custom Rule, or relaunch the parent workflow case.
    • Mark abandoned → set a custom IIQ_RECOVERY_STATE = "abandoned" on the PTO via setAttribute, then prune via the next Provisioning Transaction Pruner task.
    • Audit log → write to a dedicated IIQRecoveryAuditLog SailPointObject (custom class) or to the OOB AuditEvent table with action = "provisioning_recovery_*".

The examplerules.xml and lcmworkflows.xml files under Resources/IIQ_Repo_V8.4/WEB-INF/config/ show the BeanShell shapes for the rules and workflow steps that would back this UI.


Sources

  • LocalMy-Library/Apps/IAM-Ideas/Resources/IIQ_Repo_V8.4/WEB-INF/config/init.xml (PTO logging config), lcmworkflows.xml (LCM Provisioning workflow + retry loop), workflowRules.xml, authorization.xml (PTO authz scopes), tasksCommon.xml.
  • LocalMy-Library/Apps/IAM-Ideas/Resources/IIQ_Documentation/8.4/identityiq-doc-8.4/ (8.4 subject PDFs, esp. Provisioning, Lifecycle Manager, Tasks).
  • Webserper MCP server was unavailable this run (HTTP 400 from google.serper.dev); regenerate next run for current SailPoint Compass / community discussion on PTO triage and recovery patterns.

Requirements

Requirements — IIQ Provisioning Recovery Console

Functional requirements

F1. KPI bar

  • Display 5 tiles, computed from the loaded ProvisioningTransaction set:
    • Open: count(status != "Success")
    • Auto‑retry eligible: count(status == "Retry")
    • Past SLA: count(ageHours > 24 && status != "Success") (SLA is 24h, fixed in the mock)
    • Hot app: application with the largest count(status != "Success")
    • Terminated identities affected: count(distinct identityName where identity.inactive == true && status != "Success")
  • Each tile is keyboard-focusable and shows a sparkline of the last 7 days when present in the data.
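
The five tile formulas in F1 can be computed in a single pass over the loaded records. The sketch below is one possible shape, assuming records carry the field names from the object model table (`status`, `application`, `identityName`, `identityInactive`, `created`). The function name, the return shape, and the tie-breaking rule for the hot app are assumptions, not a description of script.js.

```javascript
const MS_PER_HOUR = 3600000;

// Compute the five KPI values from an array of PTO records.
// `now` and `slaHours` are parameters so the result is reproducible in tests.
function computeKpis(ptos, now = new Date(), slaHours = 24) {
  const open = ptos.filter((p) => p.status !== "Success");
  const ageHours = (p) => (now - new Date(p.created)) / MS_PER_HOUR;

  // Count open failures per application to find the "hot app".
  const perApp = new Map();
  for (const p of open) {
    perApp.set(p.application, (perApp.get(p.application) ?? 0) + 1);
  }
  let hotApp = null;
  for (const [app, count] of perApp) {
    // Assumption: on a tie, the application seen first wins.
    if (hotApp === null || count > perApp.get(hotApp)) hotApp = app;
  }

  // Distinct identities, not distinct PTOs, for the terminated tile.
  const terminated = new Set(
    open.filter((p) => p.identityInactive).map((p) => p.identityName)
  );

  return {
    open: open.length,
    autoRetryEligible: open.filter((p) => p.status === "Retry").length,
    pastSla: open.filter((p) => ageHours(p) > slaHours).length,
    hotApp,
    terminatedIdentitiesAffected: terminated.size,
  };
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the Past SLA count stable while the operator is looking at a single snapshot of the data.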

F2. Filter rail

  • Application — multi-select chip list, sourced from distinct application in the data.
  • Status — toggle row: Failed / Retry / Pending / Manual.
  • Failure category — multi-select chip list, sourced from the classifier output (see F4).
  • Age — 4 buckets: <1h, 1–4h, 4–24h, >24h.
  • Identity type — Active / Terminated toggle.
  • All filters AND together; "Reset filters" clears them.
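
A minimal sketch of the "all filters AND together" rule follows. It assumes each record already carries a `category` field filled in by the F4 classifier, and that an empty or missing selection means "no constraint" for that dimension. The filter object shape and the handling of exact bucket boundaries (1h, 4h, 24h) are assumptions, since the requirements do not pin them down.

```javascript
// Map an age in hours to one of the four F2 buckets.
// Assumption: exactly 24h still counts as "4–24h", matching the
// "ageHours > 24" rule used for the Past SLA tile.
function ageBucket(ageHours) {
  if (ageHours < 1) return "<1h";
  if (ageHours < 4) return "1–4h";
  if (ageHours <= 24) return "4–24h";
  return ">24h";
}

// filters (hypothetical shape):
// { applications: Set, statuses: Set, categories: Set, ages: Set,
//   identityType: "Active" | "Terminated" | null }
function applyFilters(ptos, filters, now = new Date()) {
  // A dimension with no selection does not restrict the result.
  const pass = (set, value) => !set || set.size === 0 || set.has(value);
  return ptos.filter((p) => {
    const hours = (now - new Date(p.created)) / 3600000;
    const type = p.identityInactive ? "Terminated" : "Active";
    return (
      pass(filters.applications, p.application) &&
      pass(filters.statuses, p.status) &&
      pass(filters.categories, p.category) &&
      pass(filters.ages, ageBucket(hours)) &&
      (!filters.identityType || filters.identityType === type)
    );
  });
}
```

With this shape, "Reset filters" amounts to replacing the filter object with `{}`: every dimension then passes and the full set is returned.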

F3. Grid

  • Columns: Time (relative + absolute on hover), Identity (displayName + name), Application, Operation (Create / Modify / Delete / Enable / Disable / SetAttribute), Native identity (truncated, full on hover), Category (badge), Status (badge), Retries (n/max), Message (truncated 120 chars).
  • Sortable by Time, Application, Status, Retries, Category.
  • Row click opens the right-side drawer.
  • Multi-select via checkbox + shift-range; multi-select reveals the bulk-action toolbar.
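
Shift-range selection is easy to get subtly wrong when the grid is sorted or filtered, because the range has to follow the rows as currently displayed. The helper below is a sketch under that assumption. The names `visibleIds`, `anchorId`, and `targetId` are hypothetical, and the fallback behaviour when the anchor is no longer visible is a design choice rather than a stated requirement.

```javascript
// visibleIds: row ids in their current on-screen order (after sort + filter).
// selected:   Set of ids selected so far.
// anchorId:   the row last clicked without Shift.
// targetId:   the row that was Shift-clicked.
// Returns a new Set; the input Set is not mutated.
function selectRange(visibleIds, selected, anchorId, targetId) {
  const next = new Set(selected);
  const a = visibleIds.indexOf(anchorId);
  const b = visibleIds.indexOf(targetId);
  if (a === -1 || b === -1) {
    // Anchor (or target) is not on screen: toggle only the target row.
    if (next.has(targetId)) next.delete(targetId);
    else next.add(targetId);
    return next;
  }
  const [lo, hi] = a <= b ? [a, b] : [b, a];
  for (let i = lo; i <= hi; i++) next.add(visibleIds[i]);
  return next;
}
```

The bulk-action toolbar can then be shown whenever the returned set is non-empty.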

F4. Failure-category classifier

  • Pure function classifyMessage(statusMessage): Category implemented client-side.
  • Categories: Connector unreachable, Auth failure, Schema/Attribute, Role/Policy, Form pending, Manual work item, Duplicate, Other.
  • Heuristic: lowercase scan + first-match wins. Patterns are illustrative, not exhaustive:
    • Connector unreachable: connection refused, timed out, unreachable, no route to host
    • Auth failure: 401, 403, invalid credentials, unauthorized, expired token
    • Schema/Attribute: invalid attribute, required field, not a valid enum, must match
    • Role/Policy: sod violation, policy violation, requires role, precondition
    • Form pending: provisioning form, awaiting input
    • Manual work item: manual fulfillment, routed to, work item created
    • Duplicate: already exists, 409, duplicate
    • else → Other.
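
The heuristic above maps directly onto an ordered rule table. The sketch below uses the patterns exactly as listed in F4 and plain substring matching. The constant name `CATEGORY_RULES` is illustrative, and the actual implementation in script.js may differ.

```javascript
// Ordered rules: the first category whose pattern appears in the message wins,
// so the order of this array is part of the classifier's behaviour.
const CATEGORY_RULES = [
  ["Connector unreachable", ["connection refused", "timed out", "unreachable", "no route to host"]],
  ["Auth failure", ["401", "403", "invalid credentials", "unauthorized", "expired token"]],
  ["Schema/Attribute", ["invalid attribute", "required field", "not a valid enum", "must match"]],
  ["Role/Policy", ["sod violation", "policy violation", "requires role", "precondition"]],
  ["Form pending", ["provisioning form", "awaiting input"]],
  ["Manual work item", ["manual fulfillment", "routed to", "work item created"]],
  ["Duplicate", ["already exists", "409", "duplicate"]],
];

// Pure function: lowercase the message, scan the rules in order,
// and fall back to "Other" when nothing matches.
function classifyMessage(statusMessage) {
  const text = String(statusMessage ?? "").toLowerCase();
  for (const [category, patterns] of CATEGORY_RULES) {
    if (patterns.some((pattern) => text.includes(pattern))) return category;
  }
  return "Other";
}
```

Because matching is by substring, a numeric pattern such as "401" also matches inside a longer number, and a message that mentions both a timeout and a 401 is classified as Connector unreachable. That is acceptable for a mock, but a production classifier would likely want word-boundary matching or connector-specific error codes.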

F5. Detail drawer

  • Triggered by row click; closeable via Esc or close button.
  • Shows: full statusMessage, the provisioning plan diff (synthetic, displayed as a 3-line JSON-ish change set), IdentityRequest.id + executionStatus, parent workflow case ID, retry history (n entries with timestamps), recovery actions (5 buttons — see F6).

F6. Recovery actions (single + bulk)

  • Retry now — sets status = "Retry", increments retryCount, appends an audit row.
  • Mark abandoned — opens a modal requiring a non-empty reason; on confirm sets a virtual IIQ_RECOVERY_STATE = "abandoned" and removes the row from the open grid.
  • Escalate to manual — sets status = "Manual", audit row captures the assignee (defaults to current user).
  • Send to retry queue with backoff — sets status = "Retry" but suppresses the row for 1h (visual only) and audits the deferral.
  • Open in IIQ — visual no-op in the mock; in production would deep-link to /identityiq/identityRequest.jsf?id=<id>.
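
In the mock, each recovery action is a small state transition plus an audit row. The sketch below models that as a pure function that returns an updated copy of the record and the audit entry. The action keys `escalate-manual` and `retry-with-backoff`, the `suppressedUntil` and `assignee` fields, and the audit entry shape are hypothetical names chosen for illustration; only `retry-now` and `mark-abandoned` appear in the sample audit log. "Open in IIQ" is omitted because it does not change state.

```javascript
const HOUR_MS = 3600000;

// ctx: { operator, reason?, assignee?, now? }
// Returns { pto, audit } without mutating the input record.
function applyAction(pto, action, ctx) {
  const { operator, reason = "", now = new Date() } = ctx;
  const next = { ...pto };
  switch (action) {
    case "retry-now":
      next.status = "Retry";
      next.retryCount = pto.retryCount + 1;
      break;
    case "mark-abandoned":
      // F6 requires a non-empty reason before a PTO can be abandoned.
      if (!reason.trim()) throw new Error("A reason is required to abandon a PTO");
      next.IIQ_RECOVERY_STATE = "abandoned";
      break;
    case "escalate-manual":
      next.status = "Manual";
      next.assignee = ctx.assignee ?? operator; // defaults to current user
      break;
    case "retry-with-backoff":
      next.status = "Retry";
      // Visual-only suppression for 1h, as described in F6.
      next.suppressedUntil = new Date(now.getTime() + HOUR_MS).toISOString();
      break;
    default:
      throw new Error(`Unknown action: ${action}`);
  }
  const audit = {
    timestamp: now.toISOString(),
    operator,
    action,
    ptoId: pto.id,
    category: pto.category,
    reason,
  };
  return { pto: next, audit };
}

// Bulk variant: the same action and reason applied to every selected record.
function applyBulk(ptos, action, ctx) {
  const results = ptos.map((p) => applyAction(p, action, ctx));
  return { ptos: results.map((r) => r.pto), audit: results.map((r) => r.audit) };
}
```

Returning the audit entry alongside the new state makes it hard to change a record without also logging the change, which is the property the Recovery Audit Log depends on.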

F7. Recovery Audit Log

  • Bottom panel, scrollable, always visible.
  • Each entry: timestamp operator action pto-id category reason.
  • New entries highlight briefly when added.
  • "Export CSV" button downloads the visible log.
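
Since operator-entered reasons are free text, the export needs to quote fields that contain commas, quotes, or line breaks. A minimal sketch of the serialization step follows, assuming audit entries are plain objects keyed by the six F7 columns. The key names and function names are illustrative.

```javascript
// Column order follows F7: timestamp, operator, action, pto-id, category, reason.
const CSV_COLUMNS = ["timestamp", "operator", "action", "ptoId", "category", "reason"];

// Quote a field only when it contains a comma, a double quote, or a line
// break; embedded double quotes are doubled (RFC 4180 style).
function csvField(value) {
  const s = String(value ?? "");
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize the visible log entries, header row first, CRLF between rows.
function auditLogToCsv(entries) {
  const rows = [CSV_COLUMNS, ...entries.map((e) => CSV_COLUMNS.map((c) => e[c]))];
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}
```

In the browser, the resulting string could be wrapped in a `Blob` and offered through a temporary link with a `download` attribute, which works for a page opened from disk and keeps requirement N1 (no network calls) intact.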

F8. Theme

  • Dark by default (zinc-950 / emerald). Light toggle in the header (zinc-50 / forest-green).
  • Respects prefers-color-scheme on first load.

Non-functional requirements

  • N1. Opens directly from disk — no server, no API, no network calls.
  • N2. No build step. Plain HTML + CSS + JS, ES2020. No framework, no transpiler.
  • N3. No external fonts or assets at runtime (uses system font stack).
  • N4. Renders ≥ 500 PTOs at 60fps on a 2020-era laptop (current sample is 42).
  • N5. All identity data is synthetic; no resemblance to real persons is intended.
  • N6. Accessible: keyboard-navigable filters and grid; ARIA labels on KPIs and badges.

IIQ object model assumptions

| Object | Field | Shape |
| --- | --- | --- |
| ProvisioningTransaction | id | string (e.g. PTO-08842) |
| | operation | enum: Create / Modify / Delete / Enable / Disable / SetAttribute |
| | status | enum: Failed / Retry / Pending / Manual |
| | statusMessage | free text from the connector |
| | application | string app name |
| | identityName | string (the IIQ Identity name) |
| | identityDisplayName | string |
| | identityInactive | boolean (joined from Identity for the KPI) |
| | nativeIdentity | string (DN, sAMAccountName, email, etc.) |
| | retryCount | int |
| | maxRetries | int (config; mock uses 5) |
| | created | ISO-8601 |
| | identityRequestId | string (e.g. IR-12340) |
| | executionStatus | enum: Verifying / Executing / Terminated / Completed |
| | workflowCaseId | string (e.g. WC-87654) |
| | planDiff | array of {op, attr, from, to} |
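
To make the field list concrete, here is what one record might look like, together with a small validator for the enum and type assumptions. The record reuses the example row from the failure grid where possible. The display name, the time-zone suffix on `created`, the `executionStatus` value, and the `planDiff` contents are invented for illustration and do not come from sample-data.json.

```javascript
// One illustrative record; values marked "hypothetical" are made up.
const samplePto = {
  id: "PTO-08842",
  operation: "Modify",
  status: "Retry",
  statusMessage: "Connection timed out after 30s",
  application: "AD - NA",
  identityName: "jdoe",
  identityDisplayName: "J. Doe", // hypothetical
  identityInactive: false,
  nativeIdentity: "CN=jdoe,OU=…", // elided in the source, kept as-is
  retryCount: 3,
  maxRetries: 5,
  created: "2026-05-11T08:42:00Z", // time zone is an assumption
  identityRequestId: "IR-12340",
  executionStatus: "Executing", // hypothetical
  workflowCaseId: "WC-87654",
  planDiff: [{ op: "Set", attr: "department", from: "Sales", to: "Marketing" }], // hypothetical
};

const ENUMS = {
  operation: ["Create", "Modify", "Delete", "Enable", "Disable", "SetAttribute"],
  status: ["Failed", "Retry", "Pending", "Manual"],
  executionStatus: ["Verifying", "Executing", "Terminated", "Completed"],
};

// Returns a list of human-readable problems; an empty list means the record
// matches the assumptions in the table.
function validatePto(p) {
  const problems = [];
  for (const [field, allowed] of Object.entries(ENUMS)) {
    if (!allowed.includes(p[field])) problems.push(`${field}: unexpected value ${p[field]}`);
  }
  if (!Number.isInteger(p.retryCount) || p.retryCount < 0) {
    problems.push("retryCount: not a non-negative integer");
  }
  if (Number.isNaN(Date.parse(p.created))) problems.push("created: not a parseable date");
  if (!Array.isArray(p.planDiff)) problems.push("planDiff: not an array");
  return problems;
}
```

Running a check like this once when sample-data.json is loaded would surface malformed rows early instead of as blank cells in the grid.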

Out of scope

  • Live SailPoint API calls.
  • Persistence beyond page lifetime (audit log is in-memory; refreshing the page resets).
  • Multi-tenant scoping / SPRight enforcement.
  • Mobile / narrow-viewport layout (operator console assumes ≥ 1280px wide).
