Patch Management

Governed patch deployment from policy to completion

Define policies, schedule maintenance windows, configure ring-based rollout, and monitor deployments through canary validation, variance detection, and circuit breaker protection. This is the most detailed operational guide in the manual.


Prerequisites

  • Role with patches.view permission for read-only access
  • Role with patches.manage permission for creating policies, ring sets, deployments, and approving/rejecting patches
  • Role with patches.deploy permission for skipping individual deployment hosts
  • Role with hosts.view / hosts.manage for service group management
  • At least one organization with managed hosts that have agents installed and reporting heartbeats
  • Hosts must be sending pending patch data via host-info collection (automatic with agent v1.7.0+)

Creating a patch policy

Patch policies control auto-approval rules, safety thresholds, and pre-flight behavior. Policies are org-scoped -- one active policy per organization.

  1. Navigate to Patch Management > Policies
  2. Click Create Policy
  3. Select the target organization
  4. Configure the fields below
  5. Click Save

Policy fields

  • Auto-Approve Security: Enable to automatically approve security patches
  • Auto-Approve Critical: Enable to automatically approve critical patches
  • Auto-Approve Feature: Enable to automatically approve feature, hotfix, and driver patches
  • Auto-Approve Delay (Days): Number of days a patch must be publicly available before it is auto-approved. Set a value greater than 0 for cautious organizations.
  • Minimum Disk Space (GB): Pre-flight blocks hosts with less free disk space than this threshold (default: 10 GB)
  • Global Failure Threshold: Number of host failures for the same KB across the account at which the circuit breaker auto-rejects that KB (default: 3)
  • Snapshot Before Install: Enable to create a VM snapshot before installing patches
  • Block Untested Patches: Enable to pause deployments at higher rings when patches were not tested in lower rings (variance detection)
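
The auto-approve fields combine a classification toggle with a minimum age. The Python sketch below shows one way to evaluate them for a single pending patch. The `PatchPolicy` class, its field names, and the classification strings are hypothetical, and the sketch assumes the delay is counted in whole days from the patch's public release date.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical model of the policy fields above; names are illustrative.
@dataclass
class PatchPolicy:
    auto_approve_security: bool = False
    auto_approve_critical: bool = False
    auto_approve_feature: bool = False
    auto_approve_delay_days: int = 0

# Assumed classifications covered by the Auto-Approve Feature toggle.
FEATURE_CLASSES = {"feature", "hotfix", "driver"}

def should_auto_approve(policy: PatchPolicy, classification: str,
                        released_on: date, today: date) -> bool:
    """Return True if a pending patch would be auto-approved under this sketch."""
    cls = classification.lower()
    if cls == "security":
        enabled = policy.auto_approve_security
    elif cls == "critical":
        enabled = policy.auto_approve_critical
    elif cls in FEATURE_CLASSES:
        enabled = policy.auto_approve_feature
    else:
        enabled = False  # unknown classifications stay Pending for manual review
    # The patch must have been publicly available for at least the configured delay.
    old_enough = (today - released_on).days >= policy.auto_approve_delay_days
    return enabled and old_enough
```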

Policy overrides

Overrides let you customize policy settings for specific locations or host groups without creating entirely separate policies. Overrides use sparse merge -- only fields you specify override the base policy. All other fields inherit from the base.

  1. Open the policy detail page
  2. Click Add Override
  3. Select the target: a location or host group
  4. Set only the fields you want to override (leave others blank to inherit)
  5. Set the priority number -- higher priority wins when multiple overrides apply to the same host
  6. Click Save

Effective policy resolution: When the engine evaluates a host, it merges base policy → location override → host group overrides (ordered by priority; the highest-priority override wins). Only non-null fields from an override replace the inherited value.
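
The sparse merge can be sketched in Python as follows, assuming overrides are plain dictionaries in which `None` means "inherit" and host group overrides arrive as `(priority, fields)` pairs. The function name and field names are illustrative, not the product's API.

```python
from __future__ import annotations
from typing import Any, Optional

def effective_policy(base: dict[str, Any],
                     location_override: Optional[dict[str, Any]] = None,
                     group_overrides: Optional[list[tuple[int, dict[str, Any]]]] = None,
                     ) -> dict[str, Any]:
    """Sparse-merge overrides onto a base policy (hypothetical field names)."""
    result = dict(base)  # start from the base policy; never mutate the input
    layers: list[dict[str, Any]] = []
    if location_override:
        layers.append(location_override)
    # Apply host group overrides in ascending priority so the highest
    # priority is merged last and wins any conflict.
    for _, override in sorted(group_overrides or [], key=lambda item: item[0]):
        layers.append(override)
    for layer in layers:
        for key, value in layer.items():
            if value is not None:  # None means "inherit", so skip it
                result[key] = value
    return result
```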

Creating maintenance windows

Maintenance windows define when patching and reboots are permitted. Hosts configured with the "Maintenance Window" install schedule will only begin patching inside an active window.

  1. Navigate to Maintenance Windows
  2. Click Create Window
  3. Enter a name and select the organization
  4. Choose the schedule type
  5. Set the start and end times (24-hour format)
  6. Set the timezone (e.g., America/New_York) -- all MW times are evaluated in this timezone
  7. Add targets: select host groups or individual hosts
  8. Click Preview Schedule to verify the next 3 occurrences
  9. Click Save

Schedule types

Type | Description | Example
Nth Day | The Nth occurrence of a weekday each month | "3rd Sunday of every month"
Relative | A weekday relative to another weekday occurrence | "First Saturday after the 2nd Sunday"
Last Day | The last occurrence of a weekday each month | "Last Friday of every month"

Timezone matters: A maintenance window set to "America/New_York" evaluates start and end times in Eastern Time, including DST transitions. If your hosts are in multiple timezones, create a separate window per timezone.
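
The three schedule types reduce to date arithmetic within a month. This Python sketch computes the date of each occurrence using only the standard library. The function names are hypothetical, "after" in the Relative type is assumed to mean strictly after the anchor day, and start/end times and timezone handling are left out.

```python
import calendar
from datetime import date, timedelta

# Weekday numbers follow Python's convention: Monday=0 ... Sunday=6
# (calendar.MONDAY ... calendar.SUNDAY).

def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Nth Day: e.g. the 3rd Sunday of the month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first matching weekday
    day = 1 + offset + 7 * (n - 1)
    if day > calendar.monthrange(year, month)[1]:
        raise ValueError(f"month has no occurrence #{n} of weekday {weekday}")
    return date(year, month, day)

def last_weekday(year: int, month: int, weekday: int) -> date:
    """Last Day: e.g. the last Friday of the month."""
    last = date(year, month, calendar.monthrange(year, month)[1])
    return last - timedelta(days=(last.weekday() - weekday) % 7)

def relative_weekday(year: int, month: int, weekday: int,
                     anchor_weekday: int, anchor_n: int, k: int = 1) -> date:
    """Relative: e.g. the first Saturday after the 2nd Sunday.

    Assumption: "after" means strictly after the anchor day.
    """
    anchor = nth_weekday(year, month, anchor_weekday, anchor_n)
    offset = (weekday - anchor.weekday()) % 7 or 7  # same weekday -> one week later
    return anchor + timedelta(days=offset + 7 * (k - 1))
```

For March 2024, `nth_weekday(2024, 3, calendar.SUNDAY, 3)` returns 2024-03-17, `last_weekday(2024, 3, calendar.FRIDAY)` returns 2024-03-29, and `relative_weekday(2024, 3, calendar.SATURDAY, calendar.SUNDAY, 2)` returns 2024-03-16.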

Creating ring sets and rings

Ring sets define the phased rollout structure for patch deployments. Each ring set contains ordered rings that deploy sequentially.

  1. Navigate to Patch Management > Ring Sets
  2. Click Create Ring Set
  3. Enter a name (e.g., "Linux Servers" or "Windows Workstations")
  4. Toggle Auto-deploy if approved patches should automatically create deployments
  5. Optionally set a classification filter to limit to security-only, critical-only, etc.
  6. Optionally configure script hooks: pre-script, post-script, synthetic test script
  7. Click Save

Adding rings

Add rings in deployment order (ring 0 deploys first). A typical three-ring setup:

Ring | Name | Canary | Wait | Success Gate
0 | Canary | count=2 | 4 hours | 100%
1 | Early Adopters | -- | 24h cooloff | 95%
2 | Production | -- | -- | 95%

For each ring, add members: select host groups, individual hosts, or service groups.

Ring configuration

Install schedule modes

Each ring has an install schedule that controls when patching begins for hosts in that ring.

Mode | Behavior
Immediate | Patching begins immediately when the ring activates. No maintenance window check.
Delay from Approval | Wait N days from deployment creation. After the delay, optionally gate on a maintenance window if enabled.
Delay from Prior Ring | Wait N days from the prior ring's completion. After the delay, optionally gate on a maintenance window.
Maintenance Window | Wait until the host is within an active maintenance window. The scheduled time is set to the next window start time. This is the default.

Out-of-band deployments: Deployments marked as "Out of Band" bypass all scheduling modes and start immediately.
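
To make the four modes concrete, here is a Python sketch of how an engine might compute the earliest start time for a host. The mode strings, the `next_window` callback, and the rule that an already-elapsed delay starts patching at the current time are assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional

# Hypothetical callback: returns the start of the next active maintenance window
# at or after `dt` (or `dt` itself if a window is already open), or None if the
# host has no window configured.
NextWindow = Callable[[datetime], Optional[datetime]]

def scheduled_start(mode: str, now: datetime, next_window: NextWindow, *,
                    deployment_created: Optional[datetime] = None,
                    prior_ring_completed: Optional[datetime] = None,
                    delay_days: int = 0, gate_on_mw: bool = False,
                    out_of_band: bool = False) -> Optional[datetime]:
    """Earliest time patching may begin (None means not yet schedulable)."""
    if out_of_band or mode == "immediate":
        return now  # no maintenance window check
    if mode == "maintenance_window":
        return next_window(now)
    if mode == "delay_from_approval":
        base = deployment_created
    elif mode == "delay_from_prior_ring":
        base = prior_ring_completed
    else:
        raise ValueError(f"unknown install schedule mode: {mode}")
    if base is None:
        return None  # e.g. the prior ring has not completed yet
    earliest = max(now, base + timedelta(days=delay_days))
    # Optionally wait for the next maintenance window once the delay has passed.
    return next_window(earliest) if gate_on_mw else earliest
```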

Reboot policy

Each ring has its own reboot policy controlling what happens after patches install:

  • immediate: Reboot as soon as patches finish installing
  • maintenance_window: Schedule the reboot for the next maintenance window
  • manual: Do not reboot automatically -- the operator must reboot manually
  • suppress: Suppress the reboot entirely (not recommended for most patches)

Canary settings

  • Canary Count: Number of hosts to deploy to first as canaries (or use Canary Percentage for percentage-based selection)
  • Canary Wait (Hours): Hours to wait after all canaries complete before proceeding to the remaining ring hosts
  • Cooloff (Hours): Hours to wait after all ring hosts complete before advancing to the next ring
  • Success Gate (%): Minimum success rate required to advance. If it is not met, the deployment pauses.
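
Canary selection and the success gate are simple arithmetic. The Python sketch below assumes that Canary Percentage rounds up to a whole host and that an empty ring trivially passes its gate; neither rule is stated above, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def select_canaries(hosts: Sequence[str], count: Optional[int] = None,
                    percentage: Optional[float] = None) -> Tuple[List[str], List[str]]:
    """Split ring hosts into (canaries, remaining): the first N hosts are canaries."""
    if count is not None:
        n = count
    elif percentage is not None:
        # Assumption: round up so any non-zero percentage yields at least one canary.
        n = math.ceil(len(hosts) * percentage / 100)
    else:
        n = 0
    n = max(0, min(n, len(hosts)))
    return list(hosts[:n]), list(hosts[n:])

def passes_success_gate(succeeded: int, total: int, gate_percent: float) -> bool:
    """True if succeeded/total >= gate_percent/100 (cross-multiplied to avoid division)."""
    if total == 0:
        return True
    return succeeded * 100 >= gate_percent * total
```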

Service groups

Service groups enable coordinated patching of dependent servers in multi-tier applications (e.g., Web → App → DB). Tiers patch in dependency order to avoid breaking service dependencies.

  1. Navigate to Service Groups in the Policy sidebar section
  2. Click Create Service Group
  3. Enter a name (e.g., "Production ERP Stack") and select the organization
  4. Add tiers in dependency order (tier 0 patches first)

Tier configuration

Each tier references a host group -- the hosts in that group become the tier's members.

  • Host Group: The host group whose members form this tier
  • Tier Order: Execution order (0 = first). Lower numbers patch first.
  • Max Concurrent: Maximum number of hosts patching simultaneously within this tier
  • Success Gate (%): Minimum success rate required to advance to the next tier (default: 100%)
  • Pre-Script: Optional drain/shutdown script (overrides the deployment-level pre-script for this tier)
  • Post-Script: Optional startup/health-check script (overrides the deployment-level post-script for this tier)

Example tier setup

Tier | Name | Host Group | Max Concurrent | Success Gate
0 | Web Servers | Web Servers | 2 | 100%
1 | App Servers | App Servers | 1 | 100%
2 | Database | DB Servers | 1 | 100%

Add the service group as a ring member (same as adding a host group). During deployment:

  • Service group hosts are never canary -- tier ordering IS the validation mechanism
  • Tier 0 completes first, then tier 1, etc.
  • If a tier fails its success gate, the deployment pauses
  • The Max Concurrent setting controls how many hosts patch simultaneously within a tier
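
The tier rules can be sketched as a sequential simulation in Python. The tier dictionaries with `order`, `hosts`, `max_concurrent`, and `gate` keys, and the `patch_host` callback, are hypothetical stand-ins; the sketch only shows ordering, batching by Max Concurrent, and pausing on a missed gate, not real parallel execution.

```python
from typing import Callable, Dict, List, Optional, Tuple

def run_service_group(tiers: List[dict], patch_host: Callable[[str], bool]
                      ) -> Tuple[str, Optional[int], Dict[str, bool]]:
    """Patch tiers in ascending tier order, pausing at the first tier that misses its gate."""
    results: Dict[str, bool] = {}
    for tier in sorted(tiers, key=lambda t: t["order"]):
        hosts, step = tier["hosts"], max(1, tier["max_concurrent"])
        succeeded = 0
        # Batches never exceed max_concurrent. A real engine would presumably
        # patch each batch in parallel; this sketch runs hosts one by one.
        for i in range(0, len(hosts), step):
            for host in hosts[i:i + step]:
                ok = patch_host(host)
                results[host] = ok
                succeeded += ok
        if hosts and succeeded * 100 < tier.get("gate", 100) * len(hosts):
            return "paused", tier["order"], results  # later tiers never start
    return "completed", None, results
```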

Managing available patches

As agents report pending patches, they appear in Patch Management > Available Patches. Patches matching auto-approve rules are approved automatically (shown as "System" in the approved-by column). Others stay in "Pending" status for manual review.

Manual actions

Action | Effect
Approve | Marks the patch as approved for deployment
Reject | Rejects the patch (a reason is required). It will not be deployed.
Defer | Defers the patch until a specified date. It returns to Pending after the deferral expires.
Bulk Approve | Approves multiple patches at once
Bulk Reject | Rejects multiple patches at once

Creating a deployment

If auto-deploy is enabled on a ring set, deployments are created automatically when patches are approved. To create a deployment manually:

  1. Navigate to Patch Management > Deployments
  2. Click Create Deployment
  3. Select the ring set
  4. Choose patches to include
  5. Configure overrides if needed (max host retries, max duration, pre-download, script hooks)
  6. Click Save -- deployment starts in Draft status
  7. Click Start Deployment

Deployment options

  • Pre-Download: Enable to download patches before the maintenance window opens
  • Max Host Retries: How many times to automatically retry a failed host (default: 1; set 0 for no retries)
  • Max Duration (Hours): Auto-cancel the deployment if it runs longer than this (default: 72h; leave blank for no limit)
  • Paused Escalation (Hours): Alert, then auto-cancel, if the deployment stays paused longer than this (default: 24h; leave blank for no limit)
  • Pre-Script: Script to run before patching each host
  • Post-Script: Script to run after patching each host
  • Synthetic Test Script: Script for synthetic validation testing after patch install
  • Out of Band: Enable to bypass all scheduling and start immediately

Deployment lifecycle

When you start a deployment, the engine processes it through the following phases:

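  1. Ring expansion: The engine expands the current ring's members (starting with ring 0) into individual host rows, skipping empty rings automatically
  2. Canary selection: The first N hosts are marked as canaries, where N comes from the ring's Canary Count (or Canary Percentage)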
  3. Canary patching: Canary hosts begin the per-host pipeline immediately
  4. Canary wait: After all canaries complete, the system waits for the configured canary wait period
  5. Remaining ring hosts: Non-canary hosts begin patching (respecting install schedule mode)
  6. Success gate check: When all ring hosts finish, the success rate is evaluated against the configured success gate percentage
  7. Cooloff: The system waits for the configured cooloff period before advancing
  8. Variance detection: For ring N > 0, if patches exist that were not tested in any previous ring and "Block Untested Patches" is enabled, deployment enters "Variance Approval Required" status
  9. Next ring: Steps 1-8 repeat for each subsequent ring
  10. Completion: When all rings finish, deployment moves to "Completed" (or "Completed with Failures" if any hosts failed/timed out/were skipped)

Per-host pipeline

Each host goes through this sequence (phases are driven by job completion callbacks):

Phase | Host Status | Job Type | Notes
Pre-download | Downloading | Patch download | Optional. Downloads patches before the MW.
MW enforcement | Scheduled | -- | Waits for the MW if required by the ring schedule
Pre-flight | Pre-flight | Pre-flight check | Captures system state, checks disk space
Snapshot | Snapshot | VM snapshot | Optional. VM snapshot before patching.
Pre-script | Pre-script | Script | Optional. Custom pre-patch script.
Install | Installing | Patch install | OS-specific patch installation
Post-script | Post-script | Script | Optional. Custom post-patch script.
Post-flight | Validating | Post-flight check | Captures post-install state for diff
State diff | -- | -- | Compares pre/post fingerprints
Synthetic test | -- | Script | Optional. Application validation.
Done | Completed | -- | Triggers ring advancement check
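
Because each phase starts when the previous job's completion callback arrives, the pipeline behaves like an ordered list of phases with unconfigured optional entries removed. This Python sketch models that; the phase identifiers and configuration flag names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (phase, host status shown in the UI, config flag that enables it or None if always run)
PIPELINE: List[Tuple[str, Optional[str], Optional[str]]] = [
    ("pre_download", "Downloading", "pre_download"),
    ("mw_enforcement", "Scheduled", "requires_mw"),
    ("pre_flight", "Pre-flight", None),
    ("snapshot", "Snapshot", "snapshot_before_install"),
    ("pre_script", "Pre-script", "pre_script"),
    ("install", "Installing", None),
    ("post_script", "Post-script", "post_script"),
    ("post_flight", "Validating", None),
    ("state_diff", None, None),
    ("synthetic_test", None, "synthetic_test_script"),
    ("done", "Completed", None),
]

def host_phases(config: Dict[str, bool]) -> List[str]:
    """Ordered phases for one host; optional phases are skipped unless enabled."""
    return [name for name, _status, flag in PIPELINE
            if flag is None or config.get(flag, False)]

def next_phase(current: str, config: Dict[str, bool]) -> Optional[str]:
    """Phase to start when the job for `current` reports completion (None after done)."""
    phases = host_phases(config)
    idx = phases.index(current)
    return phases[idx + 1] if idx + 1 < len(phases) else None
```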

Monitoring deployment progress

The deployment detail page shows real-time status across all rings and hosts.

Deployment statuses

  • Draft: Created but not started
  • Running: Actively processing rings and hosts
  • Paused: Paused due to canary failure, success gate failure, or manual pause
  • Variance Approval Required: A higher ring has patches not tested in lower rings
  • Completed: All hosts completed successfully
  • Completed with Failures: The deployment finished, but some hosts failed, timed out, or were skipped
  • Cancelled: Manually or automatically cancelled

Host statuses to watch for

  • Blocked: Pre-flight detected a blocking condition (e.g., low disk space). The system monitors the condition and auto-retries when it is resolved. The host fails permanently after 24 hours.
  • Timed Out: The host did not report back within the phase timeout. Orphan recovery may re-fire the callback if the agent later reports success.
  • Failed: A pipeline phase failed. Auto-retry may occur if retries remain.
  • Skipped: Manually skipped, or auto-skipped (no MW configured for 48+ hours)

Variance detection

If a higher ring has patches that were not tested in any previous ring and the policy has "Block Untested Patches" enabled, the deployment enters "Variance Approval Required" status. Click Approve Variance to continue, or disable "Block Untested Patches" in the policy.
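
Variance detection is a set difference between a ring's patches and everything included in lower rings. A Python sketch, assuming a KB counts as tested if any earlier ring included it (the manual does not define "tested" more precisely); the function names and status strings are hypothetical:

```python
from typing import List, Set

def untested_patches(ring_index: int, ring_patches: List[Set[str]]) -> Set[str]:
    """KBs in ring `ring_index` that did not appear in any earlier ring.

    Assumption: a KB counts as "tested" if any lower ring included it.
    """
    if ring_index == 0:
        return set()  # variance detection only applies to rings N > 0
    tested = set().union(*ring_patches[:ring_index])
    return ring_patches[ring_index] - tested

def variance_gate(ring_index: int, ring_patches: List[Set[str]],
                  block_untested: bool, variance_approved: bool) -> str:
    """Decide whether the deployment may proceed into this ring."""
    untested = untested_patches(ring_index, ring_patches)
    if block_untested and untested and not variance_approved:
        return "Variance Approval Required"
    return "Running"
```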

Deployment actions

Action | When available | Effect
Pause | Running | Pauses the deployment. In-progress hosts finish their current phase. No new hosts start.
Resume | Paused | Resumes processing from where it stopped.
Cancel | Running, Paused | Cancels the deployment with an optional reason. In-progress hosts finish their current phase. Pending hosts are skipped.
Redeploy | Cancelled | Resets the deployment to Draft status. All host rows are deleted and recreated on start.
Approve Variance | Variance Approval Required | Approves untested patches for the current ring and resumes the deployment.
Skip Host | Any non-terminal host | Skips a single host with an optional reason. Requires patches.deploy. Allows the ring to advance.
Retry Host | Failed host | Resets the host to "Pending" and restarts the full pipeline from scratch.
Rollback Host | Completed or failed host | Creates a patch uninstall job that removes the patches in reverse install order. The host transitions through "Rolling Back" to "Rolled Back".

Auto-retry on failure

When a host fails during any pipeline phase, the system automatically retries up to the configured Max Host Retries (default: 1).

  • Retry resets the host to "Pending" and restarts the entire pipeline from scratch (download, preflight, install, etc.)
  • All per-run state (fingerprints, jobs, error messages) is cleared
  • Retries are tracked and visible on the host detail
  • Auto-retry does NOT apply to rollback failures (prevents infinite loops)
  • The circuit breaker still fires for install-phase failures even during retries
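
The retry rules above can be sketched in Python as a single failure handler. `HostRun` and `handle_failure` are hypothetical names, and the circuit breaker is left out because it counts install failures independently of the retry decision.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class HostRun:
    """Hypothetical per-host deployment row."""
    status: str = "pending"
    retry_count: int = 0
    fingerprints: Dict[str, str] = field(default_factory=dict)
    job_ids: List[str] = field(default_factory=list)
    error: Optional[str] = None
    rolling_back: bool = False

def handle_failure(run: HostRun, error: str, max_host_retries: int = 1) -> bool:
    """Apply auto-retry rules after a phase failure. Returns True if a retry was scheduled."""
    run.error = error
    if run.rolling_back or run.retry_count >= max_host_retries:
        run.status = "failed"  # no retry for rollback failures or when retries are used up
        return False
    run.retry_count += 1       # the retry count stays visible on the host detail
    run.status = "pending"     # restart the entire pipeline from scratch
    run.fingerprints.clear()   # clear all per-run state
    run.job_ids.clear()
    run.error = None
    return True
```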

Circuit breaker

The circuit breaker protects against widespread failures from a bad patch. When a patch KB fails on 3+ hosts across the entire account (default threshold), the system auto-rejects that KB in all organizations.

  • Threshold is configurable per policy via the Global Failure Threshold setting
  • Counts failed and rolled-back hosts for the same KB across all account organizations
  • Once triggered, the KB is rejected and no further deployment attempts are made
  • To re-deploy after fixing the root cause, manually re-approve the KB
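
The circuit breaker amounts to counting distinct bad hosts per KB across every organization in the account. This Python sketch assumes results arrive as `(org_id, host_id, kb, status)` tuples, a hypothetical shape, and that a host failing the same KB more than once (for example, across retries) counts once.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def kbs_to_reject(host_results: Iterable[Tuple[str, str, str, str]],
                  threshold: int = 3) -> Set[str]:
    """Return KBs whose distinct failed/rolled-back hosts across all orgs reach the threshold."""
    bad_hosts: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for org_id, host_id, kb, status in host_results:
        if status in ("failed", "rolled_back"):
            bad_hosts[kb].add((org_id, host_id))  # a host that fails twice counts once
    return {kb for kb, hosts in bad_hosts.items() if len(hosts) >= threshold}
```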

Deployment auto-cancellation

Two automatic cancellation mechanisms protect against stuck deployments:

Mechanism | Default | Behavior
Max duration | 72 hours | Deployments running longer than the configured Max Duration are auto-cancelled. Leave blank to disable.
Paused escalation | 24 hours | Deployments paused longer than the configured Paused Escalation time trigger an alert and notification, then are auto-cancelled. Leave blank to disable.
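
Both checks compare elapsed time against an optional limit. A Python sketch, assuming status strings match the deployment statuses listed earlier and that Max Duration is measured in wall-clock time from start, paused time included (the manual does not say either way); the function name is hypothetical:

```python
from datetime import datetime, timedelta
from typing import Optional

def auto_cancel_reason(now: datetime, status: str, started_at: datetime,
                       paused_since: Optional[datetime] = None,
                       max_duration_hours: Optional[float] = 72,
                       paused_escalation_hours: Optional[float] = 24) -> Optional[str]:
    """Return why a deployment should be auto-cancelled, or None. A None limit disables that check."""
    if status not in ("Running", "Paused"):
        return None
    # Assumption: wall-clock time since start, including any time spent paused.
    if max_duration_hours is not None and now - started_at > timedelta(hours=max_duration_hours):
        return "max_duration"
    if (paused_escalation_hours is not None and status == "Paused" and paused_since is not None
            and now - paused_since > timedelta(hours=paused_escalation_hours)):
        return "paused_escalation"  # the alert and notification are not modeled here
    return None
```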

MW auto-skip

Hosts using the "Maintenance Window" install schedule that have been in "Pending" status for over 48 hours with no maintenance window configured are automatically skipped. This prevents misconfigured hosts from blocking an entire ring indefinitely.
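
The auto-skip rule is a conjunction of four conditions. A hypothetical Python predicate expressing it, assuming "over 48 hours" means strictly more than 48 hours and that the schedule and status strings are as shown:

```python
from datetime import datetime, timedelta

MW_AUTO_SKIP_AFTER = timedelta(hours=48)

def should_auto_skip(install_schedule: str, status: str, pending_since: datetime,
                     now: datetime, has_maintenance_window: bool) -> bool:
    """True if a host meets every condition of the MW auto-skip rule."""
    return (install_schedule == "maintenance_window"
            and status == "pending"
            and not has_maintenance_window
            and now - pending_since > MW_AUTO_SKIP_AFTER)
```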

Post-deployment

Completion summary

When all rings complete, the deployment builds a completion summary including per-ring success rates, failed host details, and total patches installed. The deployment status is:

  • "Completed" -- all hosts succeeded
  • "Completed with Failures" -- some hosts failed, timed out, or were skipped
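
A Python sketch of the status decision and per-ring success rates, assuming ring results arrive as lists of host status strings (a hypothetical shape). Failed host details and patch counts are omitted for brevity.

```python
from collections import Counter
from typing import Dict, List

# Host outcomes that turn "Completed" into "Completed with Failures".
PROBLEM_STATUSES = {"failed", "timed_out", "skipped"}

def completion_summary(ring_hosts: Dict[int, List[str]]) -> dict:
    """Build a summary from {ring_number: [host status, ...]}."""
    per_ring = {}
    any_problem = False
    for ring, statuses in sorted(ring_hosts.items()):
        total = len(statuses)
        completed = Counter(statuses)["completed"]
        per_ring[ring] = 100.0 * completed / total if total else 100.0
        any_problem = any_problem or any(s in PROBLEM_STATUSES for s in statuses)
    return {
        "status": "Completed with Failures" if any_problem else "Completed",
        "ring_success_rate": per_ring,
    }
```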

Reboot handling

Reboot behavior is per-ring. After patches install, the reboot policy determines when the host reboots. If set to "Maintenance Window", the reboot is scheduled for the next MW. If set to "Manual", the operator is responsible for rebooting.

Deployment lifecycle flow

Deployment flow: Discover Patches → Approve / Reject → Create Deployment → Start Ring 0 → Canary Hosts → Canary Wait → Ring Hosts → Success Gate → Cooloff Period → Variance Check (if enabled) → Next Ring (or Complete)

Per-host pipeline: Download → Pre-flight → Snapshot → Pre-script → Install → Post-script → Post-flight → Validate → Complete

Optional phases (download, snapshot, pre/post-script, validate) are skipped if not configured.

Permissions reference

Permission | Grants
patches.view | View policies, ring sets, available patches, deployments, and maintenance windows
patches.manage | Create/update/delete policies, ring sets, and deployments. Approve/reject patches. Start/pause/cancel deployments. Manage maintenance windows.
patches.deploy | Skip individual deployment hosts
hosts.view | View service groups
hosts.manage | Create/update/delete service groups and tiers

Troubleshooting

Symptom | Cause | Fix
Patches not appearing | Agent not sending pending patch data | Check that the agent version is 1.7.0 or later and verify the heartbeat is active
Auto-approve not working | No active policy, or classification mismatch | Verify the policy is enabled and that the patch classification matches an enabled auto-approve rule
Host stuck in "Pending" | Not in a maintenance window, or canary gate not cleared | Check MW targeting. Check canary status on the deployment detail page.
Host stuck in "Scheduled" | MW has not opened yet | Check the scheduled time on the host detail. Verify the MW schedule with Preview Schedule.
Deployment stuck | Ring hosts in a transitional state | The engine runs every 300 seconds (5 minutes). Check host error messages. Orphan recovery runs automatically.
"Variance Approval Required" | Higher ring has untested patches | Approve the variance on the deployment detail page, or disable "Block Untested Patches" in the policy
Circuit breaker triggered | Failures for the same KB across the account reached the Global Failure Threshold (default: 3) | Review failure reasons. Manually re-approve the KB after fixing the root cause.
Host blocked | Pre-flight detected a blocking condition (e.g., low disk space) | Resolve the condition. The system auto-retries once the alert resolves.
MW times wrong | Timezone mismatch | Verify that the maintenance window timezone matches the expected timezone
Host stuck in "Downloading" | Download job completed but the callback was missed | Orphan recovery detects this. Wait for the next engine cycle (approximately 5 minutes).
Host auto-skipped after 48h | No maintenance window configured | Assign a maintenance window to the host's group, or use the "Immediate" install schedule
"Completed with Failures" | Some hosts failed, timed out, or were skipped | Review failed hosts on the deployment detail page. Retry or investigate.
Service group tier not advancing | Prior tier incomplete or failed its success gate | Check the prior tier's hosts. All must be in a terminal state.
"Deployment auto-cancelled" | Exceeded the Max Duration or Paused Escalation time limit | Increase the limits, or investigate why the deployment is taking too long or why it is paused.