Administration

Agent Management

Manage agent versions, deploy updates with wave-based rollouts, monitor rollout health, and maintain your software inventory. The Go agent supports zero-downtime updates via dual-process switchover.

Technical Manual
Status: Available

Prerequisites

  • User role with agent_updates.view (read) or agent_updates.deploy (deploy/cancel/rollback)
  • User role with software.view (read) or software.manage (write/deploy) for software library
  • Target hosts must be running the Go agent (not legacy), have an active agent, and not be in decommissioned status
  • Agent version must have an active or completed publication for your account

Understanding Agent Versions

SPOG runs two agent types. Only the Go agent supports automated updates.

Agent | Language | Auto-Update | Notes
Go Agent | Go 1.21+ | Yes | Primary agent. Ed25519 signed. Zero-downtime update via dual-process switchover.
Legacy Agent | C | No | Reports as legacy-X.Y.Z. Must be manually replaced. Excluded from rollouts.
Legacy agents cannot be updated remotely. They are excluded from all automated rollout mechanisms. Replace them with the Go agent to enable remote updates.
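
As a rough illustration of the exclusion rule, the sketch below classifies an agent by its reported version string. It assumes only what the table states, namely that legacy agents report as legacy-X.Y.Z; the function names are hypothetical and not part of any SPOG API.

```python
def is_legacy_agent(reported_version: str) -> bool:
    """Return True when the version string uses the legacy-X.Y.Z form."""
    return reported_version.startswith("legacy-")


def eligible_for_rollout(reported_version: str) -> bool:
    """Legacy agents are excluded from every automated rollout."""
    return not is_legacy_agent(reported_version)
```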

Agent Update Lifecycle

The update process uses a dual-process approach for zero-downtime updates. The old agent launches the new binary as a separate process, the backend verifies the new agent is healthy, and only then switches over.

Update State Machine

Main path:
  pending --> downloading --> downloaded --> starting_new --> health_checking --> switching --> completed

Branches off the main path:
  \--> cancelled
  \--> rolled_back
  \--> failed (at any point)
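
The state machine can be sketched as a small transition function. The main path and the "failed at any point" rule come from the diagram; where exactly cancelled and rolled_back branch off is an assumption here, inferred from the Cancel constraint and the rolled_back description elsewhere in this manual. All names are illustrative.

```python
# Happy-path order of update states, taken from the state machine diagram.
MAIN_PATH = [
    "pending", "downloading", "downloaded", "starting_new",
    "health_checking", "switching", "completed",
]
TERMINAL = {"completed", "cancelled", "rolled_back", "failed"}


def allowed_next(state: str) -> set[str]:
    """Return the states reachable in one step from `state`."""
    if state in TERMINAL:
        return set()
    # Every non-terminal state can advance one step or fail.
    nxt = {MAIN_PATH[MAIN_PATH.index(state) + 1], "failed"}
    if state == "pending":
        nxt.add("cancelled")    # assumption: cancel applies before the agent picks up the job
    if state == "health_checking":
        nxt.add("rolled_back")  # assumption: rollback follows failed health checks
    return nxt
```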

What Happens at Each Stage

pending | Job created, waiting for agent to pick it up on next poll.
downloading | Agent is downloading the new binary from the backend.
downloaded | Binary verified via SHA256 hash. Ready to launch.
starting_new | Old agent launches new binary as a separate process with a unique instance ID.
health_checking | Server monitors new agent's heartbeats. Requires 3 consecutive healthy checks.
switching | New agent promoted to primary. Shutdown job sent to old agent.
completed | Old agent shut down. Host record updated with new version.
rolled_back | Health checks failed. New agent shut down, old agent restored as primary.
Automatic rollback. If the new agent fails health checks or times out, the system rolls back automatically: the new process is killed and the old agent continues operating. No manual intervention is needed.
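
A minimal sketch of the health-check decision follows. The requirement of 3 consecutive healthy checks is from the stage description; treating an unhealthy check as a streak reset, and modelling the timeout as a cap on the number of checks (`max_checks`), are assumptions of this example rather than documented behavior.

```python
REQUIRED_CONSECUTIVE = 3  # from the health_checking stage description


def evaluate_health(checks: list[bool], max_checks: int = 10) -> str:
    """Return "switching" after 3 healthy checks in a row, else "rolled_back".

    `max_checks` stands in for the timeout; its value is an assumption.
    """
    streak = 0
    for healthy in checks[:max_checks]:
        # Assumption: an unhealthy check resets the streak instead of failing at once.
        streak = streak + 1 if healthy else 0
        if streak >= REQUIRED_CONSECUTIVE:
            return "switching"
    return "rolled_back"
```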

Deploying Agent Updates (Manual)

  1. Navigate to Agent Updates > Versions to view available versions. Filter to stable versions only for production environments.
  2. Select a version and click Deploy. Choose the target hosts and set the priority.
  3. The version must have an active or completed publication for your account.
  4. Each host receives an update job. A host is skipped automatically if:
    • It is already on the target version
    • It has an update already pending or in progress
    • No binary is available for its OS/architecture
  5. The deployment summary shows which hosts received update jobs and which were skipped (with reasons).
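
The skip rules in step 4 can be expressed as a single check per host. This is a sketch only: the `Host` fields, the "os/arch" string format, and the reason strings are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    agent_version: str
    os_arch: str              # e.g. "linux/amd64" (format is an assumption)
    has_pending_update: bool


def skip_reason(host: Host, target_version: str, binaries: set[str]) -> Optional[str]:
    """Return why a host is skipped, or None when it should get an update job."""
    if host.agent_version == target_version:
        return "already on target version"
    if host.has_pending_update:
        return "update already pending"
    if host.os_arch not in binaries:
        return "no binary for OS/architecture"
    return None
```

Running `skip_reason` over every selected host yields the two groups the deployment summary reports: hosts with a `None` result receive jobs, and the rest are listed with their reason.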

Wave-Based Rollout (Publications)

Platform operators create publications that roll out a version across accounts in waves. Publications are read-only for tenants: you can view progress but cannot create or modify them.

Rollout Waves

The automated rollout engine runs every 60 seconds and processes active publications:

  1. Canary wave — small batch of hosts (typically 1-5%) to validate the update.
  2. Early adopter wave — broader batch, still limited.
  3. Broad wave — majority of hosts.
  4. Full rollout — remaining hosts.

Between waves, the engine evaluates the failure rate. If it exceeds the configured threshold, the publication is paused and requires operator intervention to resume.
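
The between-wave check might look like the sketch below. How the engine actually computes the failure rate is not documented here, so this example assumes it is failed hosts divided by hosts that finished (updated plus failed), with the threshold given as a fraction.

```python
def next_publication_status(updated: int, failed: int, threshold: float) -> str:
    """Decide whether a publication may advance after a wave.

    `threshold` is the maximum tolerated failure rate as a fraction (assumed unit).
    """
    finished = updated + failed
    if finished == 0:
        return "active"  # nothing finished yet, so there is no rate to judge
    failure_rate = failed / finished
    # "Exceeds" is read strictly: a rate equal to the threshold does not pause.
    return "paused" if failure_rate > threshold else "active"
```

For example, 5 failures out of 100 finished hosts against a 10% threshold keeps the publication active, while 20 failures out of 100 pauses it.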

Publication State Machine

Main path:
  draft --> active --> completed

Branches off the main path:
  \--> paused --> active (resume)
           \--> cancelled
  \--> cancelled

Viewing Publications

Navigate to Agent Updates > Publications to view rollouts targeting your account. Each shows: version, status, and progress (total targets, updated, failed, in progress, current wave).

Monitoring Rollout Progress

Update Jobs

  • Navigate to Agent Updates > Update Jobs to view all update jobs. Filter by status or host.
  • Each job shows: source version, target version, current status, timestamps for each phase, and success/failure result.

Aggregate Statistics

  • The Update Statistics view shows success rate and counts by status.
  • The Version Distribution view shows host counts per agent version across your fleet.
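
Both views are simple aggregations over job and host records. The sketch below shows one plausible way to compute them; the exact formula the product uses for success rate is not stated, so counting only finished jobs (completed, failed, rolled_back) in the denominator is an assumption.

```python
from collections import Counter
from typing import Optional


def update_statistics(job_statuses: list[str]) -> dict:
    """Summarize update jobs as counts per status plus a success rate."""
    counts = Counter(job_statuses)
    # Assumption: only finished jobs count toward the success rate.
    finished = counts["completed"] + counts["failed"] + counts["rolled_back"]
    rate: Optional[float] = counts["completed"] / finished if finished else None
    return {"counts": dict(counts), "success_rate": rate}


def version_distribution(host_versions: list[str]) -> dict[str, int]:
    """Count hosts per reported agent version."""
    return dict(Counter(host_versions))
```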

Cancel or Rollback

Action | How | Constraint
Cancel | Click Cancel on an update job | Only pending or queued updates. No agent-side action needed.
Force Rollback | Click Rollback on an update job | Cannot roll back a completed update (old agent already shut down).
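
The constraints in this table reduce to two status checks. In the sketch below, the cancellable set comes directly from the Cancel row; the table only rules out rollback for completed updates, so refusing rollback for the other finished states as well is an assumption of this example.

```python
CANCELLABLE = {"pending", "queued"}  # from the Cancel row
FINISHED = {"completed", "cancelled", "rolled_back", "failed"}


def can_cancel(status: str) -> bool:
    """Cancel is offered only before the agent has started work."""
    return status in CANCELLABLE


def can_force_rollback(status: str) -> bool:
    """Rollback is refused once the job is finished.

    Only "completed" is excluded by the table; the other finished
    states are excluded here by assumption.
    """
    return status not in FINISHED
```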

Software Inventory

The software library is a global catalog for managing third-party software deployed to hosts.

Catalog Management

  1. Navigate to Software Library > Create Entry. Enter vendor, name, description, license type, and product family. The vendor + name combination must be unique.
  2. Add versions to the software entry. Marking a version as "Latest" automatically clears the flag from the previous latest version.
  3. Add installers per version, specifying: OS type, architecture, installer URL, SHA hash, silent install arguments, and whether a reboot is required.
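
The "Latest" behavior in step 2 amounts to keeping the flag exclusive within one software entry. A minimal in-memory sketch, assuming each version is a dict with hypothetical `number` and `is_latest` keys:

```python
def mark_latest(versions: list[dict], version_number: str) -> None:
    """Set is_latest on one version and clear it on all others, in place."""
    if not any(v["number"] == version_number for v in versions):
        raise ValueError(f"unknown version: {version_number}")
    for v in versions:
        # Exactly one version ends up flagged after this loop.
        v["is_latest"] = v["number"] == version_number
```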

Software Deployment

  1. Select a software version and click Deploy. Choose the target hosts and set the priority.
  2. Optionally select a specific installer to override auto-selection. Otherwise, the system automatically matches the installer to each host's operating system.
  3. Install jobs are created with automatic retry (2 retries, 5-minute delay between attempts).
  4. The agent downloads the installer and runs it silently with the configured install arguments.
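
Installer selection and the retry schedule from the steps above can be sketched as follows. The retry count and delay are from step 3; the installer dict keys (`id`, `os_type`) and the rule of taking the first OS match are assumptions, since the matching logic is not described in detail.

```python
from typing import Optional

MAX_RETRIES = 2            # from step 3
RETRY_DELAY_SECONDS = 300  # 5 minutes, from step 3


def pick_installer(installers: list[dict], host_os: str,
                   override_id: Optional[int] = None) -> Optional[dict]:
    """Return the explicitly chosen installer, else the first OS match."""
    if override_id is not None:
        return next((i for i in installers if i["id"] == override_id), None)
    return next((i for i in installers if i["os_type"] == host_os), None)


def attempt_offsets() -> list[int]:
    """Earliest start of each attempt, in seconds after the first attempt."""
    return [n * RETRY_DELAY_SECONDS for n in range(MAX_RETRIES + 1)]
```

With 2 retries there are at most 3 attempts in total. A `None` result from `pick_installer` corresponds to a host being skipped because no installer matches its operating system.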

License Tracking

  • Navigate to the software entry and click Add License to create a license record (scoped to an organization).
  • Track total licenses, used licenses, expiration date, and cost.
  • Available licenses are computed automatically (total minus used). Expired status is derived from the expiration date.
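
The two derived fields can be illustrated with a small record type. This is a sketch with hypothetical field names; treating a license as still valid on its expiration date, and allowing a license with no expiration date, are assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class License:
    total: int
    used: int
    expires_on: Optional[date] = None  # None models "no expiry" (assumption)

    @property
    def available(self) -> int:
        """Available licenses are total minus used."""
        return self.total - self.used

    def is_expired(self, today: date) -> bool:
        """Expired once the expiration date has passed."""
        return self.expires_on is not None and self.expires_on < today
```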

Permissions Reference

Permission | Grants
agent_updates.view | List/get versions, binaries, update jobs, publications, statistics.
agent_updates.deploy | Deploy versions to hosts, cancel pending updates, force rollback.
software.view | List/get software catalog, versions, installers, licenses.
software.manage | Create/update/delete software entries, versions, installers, licenses. Deploy software to hosts.
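
To make the mapping concrete, the sketch below checks an action against the permissions a role holds. The permission names are from the table; the action names are hypothetical, and the example treats each permission independently because the manual does not say whether a deploy or manage permission implies the matching view permission.

```python
# Hypothetical action names mapped to the permission each one needs.
REQUIRED_PERMISSION = {
    "list_update_jobs": "agent_updates.view",
    "deploy_agent_version": "agent_updates.deploy",
    "cancel_update": "agent_updates.deploy",
    "force_rollback": "agent_updates.deploy",
    "list_software": "software.view",
    "create_software_entry": "software.manage",
    "deploy_software": "software.manage",
}


def is_allowed(role_permissions: set[str], action: str) -> bool:
    """Check one action against the permissions held by a role."""
    return REQUIRED_PERMISSION[action] in role_permissions
```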

Troubleshooting

Symptom | Cause | Fix
Version not visible | No publication for your account | Contact platform admin to publish the version to your account.
Deploy returns 403 | Version not published to account | Verify a publication exists and is active or completed for your account.
All hosts skipped | No binary for OS/architecture | Upload a binary matching the target hosts' platform.
Update stuck in downloading | Agent offline or binary URL unreachable | Verify host is online and can reach the backend.
Update stuck in health_checking | New agent not heartbeating | Check if the new agent process is running on the host. Review agent logs.
Publication paused | Wave failure rate exceeded threshold | Review failed hosts, fix issues, then resume the publication.
Rollback fails | Update already completed | Cannot roll back: the old agent is already shut down.
Software deploy returns 400 | No installers for version | Add an installer for the target OS before deploying.
Software deploy skips hosts | Host OS doesn't match any installer | Verify installer os_type matches host os_type.
Legacy agents not updating | Legacy agents excluded from rollout | Legacy agents must be manually replaced with the Go agent.
Agent registration 422 error | Stale agent ID file on host | Delete /var/lib/spog-agent/agent_id and restart the agent.
Agent heartbeat 401 | Wrong organization secret | Verify organization_secret in agent config matches the org.