IT Service & Operations Manual

Host & Agent Management

Enrollment, host lifecycle, agent health, and the workflows teams rely on to keep managed endpoints trustworthy.

Audience: Endpoint operations teams
Focus: Managed host lifecycle
Status: Public manual

Scope

If enrollment and host ownership are fuzzy, every downstream operation becomes harder to trust. This page keeps the operator-facing lifecycle model and strips private API or setup references.

Single Source of Truth: Cadres Host & Agent Operator Manual
Covers: agent deployment, host viewing, host groups, remote operations, tags, key rotation, agent configuration

Deploying the Agent

Prerequisites

Organization created with a secret
Go agent binary for the target OS/architecture

Get the Organization Secret

Each organization has a unique secret used for agent authentication. Find it in the organization settings. The secret field is the value the agent needs.

What Happens During Registration

As part of registration, the agent will:
- Begin sending heartbeats every 60 seconds
- Start collecting and reporting host information

Verifying Agent Health

After installation, verify the agent is communicating:

Viewing Hosts

List all hosts:

Get host details:

Returns comprehensive host information including:
- System info (OS, kernel, architecture)
- Hardware (CPU, RAM, manufacturer, serial)
- Network interfaces and IPs
- Services and their status
- Local users and groups
- Drives and storage
- Security status (firewall, antivirus)
- Installed patches and software

Host details now open on the Metrics tab by default so current health status is immediately visible. The Refresh Health action is shown only to users with hosts.manage.

Editing Host Metadata

You can update a host’s description, notes, and location assignment:

Only provided fields are updated (partial update). Omitted fields are left unchanged.

Updating location: The location must belong to the same organization as the host.

Requires hosts.manage permission.
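
The partial-update rule and the location check described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Host` record, the `location_id` field name, and the `location_orgs` lookup (location ID to owning organization ID) are assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Fields an operator may edit (names are hypothetical).
EDITABLE_FIELDS = {"description", "notes", "location_id"}


@dataclass(frozen=True)
class Host:
    id: str
    organization_id: str
    description: str = ""
    notes: str = ""
    location_id: Optional[str] = None


def apply_metadata_patch(host, patch, location_orgs):
    """Return a copy of host with only the supplied fields changed."""
    unknown = set(patch) - EDITABLE_FIELDS
    if unknown:
        raise ValueError(f"fields not editable: {sorted(unknown)}")
    new_location = patch.get("location_id")
    # The location must belong to the same organization as the host.
    if new_location is not None and location_orgs.get(new_location) != host.organization_id:
        raise ValueError("location must belong to the host's organization")
    # Omitted fields keep their current values.
    return replace(host, **patch)
```

Calling `apply_metadata_patch(host, {"description": "new"}, {})` changes only the description; notes and location are carried over unchanged.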

Managing Host Groups

Host groups organize hosts for batch operations, monitoring, and RBAC scoping.

Create a host group:

Add hosts to group:

Remove hosts from group:

The dependency graph is mixed-mode: manual edges can link host-to-host, host-to-group, or group-to-group records. The topology view also surfaces standalone hosts that only appear through dependencies.

When creating a dependency, select “Host” or “Group” for source and target types. The selector dropdown switches between actual hosts and host groups accordingly. Use the search box above each dropdown to filter by name in large environments. The backend validates that all IDs reference the correct entity type before saving.
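
The entity-type validation mentioned above might look roughly like this. It is a sketch under assumptions: the type names `host` and `group`, the function name, and the in-memory ID sets are all hypothetical stand-ins for whatever the backend actually uses.

```python
VALID_TYPES = {"host", "group"}


def validate_dependency(source_type, source_id, target_type, target_id,
                        host_ids, group_ids):
    """Return a list of validation errors; an empty list means the edge can be saved."""
    errors = []
    endpoints = (("source", source_type, source_id),
                 ("target", target_type, target_id))
    for role, kind, ident in endpoints:
        if kind not in VALID_TYPES:
            errors.append(f"{role}: unknown type {kind!r}")
            continue
        # Each ID must exist in the collection matching its declared type.
        known = host_ids if kind == "host" else group_ids
        if ident not in known:
            errors.append(f"{role}: {ident!r} is not a {kind}")
    return errors
```

Because source and target are checked independently, host-to-host, host-to-group, and group-to-group edges all pass through the same rule.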

Remote Host Operations

All remote operations are dispatched as agent jobs. The backend queues the job, and the agent picks it up on its next poll.
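
To make the queue-then-poll flow concrete, here is a small in-memory sketch. The `Job` fields, status names, and the `JobQueue` class are assumptions for illustration only; the real backend's job model is not described in this manual.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    host_id: str
    kind: str
    payload: dict
    status: str = "queued"


class JobQueue:
    """Backend side: jobs wait per host until that host's agent polls."""

    def __init__(self):
        self._pending = {}            # host_id -> deque of queued jobs
        self._ids = itertools.count(1)

    def enqueue(self, host_id, kind, payload):
        job = Job(next(self._ids), host_id, kind, payload)
        self._pending.setdefault(host_id, deque()).append(job)
        return job

    def poll(self, host_id):
        """Agent side: collect every queued job for this host, oldest first."""
        jobs = list(self._pending.pop(host_id, deque()))
        for job in jobs:
            job.status = "dispatched"
        return jobs
```

One consequence of this model is that an operation never runs sooner than the agent's next poll, so a short delay between request and execution is expected.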

Service control:

Process control:

Actions: kill, force_kill

Network diagnostics:

Diagnostics polling automatically retries transient status-fetch failures. If polling still cannot recover, the page shows an inline error with a Retry Polling action.
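
The retry-then-surface behavior can be sketched as below. The retry count, the exponential backoff, and the `TransientError` type are assumptions; the manual only states that transient failures are retried and that an inline error appears if polling cannot recover.

```python
import time


class TransientError(Exception):
    """A status fetch failure that is worth retrying (hypothetical type)."""


def poll_status(fetch, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(), retrying transient failures with exponential backoff.

    Returns (status, None) on success, or (None, message) once retries are
    exhausted so the caller can show an inline error with a retry action.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch(), None
        except TransientError as exc:
            if attempt == max_retries:
                return None, f"status polling failed after {max_retries} retries: {exc}"
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to exercise without real delays.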

User management:

Requires hosts.manage_users permission.

Remote Access Sessions

The platform provides three remote access channels: terminal, file browser, and desktop. All use WebSocket connections with first-message JWT authentication (tokens are never passed in URL query parameters).

When the UI and API are hosted on different origins, clients must open these WebSockets against the API origin, not the current page origin.
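
A client-side sketch of both rules (connect to the API origin, send the token in the first message) follows. The WebSocket path and the `{"type": "auth", "token": ...}` message shape are hypothetical; only the principle comes from this manual.

```python
import json
from urllib.parse import urlsplit, urlunsplit


def websocket_url(api_origin, path):
    """Build the WebSocket URL from the API origin, never from the page origin."""
    parts = urlsplit(api_origin)
    scheme = {"https": "wss", "http": "ws"}[parts.scheme]
    # No query string: the token must not travel in the URL.
    return urlunsplit((scheme, parts.netloc, path, "", ""))


def first_auth_message(jwt):
    """Serialize the first message sent after the socket opens (assumed shape)."""
    return json.dumps({"type": "auth", "token": jwt})
```

For example, `websocket_url("https://api.example.com", "/ws/terminal/h1")` yields a `wss://` URL on the API host, and the JWT is sent only after the connection is established.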

PAM checkouts can launch into these same Host Details and Remote Desktop flows. That handoff keeps the checkout token in in-memory navigation state rather than the URL, and the normal remote-access readiness contract still decides whether the session can start.

The Host Details page now uses the backend-provided remote_access_status contract to decide whether Terminal, Files, and Desktop are available. A host being online is not enough by itself. The UI shows explicit blocked reasons when permission, feature flags, host feature overrides, tunnel connectivity, or agent capability declaration prevent a channel from starting.

Terminal

Credential modes:
- agent_user (default): Run as the agent’s system user
- su: Switch to a specified user (credentials message sent after auth)
- pam_checkout: Use a checked-out PAM credential (include pam_session_token in auth message). The checked-out username/password is used to launch the shell under that identity on the agent, not as the agent service user.

The browser does not treat the terminal as connected on WebSocket open alone. It waits for explicit session readiness from the backend/agent path before showing the terminal as live. If startup, credential switch, timeout, or tunnel teardown fails, the operator-facing error identifies the real failing boundary instead of collapsing to a generic disconnect.

Sessions auto-close after 8 hours (max duration) or 30 minutes of inactivity.

File Browser

Capabilities: Directory listing, file read/write, upload/download, create/delete/rename. Blocks access to sensitive files such as private keys.

Sessions auto-close after 4 hours (max duration) or 15 minutes of inactivity.

Desktop

The desktop path does not silently downgrade a requested rdp session to console, and it does not cosmetically rewrite desktop auth modes to another name. Unsupported or runtime-blocked requests fail closed with the agent-advertised reason surfaced to the operator. The browser also keeps the session in connecting state until the agent confirms desktop_ready. If the agent reports a different desktop mode than the one requested, the UI treats that as a fatal contract violation and immediately closes the session instead of continuing under the wrong mode.
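
The connecting-until-ready rule and the fatal mode mismatch can be expressed as a small state transition. The event dictionaries and state names here are assumptions for illustration; only `desktop_ready` and the mode names come from this manual.

```python
def next_state(state, requested_mode, event):
    """Return (new_state, error) for a desktop session.

    States: 'connecting', 'live', 'closed'. The session only becomes live
    when the agent confirms desktop_ready for the mode that was requested.
    """
    if state != "connecting":
        return state, None
    kind = event.get("type")
    if kind == "desktop_ready":
        reported = event.get("mode")
        if reported != requested_mode:
            # Contract violation: never continue under the wrong mode.
            return "closed", f"agent reported mode {reported!r}, requested {requested_mode!r}"
        return "live", None
    if kind == "error":
        # Fail closed and surface the agent-advertised reason.
        return "closed", event.get("reason", "unknown")
    # Socket open or any other event does not make the session live.
    return "connecting", None
```

Note that a bare socket-open event leaves the session in `connecting`, which matches the behavior described for the browser.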

Linux desktop console can also report non-fatal runtime warnings. When the readiness contract includes input_available = false, capture is available but keyboard or mouse injection is not; the Host Details flow warns before connect and the resulting session is view-only. When X11/display access itself is missing, desktop stays blocked with an explicit runtime-prerequisite message instead of falling back or pretending the session can start.

Linux rdp / New Session is now conditionally shipped for prepared Tier 1 hosts. The mode stays fail-closed unless org/account policy enables Linux multi-session prep, the host has completed the explicit prep/install workflow, and the agent proves both XRDP session primitives and helper launch readiness. When one of those prerequisites is missing, the platform surfaces explicit blockers:
- linux_multisession_toggle_disabled: org/account policy has not enabled Linux multi-session prep.
- linux_multisession_prep_not_installed: required prep/install workflow has not completed on the host.
- linux_multisession_helper_or_session_unavailable: XRDP helper/session primitives are not ready.

Sessions auto-close after 8 hours (max duration) or 30 minutes of inactivity.
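
The auto-close limits stated for the three channels can be summarized in one check. The limits are taken from this manual; the channel keys, function name, and returned reason strings are hypothetical.

```python
from datetime import datetime, timedelta

# channel -> (max session duration, inactivity timeout), per this manual
LIMITS = {
    "terminal": (timedelta(hours=8), timedelta(minutes=30)),
    "files": (timedelta(hours=4), timedelta(minutes=15)),
    "desktop": (timedelta(hours=8), timedelta(minutes=30)),
}


def close_reason(channel, started_at, last_activity_at, now):
    """Return why a session should auto-close, or None if it may continue."""
    max_duration, idle_timeout = LIMITS[channel]
    if now - started_at >= max_duration:
        return "max_duration"
    if now - last_activity_at >= idle_timeout:
        return "inactivity"
    return None
```

The maximum duration applies even to an active session, so a long-running task in a terminal should be started in a way that survives the session closing.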

Session Recordings

Terminal sessions produce asciicast v2 recordings. Desktop sessions produce binary .cadresdr recordings. Both are accessible via:

Host Tags

Tags are key-value pairs for custom classification.

Add a tag:

Search by tag:

Understanding Host Health

The system provides several health indicators at different levels. Here is what each one means and when to pay attention.

Status (Connectivity)

The status field on each host tells you whether the agent is currently communicating:

Value	What It Means	Action Needed?
`online`	Agent sent a heartbeat within the last 5 minutes	No — operating normally
`offline`	No heartbeat for more than 5 minutes	Yes — check network connectivity, agent process, or host availability
`maintenance`	Host is in a scheduled maintenance window	No — expected downtime
`warning`	Agent reported a warning condition	Investigate — the agent flagged something unusual
`decommissioned`	Host removed from active management	No — intentionally retired

Health Status (Heartbeat Freshness)

The health_status field gives a finer-grained view of heartbeat freshness:

Value	What It Means	When to Worry
`healthy`	Heartbeat received within 5 minutes	Not at all
`unhealthy`	No heartbeat for 5 minutes to 7 days	Moderate — host may be down or disconnected
`stale`	No heartbeat for over 7 days	High — host is likely decommissioned or permanently unreachable
`unknown`	Agent has never sent a heartbeat	Check if the agent installed and started correctly
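
The health_status thresholds in the table above reduce to a simple age check. This is a sketch; how the platform treats the exact 5-minute and 7-day boundaries is an assumption.

```python
from datetime import datetime, timedelta


def health_status(last_heartbeat, now):
    """Classify heartbeat freshness using the thresholds from the table."""
    if last_heartbeat is None:
        return "unknown"        # agent has never sent a heartbeat
    age = now - last_heartbeat
    if age <= timedelta(minutes=5):
        return "healthy"
    if age <= timedelta(days=7):
        return "unhealthy"
    return "stale"
```

A host that is `offline` in the status field will therefore show as `unhealthy` at first and move to `stale` only after a full week without a heartbeat.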

Health Score and Tier (Composite Assessment)

Each host has a composite health_score (0–100) that combines five signals:

Connectivity (25%): How recently the host heartbeated
Disk health (20%): Free disk space across all drives
Patch compliance (20%): Whether approved patches have been installed
Service health (15%): Whether auto-start services are running
Alert penalty (20%): Active alerts reduce the score (critical alerts deduct more than medium)

The score maps to a health_tier:
- healthy (80–100): Host is in good shape
- degraded (60–79): One or more signals need attention
- critical (0–59): Multiple problems detected; investigate immediately

Maintenance window awareness (G-04): Hosts in an active maintenance window are not penalized for a stale heartbeat. The connectivity factor is automatically set to 100 during planned maintenance, so a host that is intentionally offline for patching or updates does not show a degraded health score. This is auto-detected via core/maintenance_utils.host_is_in_maintenance_window(). For batch (fleet-level) scoring, a single query via get_hosts_in_maintenance_window() checks all hosts at once to avoid N+1 queries.
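
As a worked illustration of the weighting, tiers, and maintenance override, consider the sketch below. It assumes each signal has already been normalized to a 0–100 sub-score (with the alert signal at 100 when no alerts are active); how the platform derives those sub-scores is not covered here, and the function names are hypothetical.

```python
# Weights in percent, as listed in this manual.
WEIGHTS = {"connectivity": 25, "disk": 20, "patch": 20, "service": 15, "alerts": 20}


def health_score(signals, in_maintenance=False):
    """Combine 0-100 sub-scores into the composite 0-100 health score."""
    values = dict(signals)
    if in_maintenance:
        # Planned downtime must not count against connectivity.
        values["connectivity"] = 100
    return round(sum(WEIGHTS[name] * values[name] for name in WEIGHTS) / 100)


def health_tier(score):
    if score >= 80:
        return "healthy"
    if score >= 60:
        return "degraded"
    return "critical"
```

With connectivity 0, disk 90, patch 80, service 100, and alerts 100, the score is 69 (degraded). The same host inside a maintenance window scores 94 (healthy), because connectivity is treated as 100.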

Fleet Health Percentage

The fleet health percentage combines four weighted signals:

What percentage of agents are online (40% weight)
How many critical/warning alerts are active (30% weight)
How many drift events are open (20% weight)
How many patch deployments have failed (10% weight)

A fleet health below 80% warrants investigation across your managed hosts.
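
The fleet-level arithmetic can be sketched the same way. The weights and the 80% threshold come from this manual; treating the alert, drift, and patch-failure signals as pre-normalized 0–100 sub-scores (higher is better) is an assumption, as are the function names.

```python
def fleet_health(online_pct, alert_score, drift_score, patch_score):
    """Weighted fleet health percentage from four 0-100 inputs."""
    return (40 * online_pct + 30 * alert_score
            + 20 * drift_score + 10 * patch_score) / 100


def needs_investigation(fleet_pct, threshold=80):
    return fleet_pct < threshold
```

For example, 90% of agents online with sub-scores of 70 (alerts), 80 (drift), and 50 (patch failures) gives 36 + 21 + 16 + 5 = 78, which falls below the 80% line.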

Bulk Operations

The Hosts page supports multi-select for batch operations across multiple hosts. Select hosts using the checkboxes, then choose an action:

Available Bulk Actions:

Run Script: Execute a saved script on all selected hosts
Run Command: Execute an ad-hoc command on all selected hosts
Service Control: Start, stop, restart, enable, or disable a service on selected hosts
Install Software: Install a package on selected hosts
Assign Group: Add selected hosts to a host group
Assign Fingerprint Policy: Apply a fingerprint baseline policy
Delete Selected: Permanently remove selected hosts (requires hosts.manage permission)

Remote Access Readiness

Operator-visible blocked states include:

permission denied
feature disabled for the organization or host
tunnel disconnected
capability manifest missing
unsupported channel, mode, or credential mode
runtime prerequisite missing
Linux multi-session toggle/prep/session blockers:
  linux_multisession_toggle_disabled
  linux_multisession_prep_not_installed
  linux_multisession_unsupported_distro_package_manager
  linux_multisession_helper_or_session_unavailable

When the agent can provide a concrete runtime blocker, the UI now shows that detail directly instead of a generic connection error.

Desktop availability is now driven by the agent-advertised mode/auth matrix inside remote_access_status, not just by backend OS inference. If the agent says a desktop mode or credential mode is unsupported, the Host Details page and the backend both fail closed on that exact path.
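
A fail-closed gate over remote_access_status could look like the sketch below. The structure of the contract shown here (a `channels` mapping with `available` and `blocked_reasons`) is entirely hypothetical; only the behavior (online is not enough, block with explicit reasons, fail closed when nothing is advertised) follows this manual.

```python
def channel_gate(remote_access_status, channel):
    """Return (can_start, blocked_reasons) for one remote access channel."""
    channels = (remote_access_status or {}).get("channels", {})
    entry = channels.get(channel)
    if entry is None:
        # Nothing advertised for this channel: fail closed.
        return False, ["capability manifest missing"]
    reasons = list(entry.get("blocked_reasons", []))
    if reasons or not entry.get("available", False):
        return False, reasons or ["unavailable"]
    return True, []
```

The function deliberately takes no "host is online" input: availability is decided only by what the contract reports for that channel.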

When the X11 display is missing, the blocked-desktop message reads: "Linux desktop console requires an accessible X11 display. The agent could not find /tmp/.X11-unix/X0 on this host."

If the display is present but input helpers are unavailable, desktop may remain available in view-only mode. In that case the connection modal warns before connect and the desktop session itself repeats the runtime warning so the operator knows capture works but input injection does not.

XRDP teardown is best-effort. The agent stops the desktop helper and verifies whether the XRDP session still exists, but upstream xrdp-sesadmin kill:sid remains unimplemented, so lingering XRDP sessions should be treated as an operator-visible runtime follow-up rather than a guaranteed automatic cleanup.

If you see the missing X11 display message, use terminal or file browser instead, or start or restore the host’s graphical session before retrying desktop.

Agent Key Rotation

If an agent’s Ed25519 signing key needs to be rotated (compromise, periodic rotation):

This updates the stored public key. The agent must already be using the new key for subsequent requests.

Plan Limit — Agent Registration

If agent registration is rejected because the plan limit has been reached:

Contact the account admin to upgrade the subscription plan
Or decommission unused hosts to free up capacity

See saas-portal.md for full plan limit details.

PAM Enrollment from Host Detail

Discovered local user accounts can be enrolled into the PAM vault directly from the host detail view:

This creates a PAM identity for the local account and links it to the specified vault (identity group). Requires both hosts.manage and pam_vaults.manage permissions.

Software Reconciliation

View software reconciliation analysis for a host (compare installed software against the software library):

Returns authorized, unauthorized, and untracked software on the host. Requires software.view permission.

On-Demand Compliance Check

Trigger a compliance check for a specific host without waiting for the scheduled scan:

Requires compliance.manage permission. Returns the scan results immediately.

OOB Auto-Detection

When an agent reports host information, the system automatically detects out-of-band management interfaces based on the hardware manufacturer:
- Dell servers get iDRAC assigned
- HP/HPE servers get iLO assigned
- Supermicro servers get assigned
- Lenovo servers get XCC assigned
- Cisco servers get assigned
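
A manufacturer lookup of this kind might be sketched as follows. Only the mappings this manual names (Dell, HP/HPE, Lenovo) are included; the matching rule (first recognized word of the manufacturer string, case-insensitive) and the function name are assumptions.

```python
# Manufacturer keyword -> out-of-band interface type, as named in this manual.
OOB_BY_MANUFACTURER = {
    "dell": "iDRAC",
    "hp": "iLO",
    "hpe": "iLO",
    "lenovo": "XCC",
}


def detect_oob(manufacturer):
    """Return the OOB interface type for a reported manufacturer, or None."""
    if not manufacturer:
        return None
    for word in manufacturer.lower().replace(",", " ").split():
        if word in OOB_BY_MANUFACTURER:
            return OOB_BY_MANUFACTURER[word]
    return None
```

With this rule, a reported manufacturer of "Dell Inc." resolves to iDRAC, while an unrecognized or empty value resolves to no interface.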

Cross-References

Topic	Document
Getting started	getting-started.md
Organization management	organization-management.md
Roles & permissions	roles-permissions.md
Troubleshooting	troubleshooting-core.md
Agent migration (legacy to Go)	agent-migration.md
Host & agent architecture	`docs/architecture/host-agent-management.md`
Host & agent functional specs	`docs/functional/host-agent-management.md`