Security & Forecasting

Predict failures, detect anomalies, measure operational efficiency

Hardware failure forecasting, resource right-sizing, process baseline enforcement with LOTL detection, network exfiltration detection, and operational efficiency metrics in one security surface.

Technical Manual
Status: Available

Prerequisites

  • User role with security.view permission for viewing forecasts, recommendations, baselines, deviations, exfiltration alerts, efficiency stats, and the security dashboard
  • User role with security.manage permission for acknowledging/resolving forecasts, accepting/dismissing recommendations, managing baselines, whitelisting processes, investigating/resolving exfiltration alerts, and triggering manual engine runs
  • Hosts must be online and reporting metrics (S.M.A.R.T., thermal, CPU, memory, disk) for forecasting and resource analysis
  • Hosts must be reporting process lists for process baseline learning and enforcement
  • Network flow collection must be enabled on agents for exfiltration detection

Hardware forecasting

The hardware forecasting engine analyzes S.M.A.R.T., thermal, and memory metrics over a 14-day window to predict hardware failures before they happen.

Viewing forecasts

  1. Navigate to Security > Hardware Forecasts.
  2. Filter by severity, status, component (disk, CPU, memory), host, or organization.
  3. Click a forecast to see details: failure probability, predicted failure date, trend direction, confidence, evidence metrics, and recommendation.

Forecast fields

  • Component: Hardware component type: disk, CPU, or memory.
  • Severity: Severity based on failure probability thresholds. Info forecasts (less than 0.1 probability) are filtered out.
  • Failure Probability: Estimated probability of failure (0.0 to 1.0).
  • Predicted Failure Date: Extrapolated date when failure is expected, based on trend analysis.
  • Evidence Metrics: The raw metric data points that drove the prediction.
  • Recommendation: Suggested action (e.g., "Replace disk", "Clean fan assembly", "Add memory").
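
One way a predicted failure date could be extrapolated from a trend is a least-squares line fit over the analysis window, projected forward to a failure threshold. The sketch below illustrates that idea only; it is not the engine's actual model, and the function name, the sample metric, and the threshold value are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence, Tuple


def predict_failure_date(
    samples: Sequence[Tuple[datetime, float]],
    failure_threshold: float,
) -> Optional[datetime]:
    """Fit a least-squares line to (time, value) samples and return the time
    at which that line reaches failure_threshold. Returns None when the
    metric is flat or improving. Hypothetical helper, not the product's code."""
    if len(samples) < 2:
        return None
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in samples]  # days since t0
    ys = [value for _, value in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return None  # all samples share one timestamp
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    if slope <= 0:
        return None  # no upward trend, so nothing to extrapolate
    intercept = mean_y - slope * mean_x
    days_to_threshold = (failure_threshold - intercept) / slope
    return t0 + timedelta(days=days_to_threshold)


# Example: a made-up error counter that grows by 2 per day over 14 days.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
window = [(start + timedelta(days=d), 10.0 + 2.0 * d) for d in range(14)]
predicted = predict_failure_date(window, failure_threshold=50.0)
```

A real model would likely weigh several metrics and attach a confidence value; a single straight line is just the simplest form of trend extrapolation.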

Forecast lifecycle

Status | Description | Action
active | Forecast is current and requires attention. | Acknowledge or resolve.
acknowledged | Operator has seen and is tracking the issue. | Resolve when addressed.
resolved | Issue has been addressed (e.g., disk replaced). Terminal state. | None
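
The lifecycle above can be read as a small state machine. The sketch below encodes the three documented statuses and their allowed moves; the function and constant names are illustrative and do not refer to a product API.

```python
# Allowed forecast status transitions, as described in the lifecycle above.
FORECAST_TRANSITIONS = {
    "active": {"acknowledged", "resolved"},
    "acknowledged": {"resolved"},
    "resolved": set(),  # terminal state: no further moves
}


def transition_forecast(current: str, target: str) -> str:
    """Return the new status, or raise ValueError for a disallowed move."""
    if current not in FORECAST_TRANSITIONS:
        raise ValueError(f"unknown status: {current!r}")
    if target not in FORECAST_TRANSITIONS[current]:
        raise ValueError(f"cannot move forecast from {current!r} to {target!r}")
    return target
```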

Running the engine manually

  1. Click Run Analysis to trigger a manual forecast run.
  2. The engine analyzes all hosts with relevant metrics from the last 14 days.
  3. Results appear as new or updated forecasts.

The engine also runs automatically daily at 02:00 UTC.

Resource recommendations

The resource analysis engine evaluates CPU, memory, and disk utilization data over a 30-day window to generate right-sizing recommendations.

Viewing recommendations

  1. Navigate to Security > Resource Recommendations.
  2. Filter by resource type (CPU, Memory, Disk), recommendation (Downsize, Upsize, Optimal), status, host, or organization.
  3. Click a recommendation to see details.

Recommendation fields

  • Resource Type: Resource being analyzed: CPU, Memory, or Disk.
  • Recommendation: Action: Downsize (over-provisioned), Upsize (under-provisioned), or Optimal (correctly sized).
  • Current Allocation: Current resource allocation value.
  • Utilization Stats: Average, peak, and p95 utilization percentages over the analysis window.
  • Recommended Value: Suggested allocation based on utilization patterns.
  • Estimated Savings: Estimated cost savings if the recommendation is implemented.
  • Reasoning: Text explanation of why this recommendation was generated.

Actions

  • Accept: Acknowledge the recommendation for action. Click the Accept button on the recommendation detail view.
  • Dismiss: Hide the recommendation. It will NOT be regenerated until the dismissed record is removed. Click the Dismiss button on the recommendation detail view.

The resource analysis engine runs automatically daily at 03:00 UTC, or manually by clicking Run Analysis on the Resource Recommendations page.

Minimum data requirement. At least 100 data points are needed per resource type (approximately 4 days at 5-minute intervals). Hosts with insufficient data show "insufficient_data" status.
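
The following sketch shows how average, peak, and p95 utilization might be computed and turned into a recommendation, including the 100-point minimum. The 40% and 85% cut-offs, the "active" status name, and all function names are assumptions for illustration; the engine's real decision rules are not documented here.

```python
from typing import Dict, Sequence

MIN_DATA_POINTS = 100  # minimum stated in this manual

# Hypothetical cut-offs on p95 utilization (percent); not the engine's values.
DOWNSIZE_BELOW_P95 = 40.0
UPSIZE_ABOVE_P95 = 85.0


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty sequence (pct is 1..100)."""
    ordered = sorted(values)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceiling division
    return ordered[rank - 1]


def analyze_utilization(samples: Sequence[float]) -> Dict[str, object]:
    """Classify one resource type from its utilization samples (percent)."""
    if len(samples) < MIN_DATA_POINTS:
        return {"status": "insufficient_data"}
    p95 = percentile(samples, 95)
    stats = {
        "average": sum(samples) / len(samples),
        "peak": max(samples),
        "p95": p95,
    }
    if p95 < DOWNSIZE_BELOW_P95:
        recommendation = "Downsize"
    elif p95 > UPSIZE_ABOVE_P95:
        recommendation = "Upsize"
    else:
        recommendation = "Optimal"
    # "active" is an assumed status name for a usable recommendation.
    return {"status": "active", "recommendation": recommendation, "stats": stats}
```

Basing the decision on p95 rather than the average keeps short bursts from being ignored while still discounting one-off spikes, which is a common choice for right-sizing.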

Process baselines

Process baselines learn the normal set of running processes on a host and flag anomalies during enforcement.

Creating a baseline

  1. Navigate to Security > Process Baselines.
  2. Click Create Baseline.
  3. Select a host and organization.
  4. Configure detection options:
    • Alert on New Processes: Enable to flag any process not in the learned whitelist (default: on).
    • Alert on LOTL Binaries: Enable to flag known Living off the Land binaries (default: on).
  5. Click Save.
  6. The baseline starts in learning mode -- it absorbs processes from the next 10 host-info submissions.
  7. After 10 samples, it automatically switches to enforcement mode.

Learning mode vs enforcement mode

Mode | Behavior | Duration
Learning | Merges incoming processes by (name, path) into the whitelist. No deviations generated. | 10 host-info submissions with process data.
Enforcement | Compares current processes against whitelist. Unknown processes generate deviations. | Indefinite until reset or deleted.
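
The two modes can be sketched as a small class that merges processes by (name, path) while learning and reports unknown processes once enforcing. This is a simplified model under stated assumptions: the class and method names are hypothetical, and submissions without process data are assumed not to count toward the 10 samples.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

LEARNING_SAMPLES = 10  # submissions absorbed before enforcement begins

Process = Tuple[str, str]  # (name, path)


@dataclass
class ProcessBaseline:
    mode: str = "learning"
    samples_seen: int = 0
    whitelist: Set[Process] = field(default_factory=set)

    def submit(self, processes: List[Process]) -> List[Process]:
        """Handle one host-info submission and return the unknown processes.
        The result is always empty while the baseline is learning."""
        if not processes:
            return []  # assumed: submissions without process data are ignored
        if self.mode == "learning":
            self.whitelist.update(processes)
            self.samples_seen += 1
            if self.samples_seen >= LEARNING_SAMPLES:
                self.mode = "enforcement"
            return []
        return [p for p in processes if p not in self.whitelist]

    def reset(self) -> None:
        """Clear learned processes and restart learning from scratch."""
        self.whitelist.clear()
        self.samples_seen = 0
        self.mode = "learning"
```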

Managing baselines

  • Update settings: Toggle alert flags, active status, or manually switch learning/enforcement mode.
  • Reset: Clear all learned processes and restart learning mode from scratch. Click Reset on the baseline detail view.
  • Delete: Remove the baseline and all associated deviations.

LOTL binary detection

Living off the Land (LOTL) detection flags when known system utilities are used in potentially suspicious ways. The detection catalog covers 20 specific binaries commonly abused by attackers.

How it works

  • During enforcement mode, each process not in the whitelist is checked against the LOTL catalog for the host's OS.
  • If "Alert on LOTL Binaries" is enabled and the process name matches a catalog entry, a deviation is created with type "LOTL Binary" and the LOTL category from the catalog.
  • If the process does not match the LOTL catalog and "Alert on New Processes" is enabled, it creates a deviation with type "New Process" and severity Medium.
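
The decision order above can be sketched as a single function. The catalog entries and category labels below are illustrative examples of commonly cited LOTL binaries, not the product's 20-entry catalog, and the function name is hypothetical. The sketch follows the literal wording: a catalog match with LOTL alerts disabled produces no deviation, which is an interpretation rather than confirmed behavior.

```python
from typing import Dict, Optional, Set, Tuple

# Illustrative entries only; the real catalog is not listed in this manual.
EXAMPLE_LOTL_CATALOG: Dict[str, Dict[str, str]] = {
    "windows": {"certutil.exe": "download", "powershell.exe": "execution"},
    "linux": {"curl": "download", "nc": "network"},
}


def classify_process(
    name: str,
    path: str,
    host_os: str,
    whitelist: Set[Tuple[str, str]],
    alert_on_new: bool = True,
    alert_on_lotl: bool = True,
) -> Optional[Dict[str, str]]:
    """Return a deviation dict for a process seen in enforcement mode,
    or None when no deviation should be created."""
    if (name, path) in whitelist:
        return None
    category = EXAMPLE_LOTL_CATALOG.get(host_os, {}).get(name.lower())
    if category is not None:
        if alert_on_lotl:
            return {"type": "LOTL Binary", "lotl_category": category}
        return None  # interpretation: catalog match, but LOTL alerts are off
    if alert_on_new:
        return {"type": "New Process", "severity": "Medium"}
    return None
```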

Responding to deviations

  1. Navigate to Security > Process Deviations.
  2. Filter by severity, status, deviation type (New Process or LOTL Binary), host, or baseline.
  3. Review each deviation: process name, path, user, PID, command line, LOTL category (if applicable).
  4. Choose an action:
    • Acknowledge: Mark as seen.
    • Whitelist: Add the process to the baseline whitelist so it will not be flagged again. Updates the baseline's process list.
    • False Positive: Mark as a detection error.
    • Resolve: Mark as handled.

Network exfiltration detection

The exfiltration detector runs every 30 minutes, analyzing network flow data to identify potential data exfiltration. It uses four detection strategies:

Alert type | Trigger condition | Default severity
Large Outbound | More than 100 MB transferred to a single destination in 30 minutes. | High
Unknown Destination | More than 50 connections to an unresolved external IP (non-RFC1918). | Medium
Tunneling Port | Unexpected process using ports 22, 53, 443, 8080, or 8443. | Medium
Receive-Only Outbound | Server-type host (typically receive-only) suddenly sending outbound traffic. | Critical
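
Two of these strategies reduce to simple threshold checks, sketched below with Python's standard ipaddress module. The function names are hypothetical, "100 MB" is read here as 100 MiB, and "unresolved" is assumed to mean that the destination has no known hostname; none of these details are confirmed by the product.

```python
import ipaddress
from typing import List

LARGE_OUTBOUND_BYTES = 100 * 1024 * 1024  # "100 MB", read as MiB (assumption)
UNKNOWN_DEST_CONNECTIONS = 50

RFC1918_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_rfc1918(ip: str) -> bool:
    """True when the address falls in one of the three RFC 1918 private ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in RFC1918_NETWORKS)


def evaluate_destination(
    dest_ip: str, bytes_out: int, connections: int, has_hostname: bool
) -> List[str]:
    """Return the alert types triggered by one destination's totals
    within a single 30-minute analysis window."""
    alerts = []
    if bytes_out > LARGE_OUTBOUND_BYTES:
        alerts.append("Large Outbound")
    if (
        connections > UNKNOWN_DEST_CONNECTIONS
        and not has_hostname
        and not is_rfc1918(dest_ip)
    ):
        alerts.append("Unknown Destination")
    return alerts
```

The RFC 1918 ranges are listed explicitly because `ipaddress.ip_address(...).is_private` also covers loopback, link-local, and other reserved ranges, which is broader than the "non-RFC1918" wording.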

Investigating alerts

  1. Navigate to Security > Exfiltration Alerts.
  2. Filter by alert type, severity, status, host, or organization.
  3. Review each alert: destination IP/port, bytes transferred, connection count, process name, time window.
  4. Investigate: Set status to "investigating" (only from "open" status).
  5. Resolve: Close as "confirmed" (real threat) or "false_positive".

Deduplication. Each strategy checks for existing alerts within a 60-minute window to avoid duplicate alerts for the same ongoing activity.
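
A minimal sketch of such a deduplication window follows. The key used to decide that two alerts describe the same activity (host, alert type, destination) is an assumption, as are the class and method names.

```python
from datetime import datetime, timedelta
from typing import Dict, Tuple

DEDUP_WINDOW = timedelta(minutes=60)

AlertKey = Tuple[str, str, str]  # (host, alert type, destination); assumed key


class AlertDeduplicator:
    def __init__(self) -> None:
        self._last_alert: Dict[AlertKey, datetime] = {}

    def should_create(self, key: AlertKey, now: datetime) -> bool:
        """Return False when an alert with the same key was created less
        than 60 minutes before `now`; otherwise record it and return True."""
        last = self._last_alert.get(key)
        if last is not None and now - last < DEDUP_WINDOW:
            return False
        self._last_alert[key] = now
        return True
```

With a detector that runs every 30 minutes, a window like this means sustained activity would raise at most one alert per hour for a given key.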

Exfiltration alert triage

Follow this process for each exfiltration alert:

  1. Review the alert details: Check the destination IP, port, process name, and bytes transferred.
  2. Set to investigating: Click Investigate to claim the alert.
  3. Correlate with context: Check if the destination is a known service (backup target, CDN, cloud provider). Check if the process is expected (backup agent, update service).
  4. Resolve: If legitimate traffic, resolve as "False Positive". If a real threat, resolve as "Confirmed" and take appropriate incident response action.

Cannot investigate non-open alerts. The "investigate" action is only available from the "open" status. Alerts that are already investigating or resolved cannot be set back to investigating.
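
The status rule above can be expressed as two guard functions. The names are hypothetical, and the sketch assumes that an alert may be resolved from either "open" or "investigating", which this manual does not state explicitly.

```python
RESOLVED_OUTCOMES = {"confirmed", "false_positive"}


def investigate(status: str) -> str:
    """Move an alert to "investigating"; only allowed from "open"."""
    if status != "open":
        raise ValueError(f"cannot investigate an alert in status {status!r}")
    return "investigating"


def resolve(status: str, outcome: str) -> str:
    """Close an alert as "confirmed" or "false_positive"."""
    if outcome not in RESOLVED_OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome!r}")
    if status in RESOLVED_OUTCOMES:
        raise ValueError("alert is already resolved")
    return outcome
```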

Efficiency metrics

Efficiency metrics measure how quickly and effectively your team responds to alerts and remediations.

  1. Navigate to Security > Efficiency Stats.
  2. Set the period (default 30 days, max 365).

  • MTTA: Mean Time to Acknowledge: average minutes from alert trigger to acknowledgment.
  • MTTR: Mean Time to Resolve: average minutes from alert trigger to resolution.
  • Auto-Resolution Rate: Percentage of alerts resolved automatically (metric returned to normal without human intervention).
  • Toil Hours Saved: Estimated hours saved by auto-resolution (30 minutes per auto-resolved alert).
  • Remediation Success Rate: Percentage of completed remediation executions with outcome "fixed".

Null values. MTTA and MTTR return null if no alerts were acknowledged or resolved in the selected period. Extend the time range to get data.
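
The arithmetic behind these metrics can be sketched as follows. The record fields and function names are hypothetical, and the auto-resolution rate is assumed to use all alerts in the period as its denominator.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional

TOIL_MINUTES_PER_AUTO_RESOLVED = 30  # per this manual


@dataclass
class Alert:
    triggered_at: datetime
    acknowledged_at: Optional[datetime] = None
    resolved_at: Optional[datetime] = None
    auto_resolved: bool = False


def _mean_minutes(deltas: List[timedelta]) -> Optional[float]:
    """Average of the given durations in minutes, or None when empty."""
    if not deltas:
        return None
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60.0


def efficiency_stats(alerts: List[Alert]) -> Dict[str, Optional[float]]:
    ack = [a.acknowledged_at - a.triggered_at for a in alerts if a.acknowledged_at]
    res = [a.resolved_at - a.triggered_at for a in alerts if a.resolved_at]
    auto = sum(1 for a in alerts if a.auto_resolved)
    return {
        "mtta_minutes": _mean_minutes(ack),  # None if nothing was acknowledged
        "mttr_minutes": _mean_minutes(res),  # None if nothing was resolved
        "auto_resolution_rate_pct": 100.0 * auto / len(alerts) if alerts else None,
        "toil_hours_saved": auto * TOIL_MINUTES_PER_AUTO_RESOLVED / 60.0,
    }
```

For example, two alerts in a period, one acknowledged after 10 minutes and resolved after 60, the other auto-resolved after 20 minutes, give an MTTA of 10 minutes, an MTTR of 40 minutes, a 50% auto-resolution rate, and 0.5 toil hours saved.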

Security dashboard

The security dashboard aggregates all security posture data into a single view.

  1. Navigate to Security > Dashboard.

Dashboard panels

  • Forecast summary: Active hardware forecasts grouped by severity (critical, high, medium, low).
  • Recommendation summary: Active resource recommendations grouped by type (downsize, upsize, optimal).
  • Baseline summary: Total baselines, count in learning vs. enforcing, total open deviations.
  • Exfiltration summary: Open and investigating exfiltration alerts grouped by alert type.

Permissions reference

Action | Permission
View forecasts, recommendations, baselines, deviations, exfiltration alerts | security.view
View efficiency stats and security dashboard | security.view
Acknowledge/resolve forecasts | security.manage
Accept/dismiss recommendations | security.manage
Create/update/delete/reset baselines | security.manage
Acknowledge/whitelist/false-positive/resolve deviations | security.manage
Investigate/resolve exfiltration alerts | security.manage
Trigger forecasting/resource analysis engines manually | security.manage
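
As an illustration, the mapping above could be checked in code like this. The action keys are hypothetical labels invented for the sketch; only the two permission names come from this manual. Whether security.manage implies security.view is not stated, so the sketch checks each permission independently.

```python
from typing import Dict, Set

# Action keys are hypothetical; permission names are from the reference above.
ACTION_PERMISSIONS: Dict[str, str] = {
    "view_forecasts": "security.view",
    "view_dashboard": "security.view",
    "resolve_forecast": "security.manage",
    "dismiss_recommendation": "security.manage",
    "reset_baseline": "security.manage",
    "whitelist_deviation": "security.manage",
    "investigate_exfiltration_alert": "security.manage",
    "run_engine": "security.manage",
}


def is_allowed(action: str, granted: Set[str]) -> bool:
    """True when the user's granted permissions include the one the action needs.
    Unknown actions are denied."""
    required = ACTION_PERMISSIONS.get(action)
    return required is not None and required in granted
```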

Navigation reference

Feature | Location
Hardware Forecasts | Security > Hardware Forecasts
Resource Recommendations | Security > Resource Recommendations
Process Baselines & Deviations | Security > Process Baselines / Process Deviations
Exfiltration Alerts | Security > Exfiltration Alerts
Efficiency Metrics | Security > Efficiency Stats
Security Dashboard | Security > Dashboard

Troubleshooting

Symptom | Cause | Fix
No hardware forecasts generated | Hosts not reporting S.M.A.R.T., thermal, or memory metrics | Verify agents are collecting hardware health, thermal, and memory metrics.
Forecasts all show "info" severity | Metrics within normal thresholds | Expected behavior. Info forecasts with less than 0.1 probability are filtered out.
Forecasts not updating | Engine not running or no recent metrics | Check system health for the hardware forecasting heartbeat. Engine runs at 02:00 UTC daily. Trigger manually from the Forecasts page.
Recommendations show "insufficient_data" | Fewer than 100 data points in 30 days | Agent needs to report metrics for at least ~4 days at 5-min intervals.
Dismissed recommendation reappears | Should not happen | Dismissed records are skipped by the upsert engine. Check if someone deleted and re-created it.
Baseline stuck in learning mode | Fewer than 10 host-info submissions with process data | Host must send process data 10 times. Or manually switch to enforcement mode from the baseline detail view.
Baseline not detecting new processes | "Alert on New Processes" is disabled, or process already whitelisted | Check baseline settings. Check if the process was previously whitelisted.
LOTL binary not flagged | "Alert on LOTL Binaries" is disabled, or binary not in catalog | Verify the setting. The catalog covers 20 specific binaries. Others are treated as new process deviations.
No exfiltration alerts | Agents not collecting flow data | Verify agent flow collection is enabled (connection sampling must be turned on).
Exfiltration alert for legitimate traffic | False positive | Resolve as "false_positive". Thresholds are currently hardcoded.
Efficiency stats show null for MTTA/MTTR | No acknowledged/resolved alerts in period | Extend the time period to capture more data.
Security dashboard shows 0 everywhere | No active forecasts/recommendations/baselines/alerts | Run engines manually and verify hosts have metric data.