AWS Detection Engineering | Raghava Gatadi

Cloud environments generate a massive volume of logs. For a Security Operations Center (SOC), the challenge is not finding data but finding signal within the noise. In this project, I used the Invictus IR AWS Attack Dataset to engineer a suite of custom detection rules in Splunk Enterprise, each mapped to the MITRE ATT&CK Cloud Matrix.

The Detection Strategy

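Most "out-of-the-box" AWS alerts trigger on single events, such as a failed login. However, sophisticated attackers like the one profiled in this dataset (bert-jan) use automated scripts that mimic administrative behavior, so no single event looks malicious. My strategy focused on Statistical Thresholding (alerting when a count inside a fixed time window crosses a limit) and Cardinality Analysis (counting distinct values, such as distinct API names, rather than raw event volume).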

6 Custom Rules · 100% Detection Rate · MITRE Aligned

Technical Deep Dive: Key Logic

1. API Reconnaissance Spike T1595.002

Instead of counting total API calls, this rule counts distinct API names (dc(eventName)). This filters out noisy scripts that retry the same unauthorized action and highlights an actor actively mapping the breadth of the environment.

index=invictus (eventName=List* OR eventName=Describe* OR eventName=Get*)
| eval _time=strptime(eventTime,"%Y-%m-%dT%H:%M:%SZ")
| bin _time span=10m
| stats dc(eventName) as unique_api_calls, 
        values(eventName) as attempted_calls, 
        count as total_hits 
  by _time, sourceIPAddress, userIdentity.arn
| where unique_api_calls > 10
| sort - unique_api_calls

Validation: Detected bert-jan performing 47 unique API calls from 10.8.8.10 within a 10-minute window—clear reconnaissance behavior.
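
To make the cardinality idea concrete outside Splunk, here is a minimal Python sketch of the same logic, assuming CloudTrail records have already been parsed into dictionaries. The function name `recon_spikes` and the output keys are illustrative and not part of the original rule; the window and threshold mirror the query above.

```python
from collections import defaultdict
from datetime import datetime, timezone

WINDOW_SECONDS = 600                      # 10-minute bins, as in the SPL rule
RECON_PREFIXES = ("List", "Describe", "Get")

def recon_spikes(events, threshold=10):
    """Flag (bin, IP, ARN) groups whose distinct eventName count exceeds threshold."""
    names = defaultdict(set)              # group key -> distinct API names
    hits = defaultdict(int)               # group key -> total matching calls
    for e in events:
        if not e["eventName"].startswith(RECON_PREFIXES):
            continue                      # only read-style enumeration calls
        ts = datetime.strptime(e["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
        bucket = epoch - epoch % WINDOW_SECONDS
        key = (bucket, e["sourceIPAddress"], e["userIdentity"]["arn"])
        names[key].add(e["eventName"])
        hits[key] += 1
    alerts = [
        {"bucket": k[0], "sourceIPAddress": k[1], "arn": k[2],
         "unique_api_calls": len(v), "total_hits": hits[k]}
        for k, v in names.items() if len(v) > threshold
    ]
    # Widest enumeration first, like "sort - unique_api_calls"
    return sorted(alerts, key=lambda a: a["unique_api_calls"], reverse=True)
```

A script that retries one call fifty times yields a distinct count of 1 and stays silent, while a principal touching twelve different read APIs in the same window is flagged.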

2. SSM Parameter Enumeration T1555.006

AWS Systems Manager Parameter Store often holds secrets: API keys, database credentials, and environment variables. This rule targets bulk retrieval patterns that indicate credential harvesting.

index=invictus eventSource="ssm.amazonaws.com" 
       eventName IN ("DescribeParameters","GetParameters","GetParameter")
| eval event_time=strptime(eventTime,"%Y-%m-%dT%H:%M:%SZ")
| bin event_time span=10m
| stats count as api_call_count,
        dc(requestParameters.name) as unique_parameters,
        values(userAgent) AS user_agents 
  by userIdentity.arn, event_time
| where api_call_count > 10

Validation: Captured the Stratus Red Team automated tool extracting 16 unique secrets using the user agent stratus-red-team_11a6ef34...
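
The bulk-retrieval logic can also be sketched in Python for offline testing. This is a sketch under stated assumptions, not the rule itself: it assumes parsed CloudTrail dictionaries, and it assumes GetParameter records carry a singular `name` request field while GetParameters records carry a `names` list, so it reads both. The helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

SSM_READ_CALLS = {"DescribeParameters", "GetParameters", "GetParameter"}

def requested_parameters(event):
    """Collect parameter names from the singular or plural request field (assumed layout)."""
    params = event.get("requestParameters") or {}   # CloudTrail may log null here
    found = set()
    if isinstance(params.get("name"), str):
        found.add(params["name"])
    found.update(params.get("names") or [])
    return found

def ssm_enumeration(events, window_seconds=600, threshold=10):
    """Flag principals making more than `threshold` SSM read calls in one time bin."""
    calls = defaultdict(int)
    names = defaultdict(set)
    for e in events:
        if e.get("eventSource") != "ssm.amazonaws.com":
            continue
        if e.get("eventName") not in SSM_READ_CALLS:
            continue
        ts = datetime.strptime(e["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
        epoch = int(ts.replace(tzinfo=timezone.utc).timestamp())
        key = (e["userIdentity"]["arn"], epoch - epoch % window_seconds)
        calls[key] += 1
        names[key] |= requested_parameters(e)
    return [
        {"arn": k[0], "bucket": k[1], "api_call_count": n,
         "unique_parameters": len(names[k])}
        for k, n in calls.items() if n > threshold
    ]
```

Reporting the distinct-parameter count next to the call count helps separate an application polling one parameter from a tool sweeping the whole store.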

3. Failed Privilege Escalation T1078 / T1068

Attackers often "fuzz" permissions to see what their stolen credentials can access. I implemented Dynamic Severity logic using Splunk's eval case() function to prioritize alerts based on the volume of AccessDenied errors.

index=invictus (errorCode="AccessDenied" OR errorCode="Client.UnauthorizedOperation")
| eval event_time=strptime(eventTime,"%Y-%m-%dT%H:%M:%SZ")
| bin event_time span=5m
| stats count as failed_attempts, 
        values(eventName) as attempted_events,
        values(errorMessage) as error_details 
  by event_time, sourceIPAddress, userIdentity.arn
| where failed_attempts > 3
| eval severity=case(failed_attempts > 10, "CRITICAL", 
                     failed_attempts > 5, "HIGH", 
                     1=1, "MEDIUM")
| sort - failed_attempts

Validation: Flagged 15 failed DescribeInstanceAttribute attempts by stratus-red-team-get-usr-data-role—a clear EC2 credential theft pattern.
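
The tiering performed by case() is easy to check in isolation. A minimal Python equivalent, assuming the per-window failure counts have already been aggregated (the function names and the dictionary shape are illustrative):

```python
def severity(failed_attempts):
    """Mirror of the case() tiers: first matching branch wins."""
    if failed_attempts > 10:
        return "CRITICAL"
    if failed_attempts > 5:
        return "HIGH"
    return "MEDIUM"          # the 1=1 fallback branch

def grade_failures(counts, minimum=3):
    """counts maps a (window, IP, ARN) key to its AccessDenied count.

    Groups at or below `minimum` are dropped, like "where failed_attempts > 3".
    """
    return {key: severity(n) for key, n in counts.items() if n > minimum}
```

Because the branches are evaluated in order, exactly 10 failures grades as HIGH and exactly 5 as MEDIUM; the boundaries are exclusive.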

4. CloudTrail Tampering T1562.008

This is a critical, high-fidelity alert that monitors for attempts to disable or modify CloudTrail logging and GuardDuty detection. Attackers use these calls to "blind" security teams before performing destructive actions. The rule has zero tolerance: any match is CRITICAL.

index=invictus 
       eventName IN ("StopLogging","DeleteTrail","UpdateTrail",
                     "PutEventSelectors","DeleteDetector","ArchiveFindings")
| eval severity="CRITICAL"
| table eventTime, severity, eventName, awsRegion, userIdentity.arn, 
        sourceIPAddress, userAgent, requestID

Validation: Detected bert-jan executing StopLogging at 12:01 PM, followed by DeleteTrail at 12:35 PM during the cleanup phase.

5. Lateral Movement via AssumeRole T1550.001

Detects Role Chaining, where an attacker pivots from one IAM role to another to escalate privileges or hide their origin. The rule approximates this by alerting on spikes in AssumeRole calls from one principal and source IP within a 10-minute window.

index=invictus eventName="AssumeRole"
| eval event_time=strptime(eventTime,"%Y-%m-%dT%H:%M:%SZ")
| bin event_time span=10m
| stats count as total_attempts,
        values(errorMessage) as error_message,
        values(errorCode) as errors
  by userIdentity.arn, event_time, sourceIPAddress
| where total_attempts > 3
| table event_time, userIdentity.arn, sourceIPAddress, 
        total_attempts, errors, error_message

Validation: Identified 17 AssumeRole attempts (4 failed) from bert-jan, indicating active privilege escalation via role chaining.
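
A volume spike in AssumeRole is a proxy for chaining. A stricter check, offered here as an assumption rather than as part of the original rule, looks for AssumeRole calls made by a session that is itself an assumed role (CloudTrail records this as a userIdentity type of AssumedRole). A minimal Python sketch over parsed records, with hypothetical function names:

```python
def is_chained_assume_role(event):
    """True when AssumeRole is called from a session that is already an assumed role."""
    identity = event.get("userIdentity") or {}
    return (event.get("eventName") == "AssumeRole"
            and identity.get("type") == "AssumedRole")

def chained_calls_by_principal(events):
    """Count chained AssumeRole calls per calling ARN."""
    counts = {}
    for e in events:
        if is_chained_assume_role(e):
            arn = e["userIdentity"].get("arn", "unknown")
            counts[arn] = counts.get(arn, 0) + 1
    return counts
```

Some automation chains roles legitimately, so a check like this would likely need an allowlist before it could alert on a single match.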

6. Mass Deletion / Destruction T1485 / T1490

The final safety net. This rule detects bulk deletion of assets (more than 20 events in 5 minutes) or the deletion of specific critical infrastructure (KMS keys, CloudTrail trails) regardless of volume. KMS keys are removed through ScheduleKeyDeletion rather than a Delete* call, so that event is matched explicitly.

index=invictus (eventName="Delete*" OR eventName="Terminate*" OR eventName="Drop*"
                OR eventName="ScheduleKeyDeletion")
| eval event_time=strptime(eventTime,"%Y-%m-%dT%H:%M:%SZ")
| bin event_time span=5m
| stats count as deletion_count,
        values(eventName) as actions_taken 
  by userIdentity.arn, sourceIPAddress, event_time
| where deletion_count > 20 
   OR isnotnull(mvfind(actions_taken, "^(DeleteTrail|ScheduleKeyDeletion)$"))

Validation: Cross-correlated with CloudTrail tampering activity, catching the final DeleteTrail event at 12:35 PM.
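
The two-part condition (bulk volume, or any critical action) can be expressed as a small predicate. This Python sketch assumes the event names for one principal in one 5-minute bin have already been grouped. The set of critical actions is an assumption: DeleteTrail comes from the rule, and ScheduleKeyDeletion is included because it is the KMS call that schedules a key for deletion.

```python
DESTRUCTIVE_PREFIXES = ("Delete", "Terminate", "Drop")
CRITICAL_ACTIONS = {"DeleteTrail", "ScheduleKeyDeletion"}   # assumed critical set

def should_alert(actions, bulk_threshold=20):
    """actions: eventName values seen for one principal in one 5-minute bin."""
    destructive = [a for a in actions
                   if a.startswith(DESTRUCTIVE_PREFIXES) or a in CRITICAL_ACTIONS]
    if len(destructive) > bulk_threshold:
        return True                       # bulk deletion, regardless of target
    # A single critical action is enough, regardless of volume
    return any(a in CRITICAL_ACTIONS for a in destructive)
```

Keeping the critical set separate from the volume threshold means a lone DeleteTrail is never hidden below the bulk limit.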

Threat Actor Profile: bert-jan

Understanding the adversary is critical for tuning detections. The primary actor in this simulation exhibited clear behavioral patterns:

1,037 events (83.8% of all activity)
187 unique API actions across EC2, S3, SSM, IAM, and KMS
6 distinct source IPs, with 192.168.10.20 as the primary orchestration source
Attack capabilities: Infrastructure manipulation, IAM privilege escalation, secrets exfiltration, and logging tampering

The use of the Stratus Red Team Framework was evident through automated role names (stratus-red-team-get-usr-data-role) and high-frequency API calls that triggered throttling exceptions—a key indicator of scripted tooling.

Incident Reconstruction (The 55-Minute Attack)

By correlating the custom alerts, I was able to reconstruct a precise timeline of the breach. This timeline demonstrates how behavioral detections catch attackers even when they attempt to hide their tracks.

11:42 AM — Attack Initiation: First AssumeRole and secrets access events detected.
11:50 AM — Reconnaissance: Attacker performed mass enumeration of VPCs, S3 buckets, and SSM parameters (Rules 1 and 2).
11:55 AM — Privilege Escalation: Automated scripts attempted to extract EC2 metadata, triggering multiple AccessDenied events (Rule 3).
12:00 PM — Lateral Movement: Attacker attempted role chaining via AssumeRole to gain higher privileges (Rule 5).
12:01 PM — Defense Evasion: Attacker executed StopLogging to disable CloudTrail and "blind" the security team (Rule 4).
12:20 PM — Data Exfiltration: Secrets retrieved via GetParameter and Decrypt; database snapshots prepared for external sharing.
12:35 PM — Impact & Cleanup: Attacker deleted the trail configuration (DeleteTrail) and attempted infrastructure destruction (Rule 6).

Key Insight: The attack event rate spiked +221% during the escalation phase (6.75 → 21.7 events/min). This velocity change alone is a powerful behavioral indicator of automated attack tooling.
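
The +221% figure is the relative change between the two event rates. A short Python check of the arithmetic, using only the rates quoted above (the function name is illustrative):

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0

baseline_rate = 6.75      # events/min before the escalation phase
escalation_rate = 21.7    # events/min during the escalation phase

print(f"{percent_change(baseline_rate, escalation_rate):.0f}%")  # prints 221%
```

The same function could be applied per principal across consecutive time bins to turn the velocity change into a detection of its own.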

MITRE ATT&CK Cloud Matrix Coverage

Each detection rule was explicitly mapped to adversary techniques to ensure comprehensive coverage across the attack lifecycle:

T1595.002 Active Scanning: Vulnerability Scanning

T1555.006 Credentials from Password Stores: Cloud Secrets Management Stores

T1078 / T1068 Valid Accounts / Exploitation for Privilege Escalation

T1562.008 Impair Defenses: Disable or Modify Cloud Logs

T1550.001 Use Alternate Authentication Material: Application Access Token

T1485 / T1490 Data Destruction / Inhibit System Recovery

Lessons Learned

Behavior > Signatures: Counting distinct API names with dc() separated reconnaissance from retry noise, which raw call volume could not do.
Context Matters: Correlating AccessDenied errors with specific API names (DescribeInstanceAttribute) reduces false positives.
Time-Binning is Critical: Using bin span=10m enables detection of burst activity without alert fatigue.
Dynamic Severity Saves Time: Routing alerts by failure volume helps analysts prioritize true threats.
Test Against Real Data: Validating rules against the Invictus dataset confirmed 100% coverage of the simulated attack chain.

Conclusion

This project shows that effective AWS detection requires moving beyond simple signatures. By focusing on behavioral deviations (spikes in unique API calls) and intent-based errors (UnauthorizedOperation), I built a detection suite that identified the adversary at every stage of the lifecycle, from initial footprinting to final evidence destruction.

The next step? Operationalizing these rules into a production Splunk environment with automated response playbooks. Because in cloud security, detection is only half the battle—response is where breaches are stopped.