Spike in AWS Error Messages

edit

A machine learning job detected a significant spike in the rate of a particular error in the CloudTrail messages. Spikes in error messages may accompany attempts at privilege escalation, lateral movement, or discovery.

Rule type: machine_learning

Rule indices: None

Severity: low

Risk score: 21

Runs every: 15m

Searches indices from: now-60m (Date Math format, see also Additional look-back time)

Maximum alerts per execution: 100

References:

Tags:

  • Elastic
  • Cloud
  • AWS
  • ML
  • Investigation Guide

Version: 102

Rule authors:

  • Elastic

Rule license: Elastic License v2

Investigation guide

edit
## Triage and analysis

### Investigating Spike in AWS Error Messages

CloudTrail logging provides visibility on actions taken within an AWS environment. By monitoring these events and
understanding what is considered normal behavior within an organization, you can spot suspicious or malicious activity
when deviations occur.

This rule uses a machine learning job to detect a significant spike in the rate of a particular error in the CloudTrail
messages. Spikes in error messages may accompany attempts at privilege escalation, lateral movement, or discovery.

#### Possible investigation steps

- Examine the history of the error. If the error only manifested recently, it might be related to recent changes in an
automation module or script. You can find the error in the `aws.cloudtrail.error_code field` field.
- Investigate other alerts associated with the user account during the past 48 hours.
- Validate the activity is not related to planned patches, updates, or network administrator activity.
- Examine the request parameters. These may indicate the source of the program or the nature of the task being performed
when the error occurred.
    - Check whether the error is related to unsuccessful attempts to enumerate or access objects, data, or secrets.
- Considering the source IP address and geolocation of the user who issued the command:
    - Do they look normal for the calling user?
    - If the source is an EC2 IP address, is it associated with an EC2 instance in one of your accounts or is the source
    IP from an EC2 instance that's not under your control?
    - If it is an authorized EC2 instance, is the activity associated with normal behavior for the instance role or roles?
    Are there any other alerts or signs of suspicious activity involving this instance?
- Consider the time of day. If the user is a human (not a program or script), did the activity take place during a normal
time of day?
- Contact the account owner and confirm whether they are aware of this activity if suspicious.
- If you suspect the account has been compromised, scope potentially compromised assets by tracking servers, services,
and data accessed by the account in the last 24 hours.

### False positive analysis

- Examine the history of the command. If the command only manifested recently, it might be part of a new automation
module or script. If it has a consistent cadence (for example, it appears in small numbers on a weekly or monthly cadence),
it might be part of a housekeeping or maintenance process. You can find the command in the `event.action field` field.
- The adoption of new services or the addition of new functionality to scripts may generate false positives.

### Related Rules

- Unusual City For an AWS Command - 809b70d3-e2c3-455e-af1b-2626a5a1a276
- Unusual Country For an AWS Command - dca28dee-c999-400f-b640-50a081cc0fd1
- Unusual AWS Command for a User - ac706eae-d5ec-4b14-b4fd-e8ba8086f0e1
- Rare AWS Error Code - 19de8096-e2b0-4bd8-80c9-34a820813fff

### Response and remediation

- Initiate the incident response process based on the outcome of the triage.
- Disable or limit the account during the investigation and response.
- Identify the possible impact of the incident and prioritize accordingly; the following actions can help you gain context:
    - Identify the account role in the cloud environment.
    - Assess the criticality of affected services and servers.
    - Work with your IT team to identify and minimize the impact on users.
    - Identify if the attacker is moving laterally and compromising other accounts, servers, or services.
    - Identify any regulatory or legal ramifications related to this activity.
- Investigate credential exposure on systems compromised or used by the attacker to ensure all compromised accounts are
identified. Reset passwords or delete API keys as needed to revoke the attacker's access to the environment. Work with
your IT teams to minimize the impact on business operations during these actions.
- Check if unauthorized new users were created, remove unauthorized new accounts, and request password resets for other IAM users.
- Consider enabling multi-factor authentication for users.
- Review the permissions assigned to the implicated user to ensure that the least privilege principle is being followed.
- Implement security best practices [outlined](https://aws.amazon.com/premiumsupport/knowledge-center/security-best-practices/) by AWS.
- Take the actions needed to return affected systems, data, or services to their normal operational levels.
- Identify the initial vector abused by the attacker and take action to prevent reinfection via the same vector.
- Using the incident response data, update logging and audit policies to improve the mean time to detect (MTTD) and the
mean time to respond (MTTR).