ES|QL for Security Analysts

Getting Started

Elasticsearch | Query Language

Dynamic language designed from the ground up to transform, enrich, and simplify investigations
- faster results
- simplified user experience
- new search capabilities
- quicker insights
- accurate alerting
Uses a dedicated query engine
Brings together the capabilities of multiple languages

Why is it needed?

Purpose for creation
- flexible searches with the ability to define fields at query time
- accurate detection rules that help reduce alert fatigue
- work with summarized data using aggregations in queries
- providing data enrichment at query time

Requirements

Elastic Stack version 8.14
Data

ES|QL Syntax

Each command works on the output of the previous one using the pipe character

source-command
| processing-command1
| processing-command2

Can be written in multiple lines or one

source-command | processing-command1 | processing-command2

Example

FROM logs-network*
| KEEP @timestamp, source.ip, destination.ip, source.bytes, destination.bytes
| EVAL total.bytes = source.bytes + destination.bytes
| WHERE total.bytes > 10000
| SORT source.ip, total.bytes desc
| LIMIT 2

ES|QL Basic Commands

FROM

… retrieves data from ES

Can specify Indices, Data Streams, and Aliases
Wildcards and comma-separated lists can be used to query multiple sources
Returns 1000 results by default

KEEP

… specifies which columns are returned and in which order

Results will be printed in a table format
The order of field in the query will dictate the column order

EVAL

… enables you to calculate new values and add them as a new column

This new column is only created in the output table, it is not stored in ES
EVAL typically uses functions to calculate values

WHERE

… returns a table with the rows where the specified condition is true

Key command to filter data
The condition can include operators (<, ==, and, or, …) and functions

SORT

… orders the rows of the table

asc/desc define the order -> ascending is used if not specified
Multiple columns can be specified

LIMIT

… sets how many rows are returned by the query

A limit of 10.000 rows still applies
Useful for calculating the top results in a search

Operators

Query building-blocks for working with data

Relational operators (<, >, <=, >=, ==, !=)
Mathematical operators (+, -, *, /)
Logical operators (AND, OR, NOT)
NULL value predicates (IS NULL, IS NOT NULL)
Other comparison operators (LIKE, RLIKE, IN)

Relation Operators

Return a boolean result
Comparisons can be made to values and to other fields of the same type
If either field being compared is multi-value the result will be NULL
Only certain fields are supported (date, numbers, text, keywords, IP)

Mathematical Operators

EVAL’s best friend
If either field is multi-value the result will be NULL
Only certain fields are supported (date, numbers, text, keywords, IP)

Logical Operators

Serve them with a side of parantheses for best results

NULL Values

When ingesting data, empty fields will be set to NULL
By default NULL values are larger than other values
NULL values are ignored when included in calculations
Great for finding fields that contain data

LIKE

Matches a string against a pattern using wildcards
- * matches zero or more matches
- ? matches one character

RLIKE

Matches strings using RegEx
More versatile than the LIKE operator but more compute intensive
Useful for matching known complex patterns

IN

Matches values in a comma-separated list of literals, fields, or expressions
Useful when filtering for multiple values

Comments

Helpful for testing parts of a query, debugging, and documenting

// for single line comments
/* and */ for block comments

Metadata Fields

ES|QL can access these metadata fields by using the METADATA directive in the FROM command:
- _index: Index in which the document is stored
- _id: Unique identifier for the document
- _version: Version of the document

Functions

Functions that help work with data:

Working with strings
IP address CIDR notation search
Grouping Functions
Aggregation Functions

IP Functions - CIDR_MATCH

Used for IPv4/IPv6 CIDR matching
Returns “true” if the IP is contained in the specified CIDR blocks

String Functions - to/from base64

Encoding is useful to obfuscate URL paths, process arguments, …
Further commands can be appended afterwards
Query will crash if no base64 values are ingested

String Functions - TO_LOWER/TO_UPPER

These commands will convert input text to either uppercase or lowercase
Useful in case insensivity situations

Aggregate Functions - STATS […] BY

“STATS … BY” will group rows into buckets based on a common value you specify
Can be used to calculate one or more aggregated values over grouped rows
If “BY” is omitted no grouping will happen and only one calculation over all data will occur
Multiple aggregation functions are supported (Avg, Count, Sum, Percentile, …)

Grouping Functions - Bucket

Creates groups of values out of a numeric or date range
Useful to find anomalies in data over time or numeric data

Bucket can be used in 2-parameter mode

Useful for when you don’t want to specify the interval:
- Second parameter is the size of each bucket
- It must be a double for numbers or a time period for dates

Investigating Events

ES|QL within Timeline

Timeline is the investigative pane in Security
Use ES|QL to find specific events
Cannot pin events like in query
Staging area for rule creation

Building ES|QL Security Queries

Understand what data is available
- Which integrations do you have?
- What fields are available?
Define the goals and scope of the query
- Similar to hypothesis-driven threat hunting
- What fields and ES|QL functions are necessary?
Develop the query:
- gather the necessary information
- aggregations can provide unexpected results, test them separately
- remember you can comment out
- leave LIMIT and data conversions until the end if possible

Rule Creation with ES|QL

Rule Creation Workflow

Create ES|QL queries in Timeline or Discover first
Create ES|QL rules in second
Considerations:
- Is this query applicable over time?
- Do I want this query running on a schedule?
- Will this query result in false positives? How many?
Use your “strongest” ES|QL queries

Aggregating vs Non-Aggregating Rules

Aggregating rules
- Use STATS… BY functions
- Create a new field
- Performs some type of mathematical operation
Non-Aggregating rules
- Non-aggregating queries retrieve specific records without performing calculations on grouped data
- Necessary for de-duplication

Rule Creation

You can optionally assign a Timeline template to aide your analysis when this alert triggers
Identifying information on the rule goes in the “About rule” section
Set the schedule that your rule will run
- a rule that runs every 5 minutes with 1 additional look-back minute will look at the last 6 minutes of data
Alerts can also be sent via connectors

Considerations When Writing Rules

The LIMIT command specifies the number of rows that can be returned
A detection rule’s max_signals setting specifies the maximum number of alerts it can create every time it runs
- The max_signals default value is 100
If the LIMIT value is lower than the max_signals value, the rule uses the LIMIT value to determine the maximum number of alerts the rule generates
- If the LIMIT value is higher than the max_signals value, the rule uses the max_signals value

Keyboard shortcuts

Cybersecurity Notes