ES|QL for Security Analysts
Getting Started
Elasticsearch | Query Language
- Dynamic language designed from the ground up to transform, enrich, and simplify investigations
- faster results
- simplified user experience
- new search capabilities
- quicker insights
- accurate alerting
- Uses a dedicated query engine
- Brings together the capabilities of multiple languages
Why is it needed?
- Purpose for creation
- flexible searches with the ability to define fields at query time
- accurate detection rules that help reduce alert fatigue
- work with summarized data using aggregations in queries
- providing data enrichment at query time
Requirements
- Elastic Stack version 8.14
- Data
ES|QL Syntax
- Each command works on the output of the previous one using the pipe character
source-command
| processing-command1
| processing-command2
- Can be written in multiple lines or one
source-command | processing-command1 | processing-command2
Example
FROM logs-network*
| KEEP @timestamp, source.ip, destination.ip, source.bytes, destination.bytes
| EVAL total.bytes = source.bytes + destination.bytes
| WHERE total.bytes > 10000
| SORT source.ip, total.bytes desc
| LIMIT 2
ES|QL Basic Commands
FROM
… retrieves data from ES
- Can specify Indices, Data Streams, and Aliases
- Wildcards and comma-separated lists can be used to query multiple sources
- Returns 1000 results by default
KEEP
… specifies which columns are returned and in which order
- Results will be printed in a table format
- The order of field in the query will dictate the column order
EVAL
… enables you to calculate new values and add them as a new column
- This new column is only created in the output table, it is not stored in ES
- EVAL typically uses functions to calculate values
WHERE
… returns a table with the rows where the specified condition is true
- Key command to filter data
- The condition can include operators (<, ==, and, or, …) and functions
SORT
… orders the rows of the table
asc/descdefine the order -> ascending is used if not specified- Multiple columns can be specified
LIMIT
… sets how many rows are returned by the query
- A limit of 10.000 rows still applies
- Useful for calculating the top results in a search
Operators
Query building-blocks for working with data
- Relational operators (<, >, <=, >=, ==, !=)
- Mathematical operators (+, -, *, /)
- Logical operators (AND, OR, NOT)
- NULL value predicates (IS NULL, IS NOT NULL)
- Other comparison operators (LIKE, RLIKE, IN)
Relation Operators
- Return a boolean result
- Comparisons can be made to values and to other fields of the same type
- If either field being compared is multi-value the result will be NULL
- Only certain fields are supported (date, numbers, text, keywords, IP)
Mathematical Operators
- EVAL’s best friend
- If either field is multi-value the result will be NULL
- Only certain fields are supported (date, numbers, text, keywords, IP)
Logical Operators
- Serve them with a side of parantheses for best results
NULL Values
- When ingesting data, empty fields will be set to NULL
- By default NULL values are larger than other values
- NULL values are ignored when included in calculations
- Great for finding fields that contain data
LIKE
- Matches a string against a pattern using wildcards
*matches zero or more matches?matches one character
RLIKE
- Matches strings using RegEx
- More versatile than the LIKE operator but more compute intensive
- Useful for matching known complex patterns
IN
- Matches values in a comma-separated list of literals, fields, or expressions
- Useful when filtering for multiple values
Comments
- Helpful for testing parts of a query, debugging, and documenting
// for single line comments
/* and */ for block comments
Metadata Fields
- ES|QL can access these metadata fields by using the METADATA directive in the FROM command:
- _index: Index in which the document is stored
- _id: Unique identifier for the document
- _version: Version of the document
Functions
Functions that help work with data:
- Working with strings
- IP address CIDR notation search
- Grouping Functions
- Aggregation Functions
IP Functions - CIDR_MATCH
- Used for IPv4/IPv6 CIDR matching
- Returns “true” if the IP is contained in the specified CIDR blocks
String Functions - to/from base64
- Encoding is useful to obfuscate URL paths, process arguments, …
- Further commands can be appended afterwards
- Query will crash if no base64 values are ingested
String Functions - TO_LOWER/TO_UPPER
- These commands will convert input text to either uppercase or lowercase
- Useful in case insensivity situations
Aggregate Functions - STATS […] BY
- “STATS … BY” will group rows into buckets based on a common value you specify
- Can be used to calculate one or more aggregated values over grouped rows
- If “BY” is omitted no grouping will happen and only one calculation over all data will occur
- Multiple aggregation functions are supported (Avg, Count, Sum, Percentile, …)
Grouping Functions - Bucket
- Creates groups of values out of a numeric or date range
- Useful to find anomalies in data over time or numeric data
Bucket can be used in 2-parameter mode
- Useful for when you don’t want to specify the interval:
- Second parameter is the size of each bucket
- It must be a double for numbers or a time period for dates
Investigating Events
ES|QL within Timeline
- Timeline is the investigative pane in Security
- Use ES|QL to find specific events
- Cannot pin events like in query
- Staging area for rule creation
Building ES|QL Security Queries
- Understand what data is available
- Which integrations do you have?
- What fields are available?
- Define the goals and scope of the query
- Similar to hypothesis-driven threat hunting
- What fields and ES|QL functions are necessary?
- Develop the query:
- gather the necessary information
- aggregations can provide unexpected results, test them separately
- remember you can comment out
- leave LIMIT and data conversions until the end if possible
Rule Creation with ES|QL
Rule Creation Workflow
- Create ES|QL queries in Timeline or Discover first
- Create ES|QL rules in second
- Considerations:
- Is this query applicable over time?
- Do I want this query running on a schedule?
- Will this query result in false positives? How many?
- Use your “strongest” ES|QL queries
Aggregating vs Non-Aggregating Rules
- Aggregating rules
- Use STATS… BY functions
- Create a new field
- Performs some type of mathematical operation
- Non-Aggregating rules
- Non-aggregating queries retrieve specific records without performing calculations on grouped data
- Necessary for de-duplication
Rule Creation
- You can optionally assign a Timeline template to aide your analysis when this alert triggers
- Identifying information on the rule goes in the “About rule” section
- Set the schedule that your rule will run
- a rule that runs every 5 minutes with 1 additional look-back minute will look at the last 6 minutes of data
- Alerts can also be sent via connectors
Considerations When Writing Rules
- The LIMIT command specifies the number of rows that can be returned
- A detection rule’s max_signals setting specifies the maximum number of alerts it can create every time it runs
- The max_signals default value is 100
- If the LIMIT value is lower than the max_signals value, the rule uses the LIMIT value to determine the maximum number of alerts the rule generates
- If the LIMIT value is higher than the max_signals value, the rule uses the max_signals value