Search
Full Text Queries
Query DSL Overview
- A search language for Elasticsearch
  - query
  - aggregate
  - sort
  - filter
  - manipulate responses
GET blogs/_search
{
"query": {
"match": {
"title": "community team"
}
}
}
match Query
- Returns documents that match a provided text, number, date, or boolean value
- By default, the match query
  - uses OR logic if multiple terms appear in the search query
  - is case-insensitive on analyzed text fields, because the query text is analyzed the same way as the field
Request:
GET blogs/_search
{
"query": {
"match": {
"title": "community team"
}
}
}
Response:
"title" : "Meet the team behind the Elastic Community Conference"
"title" : "Introducing Endgame Red Team Automation"
"title" : "Welcome Insight.io to the Elastic Team"
"title" : "Welcome Prelert to the Elastic Team"
. . .
match Query - Using AND Logic
- The operator parameter
  - defines the logic used to interpret the query text
  - specify OR or AND
GET blogs/_search
{
"query": {
"match": {
"title": {
"query": "community team",
"operator": "and"
}
}
}
}
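The OR and AND behavior described above can be modeled in a few lines of Python. This is only a sketch: `analyze` is a rough stand-in for an analyzer (lowercase, split on non-alphanumeric characters), and `match` is a hypothetical helper, not an Elasticsearch API.

```python
import re


def analyze(text):
    """Lowercase the text and split it into terms (a rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match(title, query, operator="or"):
    """Return True if the title matches the query terms under OR or AND logic."""
    doc_terms = set(analyze(title))
    found = [term in doc_terms for term in analyze(query)]
    return all(found) if operator == "and" else any(found)


titles = [
    "Meet the team behind the Elastic Community Conference",
    "Introducing Endgame Red Team Automation",
]

# OR logic: both titles contain at least one of the terms.
or_hits = [t for t in titles if match(t, "community team")]
# AND logic: only the first title contains both terms.
and_hits = [t for t in titles if match(t, "community team", operator="and")]
```

Because both the title and the query go through the same `analyze` step, a query such as "COMMUNITY Team" behaves the same as "community team" in this model.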
match Query - Return More Relevant Results
- The OR or AND options might be too broad or too strict
- Use the minimum_should_match parameter
  - specifies the minimum number of clauses that must match
  - trims the long tail of less relevant results
GET blogs/_search
{
"query": {
"match": {
"title": {
"query": "elastic community team",
"minimum_should_match": 2
}
}
}
}
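As a rough illustration of minimum_should_match, the sketch below counts how many distinct query terms occur in a title and compares that count with the threshold. The `analyze` and `match_with_minimum` helpers are hypothetical and only approximate what Elasticsearch does.

```python
import re


def analyze(text):
    """Lowercase the text and split it into terms (a rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_with_minimum(title, query, minimum_should_match):
    """Return True if at least `minimum_should_match` query terms occur in the title."""
    doc_terms = set(analyze(title))
    matched = sum(1 for term in set(analyze(query)) if term in doc_terms)
    return matched >= minimum_should_match


query = "elastic community team"
three_terms = match_with_minimum(
    "Meet the team behind the Elastic Community Conference", query, 2
)
two_terms = match_with_minimum("Welcome Prelert to the Elastic Team", query, 2)
one_term = match_with_minimum("Introducing Endgame Red Team Automation", query, 2)
```

With a threshold of 2, titles that contain only one of the three terms drop out, which is the "long tail" the parameter is meant to trim.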
match Query - Searching for Terms
- The match query does not consider
  - the order of terms
  - how far apart the terms are
Request:
GET blogs/_search
{
"query": {
"match": {
"title": {
"query": "community team",
"operator": "and"
}
}
}
}
Response:
"title" : "Meet the team behind the Elastic Community Conference"
match_phrase Query
- The match_phrase query searches for the exact sequence of terms specified in the query
  - terms in the phrase must appear in the exact order
  - use the slop parameter to specify how far apart terms are allowed to be while still counting as a match (default is 0)
Request:
GET blogs/_search
{
"query": {
"match_phrase": {
"title": "team community"
}
}
}
Request:
GET blogs/_search
{
"query": {
"match_phrase": {
"title":{
"query": "team community",
"slop": 3
}
}
}
}
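One way to build intuition for slop is to shift each matched position by the term's offset in the phrase and measure the spread of the shifted positions. The sketch below uses that simplified model; it is an assumption about how sloppy phrase matching can be approximated, not the actual Lucene implementation, and `phrase_match` is a hypothetical helper.

```python
import re
from itertools import product


def analyze(text):
    """Lowercase the text and split it into terms (a rough analyzer stand-in)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_match(title, phrase, slop=0):
    """Return True if the phrase terms occur in the title within `slop` position moves."""
    doc_terms = analyze(title)
    query_terms = analyze(phrase)
    if not query_terms:
        return False
    # For every phrase term, collect the positions where it occurs in the title.
    candidates = [
        [pos for pos, term in enumerate(doc_terms) if term == wanted]
        for wanted in query_terms
    ]
    if any(not positions for positions in candidates):
        return False
    for combo in product(*candidates):
        # An exact phrase gives identical shifted positions (spread 0).
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


title = "Meet the team behind the Elastic Community Conference"
exact = phrase_match(title, "team community")
sloppy = phrase_match(title, "team community", slop=3)
```

In this model "team" sits at position 2 and "community" at position 6, so the spread is 3: the phrase does not match with the default slop of 0 but does match with slop 3.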
Searching Multiple Fields
- Use the multi_match query
  - specify the fields as a comma-delimited list inside square brackets
GET blogs/_search
{
"query": {
"multi_match": {
"query": "agent",
"fields": [
"title",
"content"
]
}
}
}
multi_match and Scoring
- By default, the best scoring field will determine the score
  - set type to most_fields to let the score be the sum of the scores of the individual fields instead
GET blogs/_search
{
"query": {
"multi_match": {
"type": "most_fields",
"query": "agent",
"fields": [
"title",
"content"
]
}
}
}
Tip
The more fields contain the word “agent”, the higher the score.
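The difference between the default behavior and most_fields comes down to how per-field scores are combined. The sketch below assumes made-up per-field scores and a hypothetical `combine_field_scores` helper; it ignores refinements such as tie breakers and field boosts.

```python
def combine_field_scores(field_scores, match_type="best_fields"):
    """Combine per-field scores: maximum for best_fields, sum for most_fields."""
    scores = list(field_scores.values())
    if not scores:
        return 0.0
    if match_type == "most_fields":
        return sum(scores)
    return max(scores)


# Hypothetical scores for the term "agent" in one document.
scores = {"title": 2.0, "content": 1.5}

best = combine_field_scores(scores)                  # best field wins
most = combine_field_scores(scores, "most_fields")   # fields add up
```

Under most_fields, a document that mentions "agent" in both fields outscores one that mentions it in only one field, all else being equal.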
multi_match and Phrases
- You can search for phrases with the multi_match query
  - set type to phrase
GET blogs/_search
{
"query": {
"multi_match": {
"type": "phrase",
"query": "elastic agent",
"fields": [
"title",
"content"
]
}
}
}
The Response
Score
- A score is calculated for each document that is a hit
  - ranks search results based on relevance
  - represents how well a document matches a given search query
- BM25
  - default scoring algorithm
  - determines a document’s score using:
    - TF (term frequency): the more a term appears in a field, the more important it is
    - IDF (inverse document frequency): the more documents that contain the term, the less important the term is
    - field length: shorter fields are more likely to be relevant than longer fields
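The three ingredients above can be seen in a simplified form of the BM25 formula for a single term. This is a sketch under assumptions: the parameter values k1 = 1.2 and b = 0.75 are commonly cited defaults, the function name is hypothetical, and the exact constants used by a given Elasticsearch version may differ.

```python
import math


def bm25_term_score(term_freq, doc_freq, doc_count,
                    field_length, avg_field_length, k1=1.2, b=0.75):
    """Score one term in one field using a simplified BM25 formula."""
    # IDF: terms that occur in many documents contribute less.
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Field length normalization: longer-than-average fields are penalized.
    length_norm = 1 - b + b * field_length / avg_field_length
    # TF: more occurrences help, with diminishing returns.
    tf = term_freq / (term_freq + k1 * length_norm)
    return idf * tf


baseline = bm25_term_score(term_freq=1, doc_freq=10, doc_count=1000,
                           field_length=10, avg_field_length=10)
more_occurrences = bm25_term_score(3, 10, 1000, 10, 10)
common_term = bm25_term_score(1, 500, 1000, 10, 10)
longer_field = bm25_term_score(1, 10, 1000, 50, 10)
```

Comparing the four values shows each effect in isolation: more occurrences raise the score, while a more common term or a longer field lowers it.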
Query Response
- By default, the query response will return:
  - the top 10 documents that match the query
  - sorted by _score in descending order
GET blogs/_search
{
"from": 0,
"size": 10,
"sort": {
"_score": {
"order": "desc"
}
},
"query": {
...
}
}
Changing the Response
- Set from and size to paginate through the search results
- Set sort to sort on one or more fields instead of _score
GET blogs/_search
{
"from": 100,
"size": 50,
"sort": [
{
"publish_date": {
"order": "asc"
}
},
"_score"
],
"query": {
...
}
}
Note
Retrieves 50 hits, skipping the first 100 (from is a zero-based offset).
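The effect of from and size can be pictured as a slice over the sorted list of hits. The sketch below is only a mental model with a hypothetical `paginate` helper and made-up document names; it says nothing about how Elasticsearch collects hits internally.

```python
def paginate(sorted_hits, from_=0, size=10):
    """Return one page of hits: skip `from_` hits, then take up to `size` hits."""
    return sorted_hits[from_:from_ + size]


# 300 hypothetical hits, already sorted.
hits = [f"doc-{i}" for i in range(1, 301)]

first_page = paginate(hits)                  # defaults: from 0, size 10
page = paginate(hits, from_=100, size=50)    # skips 100 hits, returns the next 50
```

Here `page` starts at the 101st hit and ends at the 150th, because the offset counts hits to skip.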
Sorting
- Use keyword fields to sort on field values
- Results are not scored
  - _score has no impact on sorting
  - each hit is returned with _score: null
GET blogs/_search
{
"query": {
"match": {
"title": "Elastic"
}
},
"sort": {
"title.keyword": {
"order": "asc"
}
}
}
Retrieve Selected Fields
- By default, each hit in the response includes the document’s _source
  - the original data that was passed at index time
- Use fields to only retrieve specific fields
GET blogs/_search
{
"_source": false,
"fields": [
"publish_date",
"title"
]
...
}
Term-level Queries
Matching Exact Terms
- Recall that full text queries are analyzed and then searched within the index
flowchart LR
A[Full Text Query] --> B[<b>Analyzes</b> the query text<br>before the terms are looked<br>up in the index]
- Term-level queries are used for exact searches
  - term-level queries do not analyze search terms
  - they return the exact match of the original string as it occurs in the documents
flowchart LR
A[Term-level Query] --> B[<b>Does not analyze</b> the<br>query text before the terms<br>are looked up in the index]
Term-level Queries
- Find documents based on precise values in structured data
- queries are matched on the exact terms stored in a field
| Full text queries | Term-level queries | Many more |
|---|---|---|
| match | term | script |
| match_phrase | range | percolate |
| multi_match | exists | span_queries |
| query_string | fuzzy | geo_queries |
| … | regexp | nested |
| | wildcard | … |
| | ids | |
| | … | |
Matching on a Keyword Field
- Use the keyword field to match on an exact term
  - the term must exactly match the field value, including whitespace and capitalization
- Recall that keyword types are commonly used for:
  - structured content such as IDs, email, hostnames, or zip codes
  - sorting and aggregations
GET blogs/_search
{
"query": {
"term": {
"authors.job_title.keyword": "Senior Software Engineer"
}
}
}
range Query
- Use the following parameters to specify a range:
  - gt: greater than
  - gte: greater than or equal to
  - lt: less than
  - lte: less than or equal to
- Ranges can be open-ended
GET blogs/_search
{
"query": {
"range": {
"publish_date": {
"gte": "2023-01-01",
"lte": "2023-12-31"
}
}
}
}
Date Math
- Use date math to express relative dates in range queries
| Unit | Meaning |
|---|---|
| y | years |
| M | months |
| w | weeks |
| d | days |
| h / H | hours |
| m | minutes |
| s | seconds |
Example for now = 2023-10-19T11:56:22
| Expression | Resolves to |
|---|---|
| now-1h | 2023-10-19T10:56:22 |
| now+1h+30m | 2023-10-19T13:26:22 |
| now/d+1d | 2023-10-20T00:00:00 |
| 2024-01-15\|\|+1M | 2024-02-15T00:00:00 |
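To make the examples above concrete, here is a small Python sketch that resolves a subset of date math expressions. It is an illustration only: `resolve_date_math` and its helpers are hypothetical, time zones are ignored, and rounding to weeks is deliberately left out.

```python
import calendar
import re
from datetime import datetime, timedelta


def add_months(moment, months):
    """Shift a datetime by whole months, clamping the day to the month's length."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(moment.day, calendar.monthrange(year, month)[1])
    return moment.replace(year=year, month=month, day=day)


def round_down(moment, unit):
    """Round a datetime down to the start of the given unit."""
    if unit == "y":
        return moment.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "M":
        return moment.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    if unit == "d":
        return moment.replace(hour=0, minute=0, second=0, microsecond=0)
    if unit in ("h", "H"):
        return moment.replace(minute=0, second=0, microsecond=0)
    if unit == "m":
        return moment.replace(second=0, microsecond=0)
    if unit == "s":
        return moment.replace(microsecond=0)
    raise ValueError(f"rounding to unit {unit!r} is not supported in this sketch")


def resolve_date_math(expression, now):
    """Resolve expressions such as 'now-1h' or '2024-01-15||+1M' to a datetime."""
    if expression.startswith("now"):
        moment, operations = now, expression[3:]
    else:
        anchor, _, operations = expression.partition("||")
        moment = datetime.fromisoformat(anchor)
    for sign, amount, unit in re.findall(r"([+\-/])(\d*)([yMwdhHms])", operations):
        if sign == "/":
            moment = round_down(moment, unit)
            continue
        count = int(amount or 1) * (1 if sign == "+" else -1)
        if unit == "y":
            moment = add_months(moment, 12 * count)
        elif unit == "M":
            moment = add_months(moment, count)
        else:
            step = {"w": timedelta(weeks=1), "d": timedelta(days=1),
                    "h": timedelta(hours=1), "H": timedelta(hours=1),
                    "m": timedelta(minutes=1), "s": timedelta(seconds=1)}[unit]
            moment = moment + count * step
    return moment


now = datetime(2023, 10, 19, 11, 56, 22)
one_hour_ago = resolve_date_math("now-1h", now)
start_of_tomorrow = resolve_date_math("now/d+1d", now)
```

Operations are applied left to right, which is why now/d+1d first rounds down to midnight and then adds a day.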
range Query - Date Math Example
GET blogs/_search
{
"query": {
"range": {
"publish_date": {
"gte": "now-1y"
}
}
}
}
exists Query
- Returns documents that contain an indexed value for a field
- Empty strings also indicate that a field exists
Example “How many documents have a category?”:
GET blogs/_count
{
"query": {
"exists": {
"field": "category"
}
}
}
Async Search
- Searches asynchronously
- Useful for slow queries and aggregations
  - monitor the progress
  - retrieve partial results as they become available
Request:
POST blogs/_async_search?wait_for_completion_timeout=0s
{
"query": {
"match": {
"title": "community team"
}
}
}
Response:
{
"id" : "Fk0tWm1LM1hmVHA2bGNvMHF6alRhM3ccZWZ0Uk9NcFVUR3VDTzc3OENmYUcyQToyMDYyMQ==",
"is_partial" : true,
"is_running" : true,
"start_time_in_millis" : 1649075069466,
"expiration_time_in_millis" : 1649507069466,
"response" : {
...
"hits" : {
"total" : {
"value" : 0,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
}
}
}
Note
The id can be used to retrieve the results later.
is_partial indicates whether the current set of results is partial.
is_running indicates whether the query is still running.
You can retrieve the results until expiration_time_in_millis.
Retrieve the Results
- Use the id to retrieve search results
- The response will tell you whether
  - the query is still running (is_running)
  - the results are partial (is_partial)
Request:
GET /_async_search/Fk0tWm1LM1hmVHA2bGNvMHF6alRhM3ccZWZ0Uk9NcFVUR3VDT...
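A client would typically repeat this request until is_running turns false. The sketch below shows such a polling loop under stated assumptions: `fetch_status` is a hypothetical callable that stands in for the GET request and returns the parsed response as a dictionary, and the interval and retry limit are arbitrary.

```python
import time


def wait_for_async_search(fetch_status, search_id, poll_interval=1.0, max_polls=30):
    """Poll until the search stops running, then return the last response."""
    for _ in range(max_polls):
        # fetch_status is a placeholder for GET /_async_search/<id>.
        status = fetch_status(search_id)
        if not status["is_running"]:
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"search {search_id} was still running after {max_polls} polls")


# Simulated responses: the first poll is still running, the second is complete.
simulated = iter([
    {"is_running": True, "is_partial": True},
    {"is_running": False, "is_partial": False},
])
final = wait_for_async_search(lambda _id: next(simulated), "example-id", poll_interval=0)
```

Injecting the fetch function keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.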
Combining Queries
Combining Queries using Boolean Logic
- Suppose you want to write the following query:
  - find blogs about “agent” written in English
- This search is actually a combination of two queries
  - “agent” needs to be in the content or title field
  - and “en-us” in the locale field
- How can you combine these two queries?
  - by using Boolean logic and the bool query
bool Query
- The bool query combines one or more boolean clauses:
  - must
  - filter
  - must_not
  - should
- Each of the clauses is optional
- Clauses can be combined
- Any clause accepts one or more queries
GET blogs/_search
{
"query": {
"bool": {
"must": [ ... ],
"filter": [ ... ],
"must_not": [ ... ],
"should": [ ... ]
}
}
}
must Clause
- Any query in a must clause must match for a document to be a hit
- Every query contributes to the score
GET blogs/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "agent"
}
},
{
"match": {
"locale": "en-us"
}
}
]
}
}
}
filter Clause
- Filters are like must clauses: any query in a filter clause has to match for a document to be a hit
- But queries in a filter clause do not contribute to the score
GET blogs/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "agent"
}
}
],
"filter": [
{
"match": {
"locale": "en-us"
}
}
]
}
}
}
Tip
Filters are great for yes / no type queries.
must_not Clause
- Use must_not to exclude documents that match a query
- Queries in a must_not clause do not contribute to the score
GET blogs/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "agent"
}
}
],
"must_not": [
{
"match": {
"locale": "en-us"
}
}
]
}
}
}
should Clause
- Use should to boost documents that match a query
  - queries in a should clause contribute to the score
  - documents that do not match the queries in a should clause are returned as hits too
- Use minimum_should_match to specify the number or percentage of should clauses that must match
GET blogs/_search
{
"query": {
"bool": {
"must": [
{"match": {"content":"agent"}}
],
"should": [
{"match":{"locale": "en-us"}},
{"match":{"locale": "fr-fr"}}
],
"minimum_should_match": 1
}
}
}
Comparing Query and Filter Contexts
classDiagram
note for A "Query Context"
note for B "Filter Context"
class A["must<br>should"]
A : calculates a score
A : slower
A : no automatic caching
class B["filter<br>must_not"]
B : skips score calculation
B : faster
B : automatic caching for frequently used filters
Query vs. Filter Context - Ranking
Query Context
Request:
"bool": {
"must": [
{"match": {"title": "community"} }
]
}
Response:
"hits" : {
"total" : {
"value" : 28,
"relation" : "eq"
},
"max_score" : 6.1514335,
"hits" : [
Filter Context
Request:
"bool": {
"filter": [
{"match": {"title": "community"} }
]
}
Response:
"hits" : {
"total" : {
"value" : 28,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
bool Query Summary
| Clause | Exclude docs | Scoring |
|---|---|---|
| must | YES | YES |
| must_not | YES | NO |
| should | NO | YES |
| filter | YES | NO |
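The summary table can be turned into a small simulation. In the sketch below, each query is a function that returns a score on a match and None otherwise; `match` and `bool_search` are hypothetical helpers with made-up scores and documents, and the model assumes minimum_should_match defaults to 0, which is only appropriate when a must or filter clause is present.

```python
def match(field, term, score=1.0):
    """Build a toy query: `score` if the term occurs in the field, else None."""
    def query(doc):
        return score if term in doc[field].lower().split() else None
    return query


def bool_search(docs, must=(), filters=(), must_not=(), should=(),
                minimum_should_match=0):
    """Apply the four bool clauses and return (score, doc) pairs, best first."""
    hits = []
    for doc in docs:
        must_scores = [q(doc) for q in must]
        should_scores = [s for s in (q(doc) for q in should) if s is not None]
        if any(s is None for s in must_scores):
            continue  # must: excludes non-matching docs, adds to the score
        if any(q(doc) is None for q in filters):
            continue  # filter: excludes non-matching docs, no score
        if any(q(doc) is not None for q in must_not):
            continue  # must_not: excludes matching docs, no score
        if len(should_scores) < minimum_should_match:
            continue
        # should: optional by default, but adds to the score when it matches
        hits.append((sum(must_scores) + sum(should_scores), doc))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


docs = [
    {"id": 1, "content": "Elastic Agent overview", "locale": "en-us"},
    {"id": 2, "content": "Agent de Elastic", "locale": "fr-fr"},
    {"id": 3, "content": "Kibana dashboards", "locale": "en-us"},
]

filtered = bool_search(docs, must=[match("content", "agent")],
                       filters=[match("locale", "en-us")])
boosted = bool_search(docs, must=[match("content", "agent")],
                      should=[match("locale", "en-us")])
```

In `filtered` the locale only narrows the result set, while in `boosted` both "agent" documents are returned and the en-us one ranks first because the should clause adds to its score.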