Introduction Elasticsearch Engineer

Stack Introduction

Elasticsearch Platform

Out-of-the-Box Solutions

  • Elastic Observability
  • Elastic Security

Build your own

  • Elastic Search

Elasticsearch AI Platform

  • Ingest and Secure Storage
  • AI / ML and Search
  • Visualization and Automation
Kibana
  • Explore
  • Visualize
  • Engage
Elasticsearch
  • Store
  • Analyze
  • Machine Learning
  • Generative AI
Integrations
  • Connect
  • Collect
  • Alert

Elasticsearch Data Journey

Collect, connect, and visualize your data from any source.

flowchart LR

    subgraph Data
        A[Data]
    end

    subgraph Ingest
        B[Beats]
        C[Logstash]
        D[Elastic Agent<br>Integrations]
    end

    subgraph Store
        E[Elasticsearch]
    end

    subgraph Visualize
        F[Kibana]
    end

    A --> B & C & D
    B --> C
    B & C & D--> E
    E --> F

Elasticsearch is a Document Store

  • Elasticsearch is a distributed document store
  • Documents are serialized JSON objects that are:
    • stored in Elasticsearch under a unique Document ID
    • distributed across the cluster and can be accessed immediately from any node

Kibana

  • Kibana is a front-end app that sits on top of the Elastic Stack
  • It provides search and data visualization capabilities for data in Elasticsearch

Exploring and Querying Data with Kibana

  • Start with Discover
    • Create a data view to access your data
    • Explore the fields in your data
    • Examine popular values
    • Use the query bar and filters to see subsets of your data

Installation Options

Elastic Cloud

  • Elastic Cloud Hosted
  • Elastic Cloud Serverless

Elastic Self-Managed

  • Elastic Stack
  • Elastic Cloud on Kubernetes
  • Elastic Cloud Enterprise

Index Operations

Documents are Indexed into an Index

  • In Elasticsearch a document is indexed into an index
  • An index:
    • is a logical way of grouping data
    • can be thought of as an optimized collection of documents
    • is used as a verb and a noun

Index a Document: curl Example

  • To index a document, send a request using POST that specifies:
    • index_name
    • _doc resource
    • document
  • By default, Elasticsearch generates the ID for you
$ curl -X POST "localhost:9200/my_blogs/_doc" -H 'Content-Type: application/json' -d '
{
    "title": "Fighting Ebola with Elastic",
    "category": "Engineering",
    "author": {
        "first_name": "Emily",
        "last_name": "Mosher"
    }
}'
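
The same request can be put together from Python's standard library. This is a minimal sketch, assuming the unsecured cluster at localhost:9200 used in the curl example; `build_index_request` is a hypothetical helper name, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_index_request(index_name, document, base_url="http://localhost:9200"):
    """Build (but do not send) a POST <index>/_doc request, mirroring the curl call."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_name}/_doc",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


blog = {
    "title": "Fighting Ebola with Elastic",
    "category": "Engineering",
    "author": {"first_name": "Emily", "last_name": "Mosher"},
}
request = build_index_request("my_blogs", blog)
print(request.get_method(), request.full_url)

# To send it against a running cluster (assumption: no authentication needed):
#   with urllib.request.urlopen(request) as response:
#       print(json.load(response)["_id"])
```

Because no ID is part of the URL, Elasticsearch would generate one and return it in the `_id` field of the response.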

Index a Document: Dev Tools > Console

  • Console lets you interact with the Elasticsearch and Kibana REST APIs
  • User-friendly interface to create and submit requests
  • View API docs

Index a Document: PUT vs. POST

  • When you index a document using:
    • PUT: you pass in a document ID with the request; if the document ID already exists, the document is replaced and its _version is incremented by 1
    • POST: Elasticsearch automatically generates a unique ID for the document

Request:

PUT my_blogs/_doc/6OCz5pEBqWhDYCLiWpe5
{
    "title" : "Fighting Ebola with Elastic",
    "category": "User Stories",
    "author" : {
        "first_name" : "Emily",
        "last_name" : "Mosher"
    }
}

Response:

{
    "_index" : "my_blogs",
    "_type" : "_doc",
    "_id" : "6OCz5pEBqWhDYCLiWpe5",
    "_version" : 2,
    "result" : "updated",
    ...
}
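
The PUT versus POST choice can be summarized in a few lines of Python. This is only an illustrative sketch; `index_endpoint` is a hypothetical helper, not part of any Elasticsearch client.

```python
def index_endpoint(index_name, doc_id=None):
    """Return the (HTTP method, path) pair for indexing a document."""
    if doc_id is None:
        # No ID supplied: POST lets Elasticsearch generate a unique ID.
        return "POST", f"{index_name}/_doc"
    # ID supplied: PUT creates the document, or replaces it if the ID exists.
    return "PUT", f"{index_name}/_doc/{doc_id}"


print(index_endpoint("my_blogs"))
print(index_endpoint("my_blogs", "6OCz5pEBqWhDYCLiWpe5"))
```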

Retrieve a Document

  • Use a GET request with the document’s unique ID

Request:

GET my_blogs/_doc/6OCz5pEBqWhDYCLiWpe5

Response:

{
    ...
    "_id" : "6OCz5pEBqWhDYCLiWpe5",
    "_source": {
        "title": "Fighting Ebola with Elastic",
        "category": "User Stories",
        "author": {
            "first_name": "Emily",
            "last_name": "Mosher"
        }
    }
}
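
On the client side, the original document lives under `_source`. The sketch below parses a hand-written, trimmed response (the `found` flag is part of the real GET response; the sample itself is an assumption, not captured output) and returns the stored document.

```python
import json

# Hand-written sample shaped like a GET <index>/_doc/<id> response.
raw_response = """
{
    "_index": "my_blogs",
    "_id": "6OCz5pEBqWhDYCLiWpe5",
    "found": true,
    "_source": {
        "title": "Fighting Ebola with Elastic",
        "category": "User Stories",
        "author": {"first_name": "Emily", "last_name": "Mosher"}
    }
}
"""


def extract_source(response_text):
    """Return the stored document, or None when the ID was not found."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]


document = extract_source(raw_response)
print(document["title"])
```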

Create a Document

  • Index a new JSON document with the _create resource
    • guarantees that the document is only indexed if it does not already exist
    • cannot be used to update an existing document

Request:

POST my_blogs/_create/4
{
    "title" : "Fighting Ebola with Elastic",
    "category": "Engineering",
    "author" : {
        "first_name" : "Emily",
        "last_name" : "Mosher"
    }
}

Response:

{
    "_index" : "my_blogs",
    "_type" : "_doc",
    "_id" : "4",
    "_version" : 1,
    "result" : "created",
    ...
}

Update Specific Fields

  • Use the _update resource to modify specific fields in a document
    • add the doc context
    • _version is incremented by 1

Request:

POST my_blogs/_update/4
{
    "doc" : {
        "category": "User Stories"
    }
}

Response:

{
    "_index" : "my_blogs",
    "_type" : "_doc",
    "_id" : "4",
    "_version" : 2,
    "result" : "updated",
    ...
}

Delete a Document

  • Use DELETE to delete an indexed document

Request:

DELETE my_blogs/_doc/4

Response:

{
    "_index": "my_blogs",
    "_type": "_doc",
    "_id": "4",
    "_version": 3,
    "result": "deleted",
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "_seq_no": 3,
    "_primary_term": 1
}
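
The create, update, and delete responses above show `_version` moving from 1 to 2 to 3. The toy model below reproduces that bookkeeping in memory. It is a teaching sketch only: `ToyIndex` is a hypothetical class, it says nothing about how Elasticsearch stores data, and it raises `ValueError` as a stand-in for the conflict error a real cluster returns.

```python
class ToyIndex:
    """In-memory stand-in for one index (illustration only)."""

    def __init__(self):
        self._docs = {}      # doc_id -> current document
        self._versions = {}  # doc_id -> last version number handed out

    def _bump(self, doc_id):
        self._versions[doc_id] = self._versions.get(doc_id, 0) + 1
        return self._versions[doc_id]

    def index(self, doc_id, source):
        """Create the document, or replace it entirely if the ID exists."""
        result = "updated" if doc_id in self._docs else "created"
        self._docs[doc_id] = dict(source)
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": result}

    def create(self, doc_id, source):
        """Index only if the ID is not already present."""
        if doc_id in self._docs:
            raise ValueError(f"document {doc_id} already exists")
        return self.index(doc_id, source)

    def update(self, doc_id, partial_doc):
        """Overwrite only the top-level fields given; no-ops are not detected."""
        self._docs[doc_id].update(partial_doc)
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "updated"}

    def delete(self, doc_id):
        del self._docs[doc_id]
        return {"_id": doc_id, "_version": self._bump(doc_id), "result": "deleted"}


toy = ToyIndex()
created = toy.create("4", {"title": "Fighting Ebola with Elastic", "category": "Engineering"})
updated = toy.update("4", {"category": "User Stories"})
deleted = toy.delete("4")
print(created["_version"], updated["_version"], deleted["_version"])
```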

Cheaper in Bulk

  • Use the Bulk API (_bulk) to index many documents in a single API call
    • increases the indexing speed
    • useful if you need to index a data stream such as log events
  • Four actions
    • create, index, update, and delete
  • The response is a large JSON structure
    • returns individual results of each action that was performed
    • failure of a single action does not affect the remaining actions

Bulk API Example

  • Newline delimited JSON (NDJSON) structure
    • index, create, and update actions expect a newline followed by a JSON object on a single line
    • a delete action is not followed by a document line

Example:

POST comments/_bulk
{"index" : {}}
{"title": "Tuning Go Apps with Metricbeat", "category": "Engineering"}
{"index" : {"_id":4}}
{"title": "Elasticsearch Released", "category": "Releases"}
{"create" : {"_id":5}}
{"title": "Searching for needle in", "category": "User Stories"}
{"update" : {"_id":2}}
{"doc": {"title": "Searching for needle in haystack"}}
{"delete": {"_id":1}}
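
An NDJSON body like the one above is easy to generate programmatically. The sketch below is one possible way to do it with the standard library; `to_ndjson` and the tuple layout are hypothetical choices for this example.

```python
import json


def to_ndjson(operations):
    """Serialize (action, metadata, source_or_None) tuples into a bulk body."""
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))  # action line
        if source is not None:                        # delete has no source line
            lines.append(json.dumps(source))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


operations = [
    ("index", {}, {"title": "Tuning Go Apps with Metricbeat", "category": "Engineering"}),
    ("index", {"_id": 4}, {"title": "Elasticsearch Released", "category": "Releases"}),
    ("create", {"_id": 5}, {"title": "Searching for needle in", "category": "User Stories"}),
    ("update", {"_id": 2}, {"doc": {"title": "Searching for needle in haystack"}}),
    ("delete", {"_id": 1}, None),
]
body = to_ndjson(operations)
print(body, end="")
```

`json.dumps` never emits raw newlines with its default settings, which keeps every action and every document on a single line as the format requires.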

Upload a File in Kibana

  • Quickly upload a log file or delimited CSV, TSV, or JSON file
    • used for initial exploration of your data
    • not intended as part of production process

Understanding Data

  • Most data can be categorized into:
    • (relatively) static data: data set that may grow or change, but slowly or infrequently, like a catalog or inventory of items
    • time series data: event data associated with a moment in time that (usually) grows rapidly, like log files or metrics
  • Elastic Stack works well with either type of data

Searching Data

Different Use Cases

  • Search:
    • Typically uses human-generated, error-prone data
    • Often uses free-form text fields for anybody to type anything
  • Observability:
    • Need to analyze HUGE amounts of data in real-time
    • Ingest load can vary
  • Security:
    • Collect data from MANY different sources with different data formats

Query Languages

Several to choose from:

  • KQL
  • Lucene
  • ES|QL
  • Query DSL
  • Elasticsearch SQL
  • EQL

  • In Elasticsearch, search breaks down into two basic parts:
    • Queries
      • Which documents meet a specific set of criteria?
    • Aggregations
      • Tell me something about a group of documents

Using Query DSL

  • Send a request using the search API:
    • GET <index>/_search

match_all query

  • is the default request for the search API
    • Every document is a hit for this search
    • Elasticsearch returns 10 hits by default
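
Written out explicitly, the default search corresponds to the request body below. A small sketch, assuming you want to build the body in Python; `match_all_search` is a hypothetical helper, while `query`, `match_all`, and `size` are Query DSL names.

```python
import json


def match_all_search(size=10):
    """Build the body that an empty GET <index>/_search implies."""
    return {
        "query": {"match_all": {}},  # every document is a hit
        "size": size,                # number of hits returned; 10 by default
    }


print(json.dumps(match_all_search(), indent=2))
```

Passing a different `size`, for example `match_all_search(size=3)`, changes how many of the hits come back.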

Aggregations

  • Visualizations on a Kibana dashboard are powered by aggregations

Aggregating Data

Request:

GET blogs/_search
{
  "aggs": {
    "first_blog": {
      "min": {
        "field": "publish_date"
      }
    }
  }
}

Response:

{
  ...
  "aggregations": {
    "first_blog": {
      "value": 1265658554000,
      "value_as_string": "2010-02-08T19:49:14.000Z"
    }
  }
}
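
The two fields in the response carry the same instant: `value` is milliseconds since the Unix epoch and `value_as_string` is its formatted form. A short sketch of the conversion (`epoch_millis_to_iso` is a name chosen for this example):

```python
from datetime import datetime, timezone


def epoch_millis_to_iso(epoch_millis):
    """Convert epoch milliseconds to an ISO 8601 UTC string with a Z suffix."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return moment.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(epoch_millis_to_iso(1265658554000))  # prints 2010-02-08T19:49:14.000Z
```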

ES|QL

  • A piped query language that delivers advanced search capabilities
    • Streamlines searching, aggregating, and visualizing large data sets
    • Brings together the capabilities of multiple languages (Query DSL, KQL, EQL, Lucene, SQL, …)
  • Powered by a dedicated query engine with concurrent processing
    • Designed for performance
    • Enhances speed and efficiency irrespective of data source and structure

Query

  • Composed of a series of commands chained together by pipes

Running an ES|QL Query in Dev Tools

  • Wrap the query in a POST request to the query API
    • By default, results are returned as a JSON object
    • Use the format option to retrieve the results in alternative formats

Request:

POST /_query
{
  "query": "FROM blogs | KEEP publish_date, authors.full_name | SORT (publish_date)"
}

Request with format:

POST /_query?format=csv
{
  "query": """
      FROM blogs
        | KEEP publish_date, authors.first_name, authors.last_name
        | SORT (publish_date)
  """
}
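
Because an ES|QL query is a chain of commands joined by pipes, it can be assembled from a list of stages. The sketch below builds the method, path, and JSON body for the query API; `esql_request` is a hypothetical helper and nothing is sent over the network.

```python
import json


def esql_request(source_index, *commands, response_format=None):
    """Return (method, path, body) for an ES|QL query built from pipe stages."""
    query = " | ".join([f"FROM {source_index}", *commands])
    path = "/_query"
    if response_format is not None:
        path += f"?format={response_format}"  # for example csv
    return "POST", path, json.dumps({"query": query})


method, path, body = esql_request(
    "blogs",
    "KEEP publish_date, authors.first_name, authors.last_name",
    "SORT publish_date",
    response_format="csv",
)
print(method, path)
print(body)
```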

Running an ES|QL Query in Discover

  1. Select Language ES|QL in the Data View pull-down
  2. Expand the query editor to enter multiline commands
  3. Click the Run button or type command/alt-Enter to run the query

Examples

FROM blogs
| KEEP publish_date, authors.first_name, authors.last_name

FROM blogs
| WHERE authors.last_name.keyword == "Kearns"
| KEEP publish_date, authors.first_name, authors.last_name

FROM blogs
| STATS count = COUNT(*) BY authors.last_name.keyword
| SORT count DESC
| LIMIT 10
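
To see what the STATS / SORT / LIMIT pipeline computes, here is the same logic applied to a handful of made-up documents in plain Python. The sample data and `top_authors` are assumptions for illustration; this mirrors the result, not how the ES|QL engine executes the query.

```python
from collections import Counter

# Made-up sample documents; only the field layout follows the examples above.
blogs = [
    {"authors": {"last_name": "Kearns"}},
    {"authors": {"last_name": "Mosher"}},
    {"authors": {"last_name": "Kearns"}},
    {"authors": {"last_name": "Smith"}},
    {"authors": {"last_name": "Mosher"}},
    {"authors": {"last_name": "Kearns"}},
]


def top_authors(documents, limit=10):
    """Count documents per last name, highest count first, keep `limit` rows."""
    counts = Counter(doc["authors"]["last_name"] for doc in documents)  # STATS ... BY
    return counts.most_common(limit)                                    # SORT DESC + LIMIT


for last_name, count in top_authors(blogs):
    print(count, last_name)
```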