The core Elastic product: search
When you think of search, you probably think about Google. But on the engineering side, developers need to search through a lot of stuff, especially logs of what’s happening on their servers and apps. Elasticsearch, and the managed service for it that Elastic (the company) provides, is a database and search engine for doing just that.
Elasticsearch’s primary use cases revolve around things that commonly need, well, search. One big theme is infrastructure management, but teams also use it for security and even user-facing search engines. It’s most commonly deployed as part of the ELK stack, named for Elasticsearch, Logstash, and Kibana: Logstash helps get data into Elasticsearch, and Kibana helps you explore and visualize it.
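Under the hood, Elasticsearch is built on Apache Lucene, whose core data structure is an inverted index: a map from each term to the documents that contain it, so a search looks up terms instead of scanning every log line. The toy Python sketch below shows only that idea. It skips everything that makes real search good (relevance scoring, stemming, configurable analyzers), and the function names and sample log lines are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then split on any run of characters that aren't letters, digits, or underscores.
    return [t for t in re.split(r"\W+", text.lower()) if t]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Inverted index: each term maps to the set of document IDs containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    # AND semantics: return only documents that contain every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index isn't mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample logs, made up for illustration.
docs = {
    1: "[kafka.log][INFO] Retrying leaderEpoch request for partition logs-0",
    2: "[api.log][ERROR] Request timed out for user 42",
    3: "[kafka.log][ERROR] NOT_LEADER_FOR_PARTITION on logs-0",
}
index = build_index(docs)

print(sorted(search(index, "kafka error")))  # [3]
print(sorted(search(index, "request")))      # [1, 2]
```

Finding matching logs is then a few set lookups and intersections rather than a pass over every stored line, which is why this structure scales to large volumes of text.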
Another database? Some taxonomy
Yet another database!? Yes, my dear readers, another database. But Elasticsearch isn’t like other databases; it’s use-case specific, meaning it was designed for doing specific things with particular types of data. One of its flagship features is built-in search (hence the name), which is now becoming common in the NoSQL database world but was novel when Elasticsearch was first released. To understand any database, you first need to understand why teams use it, and that’s where we begin this installment of Technically.
OLTP vs. OLAP databases
Broadly speaking, there are two types of databases out there.
(1) The first category is used to power the apps that we know and love: they store information about us, our profiles, and any content related to us, like our tweets on Twitter or our emails on Gmail. These are known as OLTP databases, short for Online Transaction Processing, and they’re optimized for many small queries in quick succession with few joins. MySQL, PostgreSQL, Redis, and MongoDB are all (primarily) OLTP databases.
(2) The second category is used to store long-term data and analyze it. That analysis can be business related, like working out this month’s revenue, or operational, like figuring out which Kubernetes node is causing the app to keep crashing today. These are known as OLAP databases, short for Online Analytical Processing, and they’re optimized for fewer, more complex queries with many joins. Snowflake, BigQuery, and Elasticsearch are all OLAP databases.
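To make the contrast concrete, here are the two query shapes side by side. This is a sketch under loose assumptions: SQLite (from Python’s standard library) is only a convenient stand-in and is not an OLAP engine, and the `users` and `crashes` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: app data (users) and operational data (crash events).
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE crashes (node TEXT, happened_at TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [
    (1, "Ada", "ada@example.com"),
    (2, "Grace", "grace@example.com"),
])
conn.executemany("INSERT INTO crashes VALUES (?, ?)", [
    ("node-a", "2024-05-01T10:00"),
    ("node-b", "2024-05-01T10:05"),
    ("node-b", "2024-05-01T11:30"),
    ("node-b", "2024-05-01T12:45"),
])

# OLTP-shaped query: fetch one small record by its key, as an app does on every request.
user = conn.execute("SELECT name, email FROM users WHERE id = ?", (2,)).fetchone()

# OLAP-shaped query: read many rows and aggregate them, as you would during an incident review.
worst = conn.execute(
    "SELECT node, COUNT(*) AS n FROM crashes GROUP BY node ORDER BY n DESC LIMIT 1"
).fetchone()

print(user)   # ('Grace', 'grace@example.com')
print(worst)  # ('node-b', 3)
```

At this toy scale both queries are instant. The difference that matters at real scale is that the first touches a single row through the primary-key index, while the second has to read and aggregate every row in the table, which is the kind of work OLAP systems are built for.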
Elasticsearch fits into this latter category. Companies typically don’t use Elasticsearch as the primary data store backing their apps. It won’t store user information or anything mission-critical to the actual app the company sells, and it usually won’t interact with your web app directly. Instead, it’s primarily for storing performance-related data and analyzing it down the road.
While Elasticsearch is mostly used as an OLAP database, some teams also use it to power user-facing search (think: searching your emails or past tweets). This use case sits somewhere between OLTP and OLAP.
Structured vs. unstructured data
Data usually comes in two forms: structured and unstructured. Structured data is organized into familiar table structures, like you’d see in Excel, while unstructured data can be giant blobs of text or other loosely organized content. A user in your production database is structured: every user is a row with the same set of columns, such as an ID, a name, and an email.
A log that your server sent when there was an error, on the other hand, can be unstructured, often just a loose line of text:
[kafka.log][INFO] Retrying leaderEpoch request for partition logs-0 as the header reported an error: NOT_LEADER_FOR_PARTITION
Generally, SQL databases like MySQL or Snowflake are best for storing structured data (be it transactional or analytical), while NoSQL databases like MongoDB or Redis are best for storing unstructured data. Elasticsearch sits on the unstructured side: it stores schema-flexible JSON documents and is built to search through the text inside them.
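The difference shows up as soon as you try to use the data. A structured record exposes named fields you can read directly, while a raw log line is just a string, and any fields you want have to be parsed out of it first. In an ELK setup, that parsing step is typically Logstash’s job (for example, with its grok filter) before the result lands in Elasticsearch as a JSON document. The Python below is a hypothetical stand-in for that step, not Logstash’s actual behavior; the user record and the regex pattern are assumptions made for illustration.

```python
import json
import re

# Structured: every user has the same named fields, so you read them directly.
user = {"id": 2, "name": "Grace", "email": "grace@example.com"}
print(user["email"])

# Unstructured: the log is one opaque string; fields must be extracted by parsing.
log = ("[kafka.log][INFO] Retrying leaderEpoch request for partition logs-0 "
       "as the header reported an error: NOT_LEADER_FOR_PARTITION")

# Hypothetical pattern that only fits this one log format: [source][LEVEL] message
LOG_PATTERN = re.compile(r"^\[(?P<source>[^\]]+)\]\[(?P<level>[A-Z]+)\] (?P<message>.*)$")

match = LOG_PATTERN.match(log)
# Fall back to storing the whole line as the message if the format doesn't match.
doc = match.groupdict() if match else {"message": log}

# A JSON document like this is roughly what would be sent to Elasticsearch for indexing.
print(json.dumps(doc, indent=2))
```

Note that the pattern is brittle by design: a log from a different service with a different layout would fall through to the fallback branch, which is exactly why unstructured data needs a search engine rather than fixed columns.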
With this admittedly rudimentary taxonomy in mind, and the understanding that Elasticsearch is an analytical database for unstructured data, we can dive into what teams actually use it for.