Splunk helps make sense of logs
It's a story as old as time, and one told by Technically Inc. many times before (e.g. New Relic, Datadog, JFrog): Splunk started with a specific log analysis product, but has since expanded into adjacent product areas in the name of providing a "platform", i.e. lots of products that help keep customers locked in.
They've also built an impressive ecosystem around the product, with ~2,800 apps and add-ons available on their Splunkbase marketplace (the naming department was on vacation). Most of these are built by third parties – for example, the "Python for Scientific Computing" add-on lets you write Python on your Splunk data. Large ecosystems around infrastructure products tend to make a platform more valuable and customers less likely to leave; Snowflake has pursued the exact same strategy with the Snowflake Marketplace.
The Splunk core product – ingest and analyze logs
At its core, Splunk is about making sense of giant piles of logs. Because it’s a large, only loosely developer-focused company today, I was only able to find the following plain English description after several clicks through their documentation:
“Splunk Enterprise is a software product that enables you to search, analyze, and visualize the data gathered from the components of your IT infrastructure or business. Splunk Enterprise takes in data from websites, applications, sensors, devices, and so on.”
And here is where we begin our journey. Remember: if you ever want to cut through the “we hope CIOs are reading this!” homepage copy, go straight to the documentation. When the company needs to talk to developers, they cut out the marketing and get real.
In their docs, Splunk breaks down the core logs analytics product into 7 parts, but we here at Technically Inc. think 2 simpler categories suffice:
- Logs infrastructure – getting data into Splunk and organizing it effectively.
- Analysis and visualization – searching through logs, building visualizations and dashboards, scheduling regular reports, and configuring alerts.
Though the first category may not seem very interesting or important, consider that Splunk’s target user is a team of developers at a huge (enterprise) company. And at that scale, they’re ingesting logs from tons of different sources at a breakneck speed. Figuring out how to get that data into a central location, index it, choose a schema, etc. is not straightforward.
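To make the "choose a schema" problem concrete, here is a minimal Python sketch of turning raw log lines from different sources into structured events. The log lines, device names, and field names are all invented for illustration, and this is not how Splunk itself parses or indexes data; it just shows the kind of work an ingestion layer has to do.

```python
import re
from datetime import datetime

# Hypothetical log lines from two different sources (not real Splunk or customer data).
RAW_LOGS = [
    "2024-03-01T09:15:02 pos-17 action=purchase categoryid=sports amount=24.99",
    "2024-03-01T09:15:40 truck-04 action=restart status=ok",
]

# Matches key=value pairs such as "action=purchase".
KV_PATTERN = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Turn one raw line into a dict: timestamp, source, raw text, plus any key=value fields."""
    timestamp, source, rest = line.split(" ", 2)
    event = {
        "time": datetime.fromisoformat(timestamp),
        "source": source,
        "raw": line,
    }
    event.update(dict(KV_PATTERN.findall(rest)))
    return event


events = [parse_line(line) for line in RAW_LOGS]
for event in events:
    print(event["time"], event["source"], event.get("action"))
```

Notice that the two lines share a timestamp and a source but otherwise carry different fields. Multiply that by thousands of sources with their own formats and the "not straightforward" part becomes clear.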
Tesco, a massive food retailer outside the U.S., is one of Splunk's customers – they use it to ingest logs from all of their devices (point of sale, warehouse, trucking, etc.) and build dashboards and alerting.
And they're not alone – Splunk's products have been adopted by over 90% of Fortune 100 companies.
Remember though, under the hood is unstructured log data, not neatly organized database tables. That’s part of what makes this product so useful, and so hard to build in-house. This security-focused use case is one of Splunk’s most popular, and is often called SIEM (security information and event management), if you’re interested in another acronym.
To build a visualization or a set of them (dashboard = set of visualizations) in Splunk, the usual entry point is the search UI:
Splunk has a native search language they developed called Search Processing Language or SPL for short. That text on the top there – categoryid=sports – is an SPL query that filters ingested logs for that specific string of text. Splunk shows you the results on the right there, with the matches from your query highlighted. Notice how they look a lot like the simple example logs we talked about earlier.
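If it helps to see the idea in code, here is a rough Python sketch of what a query like categoryid=sports does conceptually: keep the lines that contain the term and mark where it matched. The log lines are made up and the search function is a hypothetical stand-in, not Splunk's actual search engine or the SPL implementation.

```python
import re

# Hypothetical raw log lines, invented for illustration.
LOGS = [
    "2024-03-01T09:15:02 pos-17 action=purchase categoryid=sports amount=24.99",
    "2024-03-01T09:16:10 pos-03 action=purchase categoryid=grocery amount=8.50",
    "2024-03-01T10:02:45 web-01 action=view categoryid=sports",
]


def search(lines, term):
    """Return lines containing `term`, wrapping each match in [[ ]] as a stand-in for highlighting."""
    pattern = re.compile(re.escape(term))
    return [
        pattern.sub(lambda m: f"[[{m.group(0)}]]", line)
        for line in lines
        if pattern.search(line)
    ]


matches = search(LOGS, "categoryid=sports")
for line in matches:
    print(line)
```

The real thing runs against an index over enormous volumes of data rather than a Python list, which is a big part of what you're paying for.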
Clicking on the visualization tab lets you build graphs from this data: maybe the number of results over time, the percentage of logs coming from specific devices, etc. Since these logs could contain data about literally anything – security and access events, web page visits, device updates and restarts – there’s a pretty broad array of use cases that companies use the product for.
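Those two example charts boil down to simple aggregations over search results. A minimal Python sketch, assuming each result is just a (timestamp, source device) pair; the data and function names here are hypothetical and only illustrate the numbers a chart would plot:

```python
from collections import Counter
from datetime import datetime

# Hypothetical (timestamp, source device) pairs standing in for search results.
RESULTS = [
    ("2024-03-01T09:15:02", "pos-17"),
    ("2024-03-01T09:48:30", "web-01"),
    ("2024-03-01T10:02:45", "web-01"),
    ("2024-03-01T11:20:11", "pos-17"),
]


def count_per_hour(results):
    """Number of results in each hour bucket, sorted by time."""
    counts = Counter(
        datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00") for ts, _ in results
    )
    return dict(sorted(counts.items()))


def share_by_source(results):
    """Percentage of results coming from each source device."""
    counts = Counter(source for _, source in results)
    total = sum(counts.values())
    return {source: 100 * n / total for source, n in counts.items()}


per_hour = count_per_hour(RESULTS)
by_source = share_by_source(RESULTS)
print(per_hour)
print(by_source)
```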
The Splunk ecosystem – some other stuff
Splunk has slowly been building out the platform to keep customers locked in. A few examples:
- Machine Learning – apply ML models to your log data to identify anomalies and outlier events, forecast more effectively, etc.
- Security Orchestration – lets teams run security workflows (disable this, pull that data, etc.) on top of their logs
- On Call Automation – lets teams run DevOps workflows to remediate when servers or applications go down
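To give a flavor of the anomaly detection mentioned in the Machine Learning bullet, here is a deliberately simple Python sketch that flags hours whose log volume sits far from the average, using a z-score. The counts and the threshold are made up, and this is a basic statistical stand-in, not the models Splunk actually ships.

```python
from statistics import mean, pstdev

# Hypothetical number of log events per hour; the spike at index 6 is planted on purpose.
HOURLY_COUNTS = [120, 118, 125, 122, 119, 121, 410, 123]


def find_outliers(counts, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds `threshold`."""
    avg = mean(counts)
    spread = pstdev(counts)
    if spread == 0:
        return []  # every value is identical, so nothing stands out
    return [
        (i, value)
        for i, value in enumerate(counts)
        if abs(value - avg) / spread > threshold
    ]


outliers = find_outliers(HOURLY_COUNTS)
print(outliers)
```

A real system would need to account for things like daily and weekly patterns, but the basic question is the same: does this data point look unlike the rest?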
There’s an interesting pattern emerging here. Splunk has a few core use cases – namely security and DevOps – and they’ve built automation suites on top of those. In other words, they may be trying to become not only a place where you analyze and visualize your data, but also a place where you can take action on what you’re seeing in that data. Here’s their admittedly complex diagram attempting to explain their On-Call product:
Normally, you’d sift through logs and get some sort of alert in Splunk that a server is down; then you’d find your way into other tools (like PagerDuty or Slack) to let your team know, create a ticket in a task management system that remains open until the issue is fixed, and notify stakeholders. Splunk wants you to do all of that in their product – which is probably a strategic move to increase their surface area and become more than just an analysis and alerting tool.