This is all to say that Databricks is no longer just The Spark Company™. Over time, they’ve simply become a one-stop shop for everything related to a company’s data, from training models to storing data to building pipelines with it. Like Snowflake, they are positioning themselves as the all-in-one “data universe” where anything you’d conceivably want to do that involves any sort of data can be done, no other vendors required. And, of course, AI stuff.
What does this all-in-one magic playland consist of? One could break it up into four categories:
- Storing data
- Moving data
- Analyzing data
- AI stuff
I’ll go through each of these in more depth so we can figure out what this company actually does.
The thing that you have to understand about all of this, before we dive in, is that Databricks has quite literally dozens of SKUs. These are just the high-level categories, each containing several SKUs within:
Remember, anon, most of these do not matter. They are the corporate equivalent of lovebombing, aimed at overwhelming the Fortune 500 customer with so many gadgets, names, and features that they can’t help but think “wow, Databricks has it all.” Because that is why the Fortune 500 customer buys Databricks in the first place: it has it all.
Moving data
This is the section I’ll spend the least time on, since it was covered in depth the first time I wrote about Databricks. This part of the business is essentially unchanged. Engineers need to move data from place to place, all the while transforming and cleaning it to make it useful. Databricks does this for you by taking a popular open source framework – Apache Spark – and making it a lot easier to use, plus eliminating the need for you to manage your own infrastructure.
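If “moving data while transforming and cleaning it” feels abstract, here is a minimal sketch of the idea in plain Python. To be clear, this is not Spark and not any Databricks API: the sample data, the field names, and the `extract` / `transform` / `load` helpers are all made up for illustration. Spark does roughly this shape of work, just spread across many machines.

```python
import csv
import io

# Hypothetical raw export from some source system (made up for illustration).
RAW = """user_id,signup_date,plan
1,2024-01-05,Pro
2,,free
3,2024-02-11, PRO
"""

def extract(text):
    """Read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Drop rows with no signup date and normalize the remaining fields."""
    cleaned = []
    for row in rows:
        if not row["signup_date"]:
            continue  # incomplete record, skip it
        cleaned.append({
            "user_id": int(row["user_id"]),
            "signup_date": row["signup_date"],
            "plan": row["plan"].strip().lower(),
        })
    return cleaned

def load(rows, destination):
    """Append cleaned rows to a destination (a plain list standing in for a table)."""
    destination.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(extract(RAW)), warehouse)
print(loaded, warehouse)
```

The point of the managed version is that you write something like the `transform` step and somebody else worries about the machines it runs on.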
Storing data
This is an area where the Databricks product offering has evolved quite a bit since 2020. At the time, they were touting this sort of experimental thing called the “Lakehouse” – it was a combination of a data lake and a data warehouse that was supposed to be faster and cheaper than either individually:
Moving more into the analytics and BI realm, Databricks recently released a pretty interesting solution that lets you query your data lake as if it were a data warehouse. For a quick refresher, data lakes are big, unstructured places for you to store raw data really cheaply, while warehouses are for structured data that needs to be queried quickly. The new Delta Lake product purports to give you data warehouse speeds when querying your data lake, so you can keep storage costs really low.
Fast forward to today, and they are marketing it explicitly as a Snowflake-competitive data warehouse:
Just based on documented features alone, this product has come a long way since 2020 and is now an actually legitimate place to store your analytical data. If you believe surveys like this, a majority of “IT leaders” (?) plan on moving to these kinds of Lakehouse architectures over time.
As of 2025, it is not only analytical data that you can store in Databricks. Following their $1B purchase of managed database company Neon in May, Databricks now offers a production data store as well. Recall that production databases like Postgres power the day-to-day operations of your business and apps, while analytical data is more for analysis, training AI models, data science, things like that. I wrote about why Databricks might have made this acquisition a few months ago, if you’re curious.
Remember, all of these things feed into each other. Moving data in Databricks is easier if you store your data in Databricks. Both make it easier to analyze your data. Speaking of which:
Analyzing data
This was another foundational area for Databricks in 2020 that I probably should have focused more on in my post. Here we are talking about Data Science: prepping, analyzing, and modeling data to help the business make better decisions (or even power product experiences, probably not but I said it). Anecdotally, several of the engineers in my network who use Databricks mostly think about it through the analyzing data lens.
Databricks provides a fully managed experience for doing day-to-day data science work. There are a few SKUs worth mentioning here:
- Notebooks, essentially an IDE for data science. Here you can write your Python/R/Scala code, see interactive visualizations, share with your teammates, all of those things.
- Dashboards, where you can build and pin recurring views that are important to the business, like revenue, or things that exist solely to torture your marketing department, like MQLs.
- Apps, so you can turn your data into interactive apps that your whole team can use. So they can just adjust the date range on the chart instead of Slacking you to do it.
- All of this runs on servers they manage in the cloud, so you don’t need to deal with managing your own infrastructure.
The point is not that any of these products are unique (they’re not) or better than independent competitive alternatives (they’re not), but that all of this stuff is together, in one place. Where you store your data is where you move your data is where you analyze your data.
AI stuff
The section you’ve all been waiting for, and if we’re being honest probably the bulk of why this latest valuation is what it is. Databricks is an AI company, stupid. In fact here is their CEO’s tweet explaining why they raised this most recent round:
The current lens they want you to view their AI stuff through is that of agents. Which is, of course, a meaningless hype word, but let’s essentially call them somewhat autonomous pieces of software that use AI to do things for your customers. The quintessential example here is the customer support bot.
So what do you get from Databricks to build these agents? Well, they’ve quickly assembled a medley of minimum viable AI products:
- A solution for training and fine-tuning custom AI models (built on top of their purchase of MosaicML in 2023)
- A solution for deploying those models so customers can actually use them
- Their “Agent Framework,” which is essentially just their RAG feature. This helps you get better answers out of models by feeding them more of your own data.
- Their Vector Search, because remember, every single company will eventually sell you a vector database.
- The rest of the Databricks platform that helps here too: storage, data prep / pipelines, notebooks, etc.
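For the curious, here is roughly what “vector search” and “RAG” boil down to, as a toy sketch in plain Python. This is not the Databricks Vector Search API or their Agent Framework: the three-number “embeddings,” the `INDEX` contents, and the `top_k` and `build_prompt` helpers are all hypothetical. A real system gets its vectors from an embedding model and stores millions of them.

```python
import math

def cosine_similarity(a, b):
    """Score how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents with made-up embeddings (real ones have hundreds of dimensions).
INDEX = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.0],
    "office dog bios": [0.0, 0.1, 0.9],
}

def top_k(query_vec, index, k=2):
    """Vector search: return the k documents whose vectors are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

def build_prompt(question, query_vec, index, k=1):
    """RAG: paste the retrieved documents into the prompt before asking the model."""
    context = "; ".join(top_k(query_vec, index, k))
    return "Context: " + context + "\nQuestion: " + question

# Pretend this vector is the embedding of the customer's question.
print(build_prompt("How do refunds work?", [0.8, 0.2, 0.0], INDEX))
```

That is the whole trick: find the bits of your data that look most relevant, and hand them to the model along with the question.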
And then there’s this Agent Bricks thing. It’s confusingly marketed, but I believe the best way to understand it is as an “agent platform” – essentially a lot of little tools that you might need to build a particular kind of agent.
I love this janky video demo they put together where you can hear the cars rumbling in the background. If anyone from Databricks is reading this, please fix the vertical alignment here.
So what does Databricks do?
I believe the question to be incorrect. What we should really be asking is “What does Databricks not do?”