Technically

Databricks is apparently worth $100B. What do they even do?

What we should really be asking is “What does Databricks not do?”

Last updated Jun 18, 2026 · analytics
Justin Gage

If you’re like me, you are probably a little bit confused about Databricks.

To the innocent onlooker, it sometimes feels like they are constantly announcing some new fundraising round, multiple times per year, each with a comically larger valuation than the last. The most recent of these comically large valuations is $100B (yes, one hundred billion), making them one of the 5 most valuable private companies in the world. At this point, Databricks has raised so many funding rounds that they are running out of letters in the alphabet to designate them (this is their Series K, by the way).

All of this raises the obvious question: what does Databricks actually do?

It’s not like their website headline clears things up at all.

Loading image...

What is a data intelligence platform? Judging by the imagery, you’d be justified to believe Databricks somehow got a $100B valuation by selling these:

Loading image...

The magic eight ball says: don’t count on it.

I first wrote about this frankly very odd company back in 2020, when they were only worth a paltry $6B. Here’s the TL;DR that I put atop the post:

Databricks sells a data science and analytics platform – i.e. a place to query and share data – built on top of an open source package called Apache Spark.

- Apache Spark is an open source engine for running analytics and machine learning across distributed, giant datasets

- Spark is notoriously hard to run on your own infrastructure and companies often don’t have the expertise to do that

- Databricks provides a managed service for running Spark clusters, as well as notebooks for visualization and exploration, plus the ability to schedule pipelines

- More recently, Databricks has been expanding the product portfolio to include ML and data warehousing

This is a pretty big company, all things considered - $6.2B was their most recent valuation, and they’re planning on going public in 2021.

A lot has changed since then (except the fact that Databricks is still private. Planning on going public in 2021. We all fell for that one). The world is awash in Generative AI. The entire corporate universe is uprooting its playbooks and shifting towards AI, OpenAI and Anthropic are worth hundreds of billions of dollars[1], and NVIDIA is selling GPUs faster than they can make them.

[1] By this, all I mean is that someone was willing to buy their shares at this price. They are not “worth” hundreds of billions in the sense that a Rolex, or say a bar of gold, is “worth” its price.

This is all to say that Databricks is no longer just The Spark Company™. Over time, they’ve become a one-stop shop for everything related to a company’s data, from training models to storing data to building pipelines with it. Like Snowflake, they are positioning themselves as the all-in-one “data universe” where anything you’d conceivably want to do that involves any sort of data can be done, no other vendors required. And, of course, AI stuff.

What does this all-in-one magic playland consist of? One could break it up into 4 categories:

  1. Storing data
  2. Moving data
  3. Analyzing data
  4. AI stuff

I’ll go through each of these in more depth so we can figure out what this company actually does.

The thing that you have to understand about all of this, before we dive in, is that Databricks has quite literally dozens of SKUs. These are just the high-level categories, each containing several SKUs within:

Loading image...

Remember, anon, most of these do not matter. They are the corporate equivalent of lovebombing, aimed at overwhelming the Fortune 500 customer with so many gadgets, names, and features that they can’t help but think “wow, Databricks has it all.” Because that is why the Fortune 500 customer buys Databricks in the first place: it has it all.

Moving data

This is the section I’ll spend the least time on, since it was covered in depth the first time I wrote about Databricks. This part of the business is essentially unchanged. Engineers need to move data from place to place, all the while transforming and cleaning it to make it useful. Databricks does this for you by taking a popular open source framework – Apache Spark – and making it a lot easier to use, plus eliminating the need for you to manage your own infrastructure.
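To make "moving and transforming data" a bit more concrete, here is a minimal sketch of the general idea behind a distributed pipeline: split the data into chunks, have each worker clean and summarize its own chunk, then merge the partial results. This is plain single-process Python, not Spark and not anything Databricks ships. The event data and the function names (`partition`, `clean`, `partial_totals`, `merge`) are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw events, the kind of messy data a pipeline has to clean up.
RAW_EVENTS = [
    {"user": "ana", "amount": "19.99"},
    {"user": "bo", "amount": "5.00"},
    {"user": "ana", "amount": "n/a"},  # bad value, dropped during cleaning
    {"user": "bo", "amount": "7.50"},
    {"user": "cy", "amount": "12.00"},
]


def partition(rows, n):
    """Split rows into n chunks, standing in for data spread across machines."""
    return [rows[i::n] for i in range(n)]


def clean(row):
    """Parse the amount; return None for rows that can't be parsed."""
    try:
        return {"user": row["user"], "amount": float(row["amount"])}
    except ValueError:
        return None


def partial_totals(chunk):
    """What each 'worker' does: clean its chunk and total spend per user."""
    totals = defaultdict(float)
    for row in chunk:
        cleaned = clean(row)
        if cleaned is not None:
            totals[cleaned["user"]] += cleaned["amount"]
    return dict(totals)


def merge(partials):
    """Combine the per-worker totals into one final result."""
    final = defaultdict(float)
    for part in partials:
        for user, amount in part.items():
            final[user] += amount
    return dict(final)


totals = merge(partial_totals(chunk) for chunk in partition(RAW_EVENTS, 2))
print(totals)
```

The hard part in real life is not this logic but running it across many machines without things falling over, which is roughly the thing a managed service takes off your plate.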

Storing data

This is an area where the Databricks product offering has evolved quite a bit since 2020. At the time, they were touting this sort of experimental thing called the “Lakehouse” – it was a combination of a data lake and a data warehouse that was supposed to be faster and cheaper than either individually:

Moving more into the analytics and BI realm, Databricks recently released a pretty interesting solution that lets you query your data lake as if it were a data warehouse. For a quick refresher, data lakes are big, unstructured places for you to store raw data really cheaply, while warehouses are for structured data that needs to be queried quickly. The new Delta Lake product purports to give you data warehouse speeds when querying your data lake, so you can keep storage costs really low.
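If the lake vs. warehouse distinction is fuzzy, a toy version might help. The sketch below answers the same question two ways: once by parsing raw JSON lines at read time (the "lake" approach), and once by loading them into a table with a fixed schema and running SQL (the "warehouse" approach). It uses Python's built-in `sqlite3` as a stand-in for a warehouse, and the order data is invented. It says nothing about how Delta Lake actually gets its speed.

```python
import json
import sqlite3

# "Data lake": raw JSON lines dumped as-is. Cheap to store, no schema enforced.
RAW_LAKE_FILE = """\
{"order_id": 1, "region": "east", "total": 40.0}
{"order_id": 2, "region": "west", "total": 15.5, "coupon": "SPRING"}
{"order_id": 3, "region": "east", "total": 10.0}
"""


def lake_revenue_by_region(raw_text):
    """Querying the lake: parse every record at read time, then aggregate."""
    revenue = {}
    for line in raw_text.splitlines():
        record = json.loads(line)
        region = record["region"]
        revenue[region] = revenue.get(region, 0.0) + record["total"]
    return revenue


def load_warehouse(raw_text):
    """Loading the warehouse: enforce a fixed schema up front."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, total REAL)")
    records = [json.loads(line) for line in raw_text.splitlines()]
    # Only the columns in the schema make it in; the stray "coupon" field is dropped.
    rows = [(r["order_id"], r["region"], r["total"]) for r in records]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def warehouse_revenue_by_region(conn):
    """Querying the warehouse: plain SQL against structured data."""
    query = "SELECT region, SUM(total) FROM orders GROUP BY region"
    return dict(conn.execute(query).fetchall())


lake_result = lake_revenue_by_region(RAW_LAKE_FILE)
warehouse_result = warehouse_revenue_by_region(load_warehouse(RAW_LAKE_FILE))
print(lake_result, warehouse_result)
```

Both paths give the same answer here. The difference shows up at scale: the lake is cheap to fill but slow to query, and the warehouse is the reverse. The Lakehouse pitch is that you get both.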

Fast forward to today, and they are marketing it explicitly as a Snowflake-competitive data warehouse:

Loading image...

Just based on documented features alone, this product has come a long way since 2020 and is now an actually legitimate place to store your analytical data. If you believe surveys like this, a majority of “IT leaders” (?) plan on moving to these kinds of Lakehouse architectures over time.

As of 2025, it is not only analytical data that you can store in Databricks. Following their $1B purchase of managed database company Neon in May, Databricks now offers a production data store as well. Recall that production databases like Postgres power the day to day operations of your business and apps, while analytical data is more for analysis, training AI models, data science, things like that. I wrote about why Databricks might have made this acquisition a few months ago, if you’re curious.

Remember, all of these things feed into each other. Moving data in Databricks is easier if you store your data in Databricks. Both make it easier to analyze your data. Speaking of which:

Analyzing data

This was another foundational area for Databricks in 2020 that I probably should have focused more on in my post. Here we are talking about Data Science: prepping, analyzing, and modeling data to help the business make better decisions (or even power product experiences, probably not but I said it). Anecdotally, several of the engineers in my network who use Databricks mostly think about it through the analyzing data lens.

Databricks provides a fully managed experience for doing day to day data science work. There are a few SKUs worth mentioning here:

  • Notebooks, essentially an IDE for data science. Here you can write your Python/R/Scala code, see interactive visualizations, share with your teammates, all of those things.
  • Dashboards, where you can build and pin recurring views that are important to the business, like revenue, or things that exist solely to torture your marketing department, like MQLs.
  • Apps, so you can turn your data into interactive apps that your whole team can use. So they can just adjust the date range on the chart instead of Slacking you to do it.
  • All of this runs on servers they manage in the cloud, so you don’t need to deal with managing your own infrastructure.

The point is not that any of these products are unique (they’re not) or better than independent competitive alternatives (they’re not), but that all of this stuff is together, in one place. Where you store your data is where you move your data is where you analyze your data.

AI stuff

The section you’ve all been waiting for, and if we’re being honest probably the bulk of why this latest valuation is what it is. Databricks is an AI company, stupid. In fact here is their CEO’s tweet explaining why they raised this most recent round:

Loading image...

The current lens they want you to view their AI stuff through is that of agents. Which is, of course, a meaningless hype word, but let’s essentially call them somewhat autonomous pieces of software that use AI to do things for your customers. The quintessential example here is the customer support bot, etc.

So what do you get from Databricks to build these agents? Well, they’ve quickly assembled a medley of minimum viable AI products:

  • A solution for training and fine-tuning custom AI models (built on top of their purchase of MosaicML in 2023)
  • A solution for deploying those models so customers can actually use them.
  • Their “Agent Framework” which is essentially just their RAG feature. This helps you build better models by getting more of your data into them.
  • Their Vector Search, because remember, every single company will eventually sell you a vector database.
  • The rest of the Databricks platform that helps here too: storage, data prep / pipelines, notebooks, etc.
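For a rough sense of what the RAG and vector search pieces in this list are doing, here is a minimal sketch: turn documents into vectors, find the one closest to the user's question, and paste it into the prompt. Everything here is a toy. The "embedding" is just a word count (real systems use a trained model), the documents are invented, and none of the function names (`embed`, `cosine`, `search`, `build_prompt`) come from Databricks.

```python
import math
from collections import Counter

# Hypothetical help-center snippets a support agent might draw on.
DOCS = [
    "Refunds are processed within five business days.",
    "You can reset your password from the account settings page.",
    "Our support team is available on weekdays from nine to five.",
]


def embed(text):
    """Toy 'embedding': a bag-of-words count vector. Real systems use a model."""
    words = text.lower().replace(".", "").replace("?", "").split()
    return Counter(words)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# The "vector database": each document stored next to its vector.
INDEX = [(doc, embed(doc)) for doc in DOCS]


def search(question, k=1):
    """Return the k documents whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(question):
    """The 'RAG' step: put the retrieved text into the prompt sent to the model."""
    context = "\n".join(search(question))
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"


print(build_prompt("How do I reset my password?"))
```

The model never gets retrained here. It just gets handed the relevant slice of your data at question time, which is why where that data lives matters so much to the pitch.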

And then there’s this Agent Bricks thing. It’s confusingly marketed but I believe the best way to understand it is as an “agent platform” – essentially a lot of little tools that you might need to build a particular kind of agent.

Loading image...

I love this janky video demo they put together where you can hear the cars rumbling in the background. If anyone from Databricks is reading this, please fix the vertical alignment here.

Loading image...

So what does Databricks do?

I believe the question to be incorrect. What we should really be asking is “What does Databricks not do?”
