What does Databricks do?

Databricks sells a data science and analytics platform built on top of an open source package called Apache Spark.

Last updated: July 4, 2025

The TL;DR

Databricks sells a data science and analytics platform – i.e. a place to query and share data – built on top of an open source package called Apache Spark. 

  • Apache Spark is an open source engine for running analytics and machine learning across giant, distributed datasets
  • Spark is notoriously hard to run on your own infrastructure, and companies often don’t have the expertise to do it
  • Databricks provides a managed service for running Spark clusters, as well as notebooks for visualization and exploration, plus the ability to schedule pipelines
  • More recently, Databricks has been expanding its product portfolio to include ML and data warehousing

Databricks is one of the largest private companies on the planet: its most recent valuation was $62B.

Terms Mentioned

Open Source, Server, Cloud, Framework, Infrastructure, Production, Backend, API, Data lake, Analytics, Data warehouse, Deploy, Machine Learning, Query

Companies Mentioned

Databricks ($PRIVATE), AWS ($AMZN)

The Databricks core product: managed Spark

Let’s start with Spark. Apache Spark is a tool for running distributed data pipelines (think: query this, move this to that place). As teams started storing more data than ever before, it stopped making sense to put all of it on a single server. So Spark splits the data, and the computation on it, across multiple servers that work in parallel, which makes processing very large datasets faster and more efficient.
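
To make the idea concrete, here is a toy sketch in plain Python (not Spark’s actual API) of the basic pattern Spark applies at much larger scale: split a dataset into partitions, have separate workers compute a partial result for each partition, then combine the partial results into one answer. Worker threads stand in for the servers in a cluster, and the function names and sample sales records are made up for illustration. Python threads won’t actually speed up this CPU-bound toy; the point is the shape of the computation.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists.

    Hypothetical stand-in for spreading data across the servers of a cluster.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def partial_totals(partition):
    """Runs on one 'worker': sum revenue per region for just this slice of data."""
    totals = Counter()
    for region, revenue in partition:
        totals[region] += revenue
    return totals


def total_revenue_by_region(rows, num_workers=4):
    """Hypothetical distributed 'query': partition, aggregate in parallel, merge."""
    partitions = split_into_partitions(rows, num_workers)
    # Each thread plays the role of one server working on its own partition.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_totals, partitions))
    # Combine step: add every worker's partial totals into a single result.
    combined = Counter()
    for partial in partials:
        combined.update(partial)
    return dict(combined)


if __name__ == "__main__":
    sales = [("east", 120), ("west", 80), ("east", 30), ("north", 50), ("west", 20)]
    print(total_revenue_by_region(sales, num_workers=2))
    # prints {'east': 150, 'west': 100, 'north': 50}
```

In a real Spark cluster the partitions live on different machines, so the partial work genuinely runs in parallel, but moving data and results between those machines is also where much of the operational complexity comes from.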

But distributed systems are very, very complicated (and not just in the data realm). Configuring and setting up a Spark “cluster” (a group of servers working together) from scratch isn’t the kind of thing your typical software engineer is going to be comfortable doing, so it’s pretty difficult.

And that’s where Databricks comes in. They provide a fully managed Spark environment, so you can focus on writing queries and pipelines instead of managing infrastructure. You also get a notebook-like interface for writing Spark jobs (in Python, for example) and making nice graphs.

Apache Spark, the OG

Since Databricks is built on top of this open source “Spark” thing, understanding Databricks means understanding Spark. So what’s Apache Spark exactly?
