Technically

What does Databricks do (circa 2020)?

Databricks sells a data science and analytics platform built on top of an open source package called Apache Spark.

Last updated May 28, 2026
analytics
Justin Gage

The TL;DR

Databricks sells a data science and analytics platform – i.e. a place to query and share data – built on top of an open source package called Apache Spark. 

  • Apache Spark is an open source engine for running analytics and machine learning across distributed, giant datasets
  • Spark is notoriously hard to run on your own infrastructure and companies often don’t have the expertise to do that
  • Databricks provides a managed service for running Spark clusters, as well as notebooks for visualization and exploration, plus the ability to schedule pipelines
  • More recently, Databricks has been expanding the product portfolio to include ML and data warehousing

Databricks is one of the largest private companies on the planet: its most recent valuation was $62B.


The Databricks core product: managed Spark

Let’s start with Spark. Apache Spark is a tool for running distributed data pipelines (think: query this, move this to that place). As teams started storing more data than ever before, it stopped making sense to put all of it on a single server. So Spark splits the data, and the compute that runs on it, across multiple servers that work in parallel, which makes big jobs faster and more efficient.
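To make the "split it across servers" idea concrete, here's a minimal sketch in plain Python. To be clear, this is not Spark and not Spark's API: the function names (`split_into_partitions`, `partial_sum`, `distributed_sum`) and the `order_totals` data are made up for illustration, and threads on one machine stand in for separate servers. It only shows the general shape of the approach: chop the data into partitions, do the work on each partition independently, then combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Deal rows round-robin into num_partitions lists (one per pretend server)."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def partial_sum(partition):
    """The work one 'server' does on its own slice of the data."""
    return sum(partition)


def distributed_sum(rows, num_partitions=4):
    """Sum rows by splitting them up, summing each piece, and combining."""
    partitions = split_into_partitions(rows, num_partitions)
    # Assumption for illustration: each thread plays the role of one server.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Combine step: add up the partial results from every partition.
    return sum(partials)


# Hypothetical data: 1,000 order totals, valued 1 through 1000.
order_totals = list(range(1, 1001))
print(distributed_sum(order_totals))  # prints 500500
```

The hard parts of a real distributed system (moving data between machines, recovering when one of them dies, deciding how to partition) are exactly what this toy version skips.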

But distributed systems are very, very complicated (and not just in the data realm). This isn’t the kind of thing that your typical software engineer is going to be comfortable configuring from scratch, so setting up a Spark “cluster” (a group of servers) is pretty difficult.

And that’s where Databricks comes in. They provide a fully managed Spark environment so you can focus on writing queries and pipelines instead of managing infrastructure. You also get a notebook-like interface to write Spark jobs (in Python, for example) and make nice graphs.


Apache Spark, the OG

Since Databricks is built on top of this open source “Spark” thing, understanding Databricks means understanding Spark. So what’s Apache Spark exactly?
