Technically
AI Reference
Your dictionary for AI terms like LLM and RLHF
Company Breakdowns
What technical products actually do and why the companies that make them are valuable
Learning Tracks
In-depth, networked guides to learning specific concepts
Posts Archive
All Technically posts on software concepts since the dawn of time
Terms Universe
The dictionary of software terms you've always wanted

Explore learning tracks

AI, it's not that ComplicatedAnalyzing Software CompaniesBuilding Software ProductsWorking with Data Teams
Loading...
I'm feeling luckyPricing
Log In

The Details: ETL

A deep dive on how (tools, methods, use cases) companies move data back and forth.

Last updated Jul 4, 2025analytics
Justin Gage
Justin Gage
Read within learning track:Working With Data Teams

Refresher: what’s ETL

ETL is the process of moving and transforming data to get it into a more useful format. 

Every company has this idealized vision of a data science and analytics team, with full visibility into how the business is doing, how the product gets used, how experiments are performing, super good looking and funny people, etc. The problem with getting there (and this is part of why data teams don’t get hired until later in the company lifecycle) is that the actual, cold hard data that you need to answer important questions typically lies all over the place. And it needs cleaning. 

I’ve decided to illustrate this principle with a sadly very real set of examples from my experience:

Loading image...

All of this data sits in different systems - and what that means in practice is that you can’t just write a big SQL query that joins all of it together. You need to get it out of the source system and into a big fat warehouse so you can ask the questions you want. And what compounds this problem is that the value of this data is multiplied when it’s integrated, e.g. you can combine these sources together:

  • Website engagement + billing = which site pages drive signups who end up paying us the most money?
  • Paid marketing + Salesforce = which ad channels are driving the most revenue and large customers?
  • Product usage + Salesforce + billing = what features do our biggest customers use the most? What are early product signals of high engagement?

For those strategy minded citizens among us, you’ll notice that each of the vendors listed in the above table are actually incentivized to help you analyze the data they’re generating in their own tools and not out of them. So Stripe has Sigma, Salesforce has Dashboards, Google Analytics is a combined tracking tool and analytics tool, etc. They’ll let you get data out, but they want you to spend as much time in their software as possible, so they’ll make it easy to analyze it in place. 

So ETL - Extract, Transform, Load - is the process of getting siloed data out of source systems and into a central data warehouse, all together, where it can be analyzed easily and quickly. 

  • Extract: get data out of siloed source systems like Salesforce, Stripe, etc.
  • Transform: clean dirty data, reorganize it in more useful formats, join data together
  • Load: put the results of all of this into a data warehouse

This is a ubiquitous process - if you’re a company and you want to get value out of your data, you’re going to need to do some ETL sooner or later. And the canonical unit of ETL - one ETL job - can be as small as a few lines of SQL, or as big as a 20 step Python workflow.

Terms Mentioned

Open Source

SQL

Framework

Analytics

Data warehouse

ETL

Database

Operating System

Query

Companies Mentioned

dbt Labs logo

dbt Labs

PRIVATE
Stripe logo

Stripe

PRIVATE

Use cases / how it happens

Each job has (basically) 3 things that you need to worry about:

  1. A query (in SQL, Python, etc.) - the actual code to get the data, transform it, etc.
  2. A schedule -when the job should run (e.g. every day at 12PM)
  3. A dependency list or graph - what other ETL jobs need to run first

Unfortunately no catchy acronym here.

A basic example

Let’s walk through a very basic ETL job that aggregates the amount that we’ve charged each one of our customers in Stripe. Starting with the query, here’s the actual mechanics of what we want to do with source Stripe data: aggregate the total number of charges, and the total amount, per user. 

SELECT  
user_id,
SUM(amount),
COUNT(*)
FROM users u
LEFT JOIN stripe.charges c ON u.user_id = c.user_id
GROUP BY 1

After we’ve got the query written, we need to figure out a schedule (let’s say we want this to run twice a day) and then identify if there are any dependencies for the query - since this just pulls directly from source Stripe data, there are none.

A more complex example

The query is really the hard part, in my experience - some of the queries I’ve built for ETL jobs can be hundreds of lines long. A good example is building state from events. A lot of product data exists in event formats, where every time a user does something like clicking on a button, a row gets added to the database that says that they, well, clicked a button.

Continue reading with an all-access subscription

Continue reading with all-access

In this post

  • A more complex example
  • Tooling and software
  • Cron
  • Airflow
  • dbt

More in this track

How do product analytics work?

Product analytics is how teams instrument and analyze data about their product usage.

What your data team is using: the analytics stack

A deep dive into all of the tools that data teams use to do their work.

$15/month

30-day money-back guarantee

Or use
Up Next
What does dbt do?Paid Plan

dbt (no capitals) is a tool for transforming and organizing data in your warehouse.

What's Kafka and what does Confluent do?Paid Plan

Apache Kafka is a framework for streaming real time data, and Confluent offers Kafka as a managed service.

What does Segment do?Paid Plan

Segment helps teams track their product and marketing data and send it to whichever tools it needs to go to.

Content
  • All Posts
  • Learning Tracks
  • AI Reference
  • Companies
  • Terms Universe
Company
  • Pricing
  • Sponsorships
  • Contact
Connect
SubscribeSubstackYouTubeXLinkedIn
Legal
  • Privacy Policy
  • Terms of Service

© 2026 Technically.