Technically
AI Reference
Your dictionary for AI terms like LLM and RLHF
Company Breakdowns
What technical products actually do and why the companies that make them are valuable
Learning Tracks
In-depth, networked guides to learning specific concepts
Posts Archive
All Technically posts on software concepts since the dawn of time
Terms Universe
The dictionary of software terms you've always wanted

Explore learning tracks

AI, it's not that ComplicatedAnalyzing Software CompaniesBuilding Software ProductsWorking with Data Teams
Loading...
I'm feeling luckyPricing
Log In

What's ETL?

The process of moving data around your internal systems to get it ready for analysis.

Last updated Mar 23, 2026analytics
Justin Gage
Justin Gage
Read within learning track:Working With Data Teams

The TL;DR

ETL stands for Extract - Transform - Load, and it’s the process of moving data around your internal systems to get it ready for analysis. 

  • Data at most companies is siloed, dirty, and hard to work with: not ideal for analyzing it quickly and effectively
  • ETL is the process of moving, cleaning, and arranging that data in a way that makes your Data Science team happy (well, as happy as they’ll ever be)
  • Scheduling is one of the most frustrating parts of ETL, and an ecosystem of tools has popped up to make it easier

You’ve probably heard that 70% of Data Science is just moving and cleaning data, and nobody gets too excited about that. But ETL powers analysis at pretty much every company out there, so it’s worth understanding.

Terms Mentioned

SQL

Cloud

Framework

Data warehouse

ETL

Database

Companies Mentioned

Stripe logo

Stripe

PRIVATE

Background: the SDH problem

They say (well, I say) that all software is bad and complicated, but some is useful: the same thing is true about data. Most data is subject to what I call SDH: it’s siloed, dirty, and hard to work with.

Siloed

Because of how applications are architected, data tends to be stuck in the system that generates it. If you’re using Stripe for payments, your payment data is in the Stripe system. If you’re using Salesforce for your CRM, your sales data is in the Salesforce system. And if you’re using a PostgreSQL database to back your application, your user data is yet another system (yours). 

The problem is that for analysis – both for operational dashboards and longer term Data Science™ – the value comes from the integration of data, or being able to work with your payments, sales, and user data all in one place. You need to get it out of these siloed systems and into a central location.

Dirty

Data is dirty. Things are spelled incorrectly, some payment amounts are USD and others CAD, some engineer changed your storage format a year ago without telling anyone...the list goes on. You need to clean this data before analyzing it, and ideally do that on a regular basis. 

Hard to work with

Data has a format: it can be events data, user data, aggregated click counts, or any other form of captured reality. And the format that your data gets generated in is rarely the format you’re going to analyze it in. You need to transform it into something useful by aggregating, pivoting, filtering, and all of that fancy stuff you learn how to do in basic Excel classes. 

By now, you get it; we need to do a bunch of stuff to data before we can analyze it. That’s exactly what ETL is – it’s taking care of the SDH problem.

What’s ETL?

ETL stands for Extract - Transform - Load:

Continue reading with an all-access subscription

Continue reading with all-access

In this post

  • What’s ETL?
  • Extract
  • Transform
  • Load
  • ETL in practice: tools and scheduling

More in this track

How do product analytics work?

Product analytics is how teams instrument and analyze data about their product usage.

What your data team is using: the analytics stack

A deep dive into all of the tools that data teams use to do their work.

$15/month

30-day money-back guarantee

Or use
Up Next
The Details: ETL

A deep dive on how (tools, methods, use cases) companies move data back and forth.

What does dbt do?

dbt (no capitals) is a tool for transforming and organizing data in your warehouse.

What's Kafka and what does Confluent do?

Apache Kafka is a framework for streaming real time data, and Confluent offers Kafka as a managed service.

Content
  • All Posts
  • Learning Tracks
  • AI Reference
  • Companies
  • Terms Universe
Company
  • Pricing
  • Sponsorships
  • Contribute
  • Contact
Connect
SubscribeSubstackYouTubeXLinkedIn
Legal
  • Privacy Policy
  • Terms of Service

© 2026 Technically.