What does dbt do?

dbt (no capitals) is a tool for transforming and organizing data in your warehouse.

Last updated: July 4, 2025

The TL;DR

dbt (no capitals) is a tool for transforming and organizing data in your warehouse. It helps data teams get raw data ready for analysis and impact.

Data models are the core of effective data teams – they map business concepts onto cleaned, organized data
dbt helps data teams use SQL to build useful, documented data models that the rest of the company can benefit from
Core concepts in dbt: models, docs, seeds, and runs
dbt isn’t quite like anything else on the market, and dbt Labs has partnered with tools across the spectrum

The open source dbt product has seen almost fanatical levels of support from the engineering and data community; dbt Labs, the company behind it, also recently raised a $150M Series C.

What’s a data model exactly?

🔮 Dependencies

Understanding dbt will be easier if you get comfortable with and . You’ll also want to be familiar with the concept of a .

dbt is quite simply a tool for building data models. If you’re on a data team, you probably know what that means. But alas, my dear audience, if you’re not, it may be unfamiliar. You’ve heard of machine learning models, but what’s a data model?

The age of the simple warehouse

First, the fundamentals. In a previous post, we talked about data integration:

Every company has this idealized vision of a data science and analytics team, with full visibility into how the business is doing, how the product gets used, how experiments are performing, super good looking and funny people, etc. The problem with getting there (and this is part of why data teams don’t get hired until later in the company lifecycle) is that the actual, cold hard data that you need to answer important questions typically lies all over the place. And it needs cleaning.

Gathering data from original sources, cleaning it, and getting it into a warehouse is a tedious, ongoing process, and it’s a lot (most?) of what early data teams spend their time on.

Traditionally, there was a three-step process for getting that done: you’d first extract the data from the source, then transform it in flight to clean and ready it for analysis, then load it into your data warehouse. This is called ETL (extract, transform, load). Transformation was done in flight because putting raw source data into the warehouse first wasn’t financially or technically feasible.
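The three steps above can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory SQLite database as a stand-in for the warehouse; the source rows, the `orders` table, and the function names are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows, as they might come out of a source system.
SOURCE_ROWS = [
    {"id": "1", "email": " Ada@Example.com ", "amount_cents": "1250"},
    {"id": "2", "email": "grace@example.com", "amount_cents": "300"},
]


def extract():
    """Extract: pull raw rows from the source (a plain list here)."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform in flight: clean and type the rows before they reach the warehouse."""
    return [
        (int(r["id"]), r["email"].strip().lower(), int(r["amount_cents"]) / 100)
        for r in rows
    ]


def load(conn, rows):
    """Load: write only the cleaned rows into the warehouse."""
    conn.execute("CREATE TABLE orders (id INTEGER, email TEXT, amount_usd REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT id, email, amount_usd FROM orders ORDER BY id"
).fetchall()
```

Note that the raw rows never land in the warehouse in this flow: if the cleaning logic changes, the data has to be extracted and transformed again.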

Today, though, data warehouses have gotten cheaper and easier to use, and storage has been separated from compute, so the paradigm is changing: companies are just funneling source data directly into their warehouses, and then working with it there. This is called ELT (because the transformation is happening after the load). And this is very important, because it makes data transformation as simple as writing SQL in your warehouse.
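To make "transformation is just SQL in your warehouse" concrete, here is a minimal sketch of the load-then-transform flow. It assumes an in-memory SQLite database as a stand-in for a real warehouse; the `raw_orders` and `orders` tables and their columns are hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse

# Load: raw source rows go into the warehouse untouched (all text, messy casing).
warehouse.execute("CREATE TABLE raw_orders (id TEXT, email TEXT, amount_cents TEXT)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [("1", " Ada@Example.com ", "1250"), ("2", "grace@example.com", "300")],
)

# Transform: plain SQL, run inside the warehouse after the load.
warehouse.execute(
    """
    CREATE TABLE orders AS
    SELECT
        CAST(id AS INTEGER)                   AS id,
        LOWER(TRIM(email))                    AS email,
        CAST(amount_cents AS INTEGER) / 100.0 AS amount_usd
    FROM raw_orders
    """
)

cleaned = warehouse.execute(
    "SELECT id, email, amount_usd FROM orders ORDER BY id"
).fetchall()
raw_count = warehouse.execute("SELECT COUNT(*) FROM raw_orders").fetchone()[0]
```

The raw table stays in place next to the cleaned one, so the transformation can be rewritten and rerun later without going back to the source.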

“Analysis” ready data

Source data – or in other words, what your production database, events, or even Stripe customers look like – is usually very different from the format you’d want for analysis. You might be capturing event data that looks like this:
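As a purely hypothetical illustration (the table, column names, and rows here are invented for this sketch and are not taken from any real product), raw events and a more analysis-friendly summary of them might look like the following, with in-memory SQLite standing in for a warehouse.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for a real data warehouse

# Hypothetical raw event stream: one row per thing a user did.
warehouse.execute(
    "CREATE TABLE raw_events (user_id INTEGER, event TEXT, occurred_at TEXT)"
)
warehouse.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        (1, "page_view", "2025-01-01T09:00:00"),
        (1, "signup", "2025-01-01T09:05:00"),
        (2, "page_view", "2025-01-02T14:30:00"),
        (1, "page_view", "2025-01-03T11:15:00"),
    ],
)

# Analysis-ready shape: one row per user, summarizing their activity.
user_summary = warehouse.execute(
    """
    SELECT
        user_id,
        COUNT(*)                                          AS total_events,
        SUM(CASE WHEN event = 'signup' THEN 1 ELSE 0 END) AS signups,
        MIN(occurred_at)                                  AS first_seen_at
    FROM raw_events
    GROUP BY user_id
    ORDER BY user_id
    """
).fetchall()
```

The raw table answers "what happened, and when?"; the summary answers questions an analyst is more likely to ask, such as which users signed up and when they were first seen.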
