What does dbt do?
dbt (no capitals) is a tool for transforming and organizing data in your warehouse.
Last updated: March 3, 2025
The TL;DR
dbt (no capitals) is a tool for transforming and organizing data in your warehouse. It helps data teams get raw data ready for analysis and impact.
- Data models are the core of effective data teams – they map business concepts onto cleaned, organized data
- dbt helps data teams use SQL to build useful, documented data models that the rest of the company can benefit from
- Core concepts in dbt: models, docs, seeds, and runs
- dbt isn’t quite like anything on the market, and they’ve partnered with tools across the spectrum
The open source dbt product has seen almost fanatical levels of support from the engineering and data community; they also recently raised a $150M Series C.
What’s a data model exactly?
🔮 Dependencies
Understanding dbt will be a lot easier if you get comfortable with what ETL is and what teams use for it. You’ll also want to be familiar with the concept of a data warehouse.
dbt is quite simply a tool for building data models. If you’re on a data team, you probably know what that means. But alas, my dear audience, if you’re not, it may be unfamiliar. You’ve heard of machine learning models, but what’s a data model?
The age of the simple warehouse
First, the fundamentals. In a previous post, we talked about data integration:
Every company has this idealized vision of a data science and analytics team, with full visibility into how the business is doing, how the product gets used, how experiments are performing, super good looking and funny people, etc. The problem with getting there (and this is part of why data teams don’t get hired until later in the company lifecycle) is that the actual, cold hard data that you need to answer important questions typically lies all over the place. And it needs cleaning.
Gathering data from original sources, cleaning it, and getting it into a warehouse is a tedious, ongoing process, and it’s a lot (most?) of what early data teams spend their time on.
Traditionally, there was a three-step process for getting that done: you’d first extract the data from the source, then transform it in flight to clean and ready it for analysis, then load it into your data warehouse. This is ETL: extract, transform, load. Transformation was done in flight because putting raw source data into the warehouse first wasn’t financially or technically feasible.
Today, though, as data warehouses have gotten cheaper and easier to use, and as storage has been separated from compute, the paradigm is changing – companies are just funneling source data directly into their warehouses, and then working with it there. This is called ELT (because the transformation is happening after the load). And this is very important, because it makes data transformation as simple as writing SQL in your warehouse.
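To make the ELT idea concrete, here’s a minimal sketch in Python. It’s an illustration under assumptions, not how dbt or any particular warehouse works: an in-memory SQLite database stands in for the warehouse, and the table and column names (`raw_orders`, `clean_orders`, `amount_cents`) are made up for the example. The point is just the order of operations: raw data lands first, and the cleanup is a SQL query run afterwards.

```python
import sqlite3

# In-memory SQLite stands in for a real warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# "Load": raw source rows land in the warehouse untouched, messy values and all.
# The table and its columns are hypothetical.
conn.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount_cents INTEGER)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        (1, " Ada@Example.com ", 1250),   # stray whitespace and mixed case
        (2, "grace@example.com", 3000),
        (3, None, 500),                    # missing email
    ],
)

# "Transform": the cleaning happens after the load, as plain SQL in the warehouse.
conn.execute(
    """
    CREATE TABLE clean_orders AS
    SELECT
        id,
        LOWER(TRIM(email)) AS email,
        amount_cents / 100.0 AS amount_dollars
    FROM raw_orders
    WHERE email IS NOT NULL
    """
)

clean_rows = conn.execute(
    "SELECT id, email, amount_dollars FROM clean_orders ORDER BY id"
).fetchall()
print(clean_rows)
```

Notice that `raw_orders` is still sitting there after the transformation. In the old ETL world, the messy version would never have made it into the warehouse at all; here you can always go back to it and transform it differently.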
“Analysis” ready data
Source data – or in other words, what your production database, events, or even Stripe customers look like – is usually very different than the format you’d want for analysis. You mi...