What’s a data model exactly?
Understanding dbt will be easier if you get comfortable with and . You’ll also want to be familiar with the concept of a .
dbt is quite simply a tool for building data models. If you’re on a data team, you probably know what that means. But alas, my dear audience, if you’re not, it may be unfamiliar. You’ve heard of machine learning models, but what’s a data model?
The age of the simple warehouse
First, the fundamentals. In a previous post, we talked about data integration:
Every company has this idealized vision of a data science and analytics team, with full visibility into how the business is doing, how the product gets used, how experiments are performing, super good looking and funny people, etc. The problem with getting there (and this is part of why data teams don’t get hired until later in the company lifecycle) is that the actual, cold hard data that you need to answer important questions typically lies all over the place. And it needs cleaning.
Gathering data from original sources, cleaning it, and getting it into a warehouse is tedious, ongoing work, and it’s a lot (most?) of what early data teams spend their time on.
Traditionally, this was a three-step process known as ETL: you’d first extract the data from the source, then transform it in flight to clean it and ready it for analysis, then load it into your data warehouse. The transformation had to happen before the load, because putting raw source data into the warehouse first wasn’t financially or technically feasible.
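To make the extract, transform, load order concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the sample records, the `orders` table, and the cleaning rules are assumptions, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
source_rows = [
    {"email": " Ada@Example.com ", "amount": "19.99"},
    {"email": "grace@example.com", "amount": "5.00"},
    {"email": "", "amount": "12.50"},  # bad record: no email
]


def extract():
    """Extract: pull the raw records out of the source."""
    return list(source_rows)


def transform(rows):
    """Transform in flight: clean the records before they reach the warehouse."""
    cleaned = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # dropped here, so the warehouse never sees this record
        cleaned.append((email, float(row["amount"])))
    return cleaned


def load(conn, rows):
    """Load: only the already-cleaned rows are written to the warehouse."""
    conn.execute("CREATE TABLE orders (email TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)


# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))

etl_result = warehouse.execute(
    "SELECT email, amount FROM orders ORDER BY email"
).fetchall()
print(etl_result)
```

Notice that the raw record with the missing email is gone for good: because cleaning happens before the load, the warehouse only ever holds the transformed version.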
Today, though, data warehouses have gotten cheaper and easier to use, and storage has been separated from compute, so the paradigm is changing: companies are just funneling source data directly into their warehouses, and then working with it there. This is called ELT (extract, load, transform), because the transformation happens after the load. And this is very important, because it makes data transformation as simple as writing SQL in your warehouse.
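Here is a rough sketch of what "load first, then transform with SQL" can look like. The table names (`raw_orders`, `clean_orders`), the sample rows, and the cleaning rules are all hypothetical, and an in-memory SQLite database stands in for a real warehouse. This is not how dbt itself works under the hood, just the general shape of the ELT pattern.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for this sketch).
warehouse = sqlite3.connect(":memory:")

# Load: raw source rows land in the warehouse untouched, messy values and all.
warehouse.execute("CREATE TABLE raw_orders (email TEXT, amount TEXT)")
warehouse.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [
        (" Ada@Example.com ", "19.99"),
        ("grace@example.com", "5.00"),
        ("", "12.50"),  # bad record: no email
    ],
)

# Transform: the cleaning step is just SQL, run inside the warehouse.
warehouse.execute(
    """
    CREATE TABLE clean_orders AS
    SELECT lower(trim(email))   AS email,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE trim(email) <> ''
    """
)

clean = warehouse.execute(
    "SELECT email, amount FROM clean_orders ORDER BY email"
).fetchall()
raw_count = warehouse.execute("SELECT count(*) FROM raw_orders").fetchone()[0]

print(clean)
print(raw_count)
```

The raw table is still there after the transformation runs, so if the cleaning rules change later, you can rebuild `clean_orders` from the original data without going back to the source.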