The Details: Data Warehouses
A deep dive into data warehouses, what they do, and how different ones stack up.
Last updated: March 3, 2025
This post is going to dive deeper into one of Technically’s most requested topics: data warehouses. We’re going to look at what they’re used for, popular options (Snowflake , BigQuery, Redshift: what’s the difference?), how data lakes fit into the picture, and how vendors are positioning themselves as platforms going forward.
Refresher: what’s a Data Warehouse?
A data warehouse is a specially designed database that holds analytical data. It’s built to handle long, complicated queries written by data scientists, analysts, and machine learning engineers.
If you’re iffy on what data warehouses do, now would be a good time to read the original post here. It covers:
-
Why we use separate databases for our apps vs. analytics
-
How data gets into a warehouse via ETL or ELT
-
Typical types of analytical queries
And then come back here.
🚨 Confusion Alert
Part of where data warehouses get confusing is what they actually are. And the answer is that a data warehouse involves both:
- A different place to store your data (a separate database server)
- A different way of storing data (new data sources, structures, etc.)
Most companies with data teams will have at least two separate places where they work with data: their production database , and a data warehouse. The production database only has data that’s relevant to the core operations of their app (users, business concepts, etc.). The data warehouse, on the other hand, might have copies of their production data, payment data, website traffic data, and lots of other stuff - whatever the data team wants to analyze.
All good? Cool – now let’s dive into the details.
Terms Mentioned
Companies Mentioned
Different types of warehouses, home rolled to managed
There are basically 3 levels of data warehouse – and companies can be at any of these levels. The first two options are home rolled – meaning the company pays for and maintains their own infrastructure – while the third option is a managed service, and outsources infrastructure to someone like AWS .
1) Home rolled: basic SQL
The most basic type of data warehouse is just a vanilla setup of a typical SQL database like Postgres or MySQL. It might be sitting on a bigger server than your production database so it can handle more complex queries, but under the hood it’s the same technology.
What makes this a data warehouse, instead of just a regular database, is that (a) it’s more powerful if you put in on bigger servers, and (b) it has analytical data in it instead of just production app data.
🖇 Workplace Example
At one company I worked at, we had Segment automatically pipe out event data into Postgres tables, and queried them as our “warehouse” – even though it was just a basic Postgres database.
This setup is not very popular these days. An even rarer combination would be using a NoSQL database like MongoDB or DynamoDB.
2) Home rolled: big data
Option number 2 is setting up your own infrastructure and using more specialized, **analytics-focused open source ** like Hadoop. This is what we did at DigitalOcean before moving onto Snowflake (see section #3):
-
Our data was stored in HDFS (Hadoop), set up and running on our own servers
-
We used Apache Hive to query it using a SQL-like interface
-
Our queries ran on Presto, which distributed them across powerful servers
This was a decently complicated setup, but not uncommon. The benefit to rolling your own systems tends to be cost – we didn’t pay per query, and were able to store and query a lot of data. But the downside was maintenance: setting up and maintaining the warehouse was tough work, and there was constant downtime.
And that’s why we eventually migrated to Snowflake!
3) Managed services: Snowflake, BigQuery, Redshift
Finally, we arrive at what’s all the rage these days: cloud native, fully managed data warehouses. When you buy something like Snowflake, they take complete care of ...