The Excel User's Guide to Databases: Schemas
Why that schema change is going to take your engineers two weeks.
Last updated: March 3, 2025
Still wondering why your engineering lead thinks your small new feature requires a “schema change” that will take weeks? This post will break down what a schema is, why databases need them, and why they can be annoying to update.
Context: databases are about enforcing rigidity
Schemas are basically the entire reason that relational databases exist. So before we dive into what a schema is, let’s talk about why databases exist in the first place, and why your SaaS application can’t just use Excel, like you can.
Cloud applications need to persist data about their users: it’s how Gmail has your emails in your inbox, Twitter has your tweets, and Spotify has your playlists. The role of the database is to store that information as consistently and reliably as possible. Nobody is playing with that data, building formulas, looking at it, or making pivot tables: it’s powering applications and experiences.
🚨 Confusion Alert
Databases are used for plenty of things besides backing production applications. This post focuses on the use case you’ll most often encounter as a PM in tech: a SaaS application’s production database.
Applications interact with databases via a query language, which for most relational databases is SQL. When an application needs to read some data from the database to know your profile settings, it issues a SQL query. When it needs to insert some new data into the database because a new user signed up, it issues a SQL query. These queries need to run as fast as possible so the application is snappy and responsive.
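To make that concrete, here is a minimal sketch of an application issuing one write and one read. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a real production database. The `users` table, its columns, and the `sign_up` and `get_profile_settings` helpers are hypothetical names made up for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the app's production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, theme TEXT)")

def sign_up(name, theme="light"):
    """Write: a new user signed up, so INSERT a row and return its new id."""
    cur = conn.execute("INSERT INTO users (name, theme) VALUES (?, ?)", (name, theme))
    conn.commit()
    return cur.lastrowid

def get_profile_settings(user_id):
    """Read: the app needs this user's settings, so SELECT them (None if missing)."""
    return conn.execute(
        "SELECT name, theme FROM users WHERE id = ?", (user_id,)
    ).fetchone()

new_id = sign_up("Leia", "dark")
print(get_profile_settings(new_id))  # ('Leia', 'dark')
```

The `?` placeholders let the database fill in the values safely instead of pasting them directly into the SQL text.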
Developers write the rules for when and how to run these queries into the application server, so the app knows how to interact with the database when something important happens. Here’s a sample flow that a developer might code into a web app:
1. User navigates to their profile page
2. Check if the user is logged in
3. If yes, query the database for their profile information, then display it on the page
4. If no, redirect them to the login page and begin again from step 2
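That flow could look roughly like this in application code. This is a simplified sketch, not how any particular web framework does it: the `session` dictionary, the `profiles` table, and the `profile_page` function are all hypothetical, and returning a string stands in for actually redirecting or rendering a page.

```python
import sqlite3

# In-memory stand-in for the production database, with one profile in it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (user_id INTEGER PRIMARY KEY, display_name TEXT NOT NULL, location TEXT)"
)
conn.execute("INSERT INTO profiles VALUES (1, 'Luke Skywalker', 'Tatooine')")

def profile_page(session):
    """Handle a request for the profile page (hypothetical handler)."""
    # Check whether the user is logged in.
    user_id = session.get("user_id")
    if user_id is None:
        # Not logged in: send them to the login page to start over.
        return "redirect:/login"
    # Logged in: query the database for their profile information...
    row = conn.execute(
        "SELECT display_name, location FROM profiles WHERE user_id = ?", (user_id,)
    ).fetchone()
    if row is None:
        return "404: profile not found"
    # ...then display it on the page.
    display_name, location = row
    return f"<h1>{display_name}</h1><p>{location}</p>"

print(profile_page({}))              # redirect:/login
print(profile_page({"user_id": 1}))  # <h1>Luke Skywalker</h1><p>Tatooine</p>
```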
The same “rulebooks” exist for anything that might happen: a new user signing up, someone deleting their account, a user updating their information, etc. Every interaction the application has with the database is carefully designed and specified.
If you’ve been paying attention, you’ll notice that what I’ve just described is extremely different from how you use a spreadsheet in Excel. Even the most mission-critical, thousand-formula spreadsheet is barely rigid; spreadsheets are designed for exploration and flexibility, and they’re made to be used by humans.
So is a database a spreadsheet? Yes – but a very specific, very rigid version of one. And a schema is how that rigidity is enforced.
Schemas are what make databases all uptight
A schema is the sum of all of a database’s rules:
- Which tables exist and how they relate to each other
- Which columns are in which tables
- Which data type each column is
- When and where you can modify data
- …and other stuff too
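Here is a sketch of what a few of those rules look like when they’re actually written down, again using Python's `sqlite3` module. The tables, columns, and Star Wars values are illustrative assumptions, not the sample data set from this post. One caveat: SQLite is looser about column types than most databases, so the "height must be a number" rule is spelled out with a `CHECK` constraint, and SQLite only enforces relationships between tables (foreign keys) after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is turned on.
conn.execute("PRAGMA foreign_keys = ON")

# Which tables exist, which columns they have, their types, and how they relate.
conn.executescript("""
CREATE TABLE planets (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE characters (
    id           INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,
    -- Height must be stored as a whole number.
    height_cm    INTEGER CHECK (typeof(height_cm) = 'integer'),
    -- Every character's homeworld must be a row that exists in planets.
    homeworld_id INTEGER REFERENCES planets(id)
);
""")
conn.execute("INSERT INTO planets (id, name) VALUES (1, 'Tatooine')")

def try_insert_character(name, height_cm, homeworld_id):
    """Return True if the database accepts the row, False if the schema rejects it."""
    try:
        conn.execute(
            "INSERT INTO characters (name, height_cm, homeworld_id) VALUES (?, ?, ?)",
            (name, height_cm, homeworld_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert_character("Luke Skywalker", 172, 1))    # True: follows every rule
print(try_insert_character("Luke Skywalker", "tall", 1))  # False: height isn't a number
print(try_insert_character("Han Solo", 180, 99))         # False: planet 99 doesn't exist
print(try_insert_character(None, 180, 1))                # False: name is required
```

Each rejected insert is the schema doing its job: the database refuses data that breaks the rules, no matter which part of the application sent it.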
A well-designed schema means your database runs efficiently and your queries are simple to write and fast to run. A badly designed schema does the opposite.
Most of these concepts (tables, data types, relations) don’t really exist in spreadsheets, so they can be confusing. Let’s run through each and look at them through the lens of a spreadsheet.
For each of these sections, I’ll refer back to this sample Star Wars data set I put together.
The first sheet is how I’d organize data in a spreadsheet: it’s just kind of there, without much regard for how it’s shaped. Since the use case is browsing and maybe some basic filtering, we don’t need to be particular about formats. The second tab, though, takes that same information and organizes it the way it would need to look in a database.
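One common way to get from a spreadsheet-style sheet to a database-style one is to stop repeating values and give each distinct thing its own table, a process usually called normalization. The sketch below assumes a hypothetical flat sheet with one row per character and the homeworld typed out as text; the `flat_rows` data and the `normalize` helper are made up for illustration, and the tabs in the actual sample data set may be organized differently.

```python
# Hypothetical rows in "spreadsheet" style: one row per character,
# with the planet name typed out again every time it appears.
flat_rows = [
    {"character": "Luke Skywalker", "homeworld": "Tatooine"},
    {"character": "Anakin Skywalker", "homeworld": "Tatooine"},
    {"character": "Leia Organa", "homeworld": "Alderaan"},
]

def normalize(rows):
    """Split flat rows into a planets table and a characters table linked by id."""
    planet_ids = {}  # planet name -> id, assigned in order of first appearance
    characters = []  # (id, name, homeworld_id)
    for char_id, row in enumerate(rows, start=1):
        # Reuse the planet's id if we've seen it; otherwise give it the next id.
        planet_id = planet_ids.setdefault(row["homeworld"], len(planet_ids) + 1)
        characters.append((char_id, row["character"], planet_id))
    planets = [(pid, name) for name, pid in planet_ids.items()]
    return planets, characters

planets, characters = normalize(flat_rows)
print(planets)     # [(1, 'Tatooine'), (2, 'Alderaan')]
print(characters)  # [(1, 'Luke Skywalker', 1), (2, 'Anakin Skywalker', 1), (3, 'Leia Organa', 2)]
```

Tatooine is now stored exactly once, and characters point to it by id. If the planet's name ever needed fixing, you’d change one row instead of hunting down every row that mentions it.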
The table
In a spreadsheet, you can put data anywhere. You can have several types of unrelated data in a single sheet, or separate them into multiple sheets. It really just doesn’t matter. In a database though, you need to be very particular about...