Selling storage and compute#
At its core, Snowflake is just a data warehouse as a service, and there are many of them (BigQuery, Redshift, etc.). We’ll run through the major components of it but just keep in mind that a very similar set of paragraphs could be written about BigQuery too.
When you get right to it, the “product” that Snowflake offers is really just a place to store your analytical data, and then query it. Here’s how they break down their product offering:
The separation of storage and compute#
One of the major innovations of modern cloud data warehouses is the separation of storage and compute. Recall that what makes a data warehouse unique is that it’s built for big ass, complex, long queries. As such, having those queries run on their own servers, servers separate from the servers that the data is stored on, can improve performance by quite a bit.
In Snowflake, storage is separated from compute:
- Storage: Snowflake takes care of how your data is stored: they optimize it, compress it, deal with metadata, etc. all behind the scenes
- Compute: Snowflake allows you to create any number of “virtual warehouses” to run your queries on
A “virtual warehouse” in Snowflake is basically a dedicated server for running analytical queries. You can create and configure any number of them right in the Snowflake UI:
It’s obviously a bit confusing that Snowflake chose to name these “warehouses” since Snowflake itself is a data warehouse, and for that their marketing team deserves a slap on the wrist. If you’re wondering what those credit things are, stay tuned.
In summary, the thing to keep in mind here is that your queries are running on different servers than your data is stored on. That allows Snowflake to:
- Optimize how your data is stored behind the scenes without worrying how they expose it to you (cheaper for them, faster for you)
- Scale up your query resources infinitely. Need to run a huge giant massive query? Just create a new virtual warehouse or two. Don’t need them anymore? Shut them down!
- Separate individual query performance, so that long running queries don’t affect the performance of other ones you’re running
In practice, not every Snowflake user will ever create multiple virtual warehouses, but the architecture here is what’s important.
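Warehouses can also be managed with SQL rather than by clicking through the UI. Here's a rough sketch of what spinning one up and shutting it down might look like. The helper functions and the warehouse name `reporting_wh` are made up for illustration, and the statements are only built as strings here, not run against a real account.

```python
# Hypothetical helpers that build Snowflake SQL for managing a virtual warehouse.
# Nothing is executed here; you'd paste these statements into a worksheet
# or hand them to a connector cursor.

def create_warehouse_sql(name: str, size: str = "XSMALL", auto_suspend_secs: int = 60) -> str:
    """Build a CREATE WAREHOUSE statement for a warehouse that suspends itself when idle."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_secs} "
        f"AUTO_RESUME = TRUE"
    )


def suspend_warehouse_sql(name: str) -> str:
    """Build the statement that shuts a warehouse down when you no longer need it."""
    return f"ALTER WAREHOUSE {name} SUSPEND"


print(create_warehouse_sql("reporting_wh"))
print(suspend_warehouse_sql("reporting_wh"))
```

The auto-suspend setting is what makes "don't need them anymore? Shut them down!" mostly automatic: an idle warehouse stops itself instead of waiting for someone to remember.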
Accessing your Snowflake data#
Snowflake gives you a lot of different ways to query and access the data you’ve got stored in their systems. Here are a few of the major ones. To reiterate, these aren’t necessarily differentiated from other warehouses.
1) Through the Snowflake UI#
Snowflake has a web interface that you can access through your browser. Among other things, it lets you create worksheets where you can write SQL against your warehouse. We’ll go more in depth on this UI in the next section.
2) Through their Python connector#
If you’re using Python for data science, you can access your Snowflake data through their Python package. Like most database wrappers for programming languages, it’s not all that impressive: it’s mostly just a few functions for authentication and then executing queries. Here’s an example of a simple query:
import snowflake.connector

# Connection details (account, user, credentials) are elided here
conn = snowflake.connector.connect( ... )
cur = conn.cursor()
cur.execute('select * from products')
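The Snowflake connector follows Python's standard database API (cursors, `execute`, `fetchall`), so reading results back looks much like it does with any other database. As a rough sketch that runs without a Snowflake account, the example below uses Python's built-in `sqlite3` as a stand-in connection. The `fetch_rows` helper and the columns of the `products` table are invented for illustration.

```python
import sqlite3


def fetch_rows(conn, sql):
    """Run a query on a DB-API connection and return rows as dicts keyed by column name."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        # cursor.description holds one tuple per column; the first item is the column name
        columns = [col[0] for col in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        cur.close()


# Stand-in for a real Snowflake connection: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("create table products (id integer, name text)")
conn.execute("insert into products values (1, 'widget'), (2, 'gadget')")

rows = fetch_rows(conn, "select * from products order by id")
print(rows)
```

With a real account, the idea would be to swap the SQLite connection for the one returned by `snowflake.connector.connect` and leave the helper unchanged.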
3) Through their command line interface (SnowSQL)#
SnowSQL is Snowflake’s command line interface (CLI) for interacting with your Snowflake data, and it’s apparently built on top of the Snowflake Python package. If you want to use your Terminal to write queries, this is how you’d do it.
4) Other connectors and drivers#
Outside of just the Python connector (probably the most developed option among connectors), Snowflake provides others for JavaScript, Go, .NET, PHP, Java, etc. The use cases for those programming languages are a bit more up in the air – it’s possible you might need to access Snowflake data as part of an operational Machine Learning model in more application focused languages like these, but I’m not sure.
If any of these specifics go over your head, just keep in mind that there are several different ways to actually access your Snowflake data, from good ol’ SQL to native programming language packages.
Snowsight, the Snowflake UI#
One of the best parts of Snowflake vs. old data warehouses is how rich and powerful the web interface is. From snowflake.com you can log into your account and do so many things.
You can look through your data, down to specific tables and columns:
You can write SQL queries in a Snowflake feature called Worksheets:
It’s also simple to review administrative information and adjust settings, like analyzing your usage:
The Snowflake marketplace#
One of the more interesting features of Snowflake is its marketplace. It's a place where Snowflake users can find interesting datasets, data applications, and other data products that can be easily used in their Snowflake environment.
There are basically two main types of data in the marketplace:
- Public, free datasets – like lookup tables (finding zip codes and such), medical data, weather, public stuff like that.
- Proprietary connections with other tools – like connections to Hubspot and Mixpanel.
The public datasets are whatever, they’re not that hard to find and upload yourself. But the second category here is pretty interesting, and hints at Snowflake starting to make its way into the data moving business. The basic idea is that if you’re using something like Hubspot, they’re storing your marketing data in their internal databases; connecting it with Snowflake makes some carbon copies of that data in your warehouse, no configuration required.
Outside of the data marketplace, there’s a confusingly similar Snowflake feature called Partner Connect. Based on the documentation, it seems like Partner Connect is a way to get free trials of related software like dbt or Domo, while automatically integrating those accounts into Snowflake.
Snowflake's AI capabilities#
As I've always said, no company is immune from the overwhelming urge to put AI into their products. And Snowflake is no exception. In 2024, they launched Snowflake Intelligence: it lets business users create 'data agents' that can analyze their Snowflake data and take actions on it using natural language, without requiring technical knowledge or coding skills. TBD on how well these actually work. But it is interesting to note Snowflake's desire to reach a wider audience within organizations - not just data scientists and engineers, but also business analysts and other non-technical users who need insights from data.
Snowflake pricing#
A quick, small note on how Snowflake prices, as it’s a bit of a topic in the community. In theory, one of the great benefits of a cloud data warehouse is that you only pay for what you actually use, instead of having to outlay millions of dollars in fixed costs on a farm of servers. Snowflake, though, charges you for a unit called a credit:
So what are you buying here exactly?
Compute in Snowflake – be it the virtual warehouses that run your queries or the cloud services around them – costs you in terms of credits per hour, while storage is billed separately at a flat rate per terabyte per month. For example, if you’ve set up a virtual warehouse to query your data (recall: this is what Snowflake calls their servers used for compute), and you choose the “X-Small” size, this will cost you 1 credit per hour. If you’re on the standard plan, that comes out to $2/hr, which seems reasonable.
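To make the arithmetic concrete, here's a small sketch of the credit math using the numbers above (1 credit per hour for an X-Small warehouse, $2 per credit on the standard plan). The function name and the usage pattern of 8 hours a day for 22 working days are assumptions for illustration, not Snowflake figures.

```python
def warehouse_cost(credits_per_hour: float, hours: float, price_per_credit: float) -> float:
    """Dollar cost of running a warehouse: credits burned per hour x hours x price per credit."""
    return credits_per_hour * hours * price_per_credit


X_SMALL_CREDITS_PER_HOUR = 1      # from the example above
STANDARD_PRICE_PER_CREDIT = 2.00  # dollars, standard plan, from the example above

# One hour of an X-Small warehouse
hourly = warehouse_cost(X_SMALL_CREDITS_PER_HOUR, 1, STANDARD_PRICE_PER_CREDIT)

# Assumed usage: running 8 hours a day, 22 working days a month
monthly = warehouse_cost(X_SMALL_CREDITS_PER_HOUR, 8 * 22, STANDARD_PRICE_PER_CREDIT)

print(f"${hourly:.2f} per hour, ${monthly:.2f} per month")
```

The point of the sketch is that the bill scales with how long the warehouse is running, which is why shutting idle warehouses down matters.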
The credit model is somewhat controversial. Some would prefer that Snowflake just charge you directly for the resources you’re using, and price each one out in its own specific way.