
Learning Tracks: Working With Data Teams

If you work with data teams as part of your day to day, you'll need a strong technical foundation. This learning track breaks down the concepts and tools you need to understand to be a great partner to every kind of data team. And impress your boss.

The basics

Whether you're working with analytics, data science, or ML, there are some important basics that all data work starts with. Nail these down and you'll be ready to get into more role-specific stuff.

🚨 What you need to know

What do data teams even do? Start by reading about the basic jobs to be done for data teams.
At SaaS companies, product analytics is a big part of what data teams do.
You can read an overview of different parts of the data stack here.

🚧 What you should know

The basic language of data teams is SQL, and it's very learnable.
An important role of data teams is helping measure initiatives via experimentation.
A new slew of tools built around cloud data warehouses is called "The Modern Data Stack," but it's mostly a marketing gimmick.
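
To make the SQL point above a little more concrete, here is a minimal sketch of the kind of question a data team answers with it. It uses Python's built-in sqlite3 module as a stand-in database; the `users` table, its columns, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical `users` table (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, plan TEXT, signup_month TEXT)"
)
conn.executemany(
    "INSERT INTO users (plan, signup_month) VALUES (?, ?)",
    [("free", "2024-01"), ("pro", "2024-01"), ("free", "2024-02"), ("free", "2024-02")],
)

# A typical analytics question: how many users are on each plan?
query = """
    SELECT plan, COUNT(*) AS user_count
    FROM users
    GROUP BY plan
    ORDER BY user_count DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The query reads almost like the question itself: pick the plan, count the rows, group by plan. That readability is a big part of why SQL is so learnable.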

Where data comes from

To get powerful models and nice dashboards, the data needs to come from somewhere - and it's usually a mish mosh of sources from around your business.

🚨 What you need to know

Data for analytics comes from across your business: your user and app data, and third party tools like Stripe and Salesforce.
Relational databases are the ABCs of backends: they're where you store the data your app needs, like your users and their settings.

You can go more in depth on production databases.

🚧 What you should know

NoSQL databases are another popular way to store data, with less structure and more flexibility.
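
As a rough illustration of the difference between the two styles, the sketch below stores the same kind of user record both ways. The relational half uses Python's built-in sqlite3 module; the NoSQL half is just plain Python dictionaries standing in for documents, not a real document database. All table, field, and user names are hypothetical.

```python
import json
import sqlite3

# Relational style: a fixed schema, one row per user, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, theme TEXT)")
conn.execute(
    "INSERT INTO users (id, email, theme) VALUES (1, 'ada@example.com', 'dark')"
)
row = conn.execute("SELECT id, email, theme FROM users WHERE id = 1").fetchone()

# Document (NoSQL) style: each record is a free-form, nested document,
# so two users can carry different fields without a schema change.
documents = [
    {"_id": 1, "email": "ada@example.com", "settings": {"theme": "dark"}},
    {
        "_id": 2,
        "email": "bo@example.com",
        "settings": {"theme": "light", "beta_features": ["export"]},
    },
]

print(row)
print(json.dumps(documents[1], indent=2))
```

Adding `beta_features` to the relational table would mean changing the schema for every user; in the document style, only the records that need the field carry it. That is the "less structure, more flexibility" trade-off.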

Where data is stored

Once data teams have their source data in order, they usually store it in a special database designed specifically for analytics and data science.

🚨 What you need to know

These days, most teams store their analytics data in a cloud-based data warehouse.

You can go more in depth on cloud data warehouses.

🚧 What you should know

A popular but less organized storage format is called a Data Lake.

⌨️ Tools and products

Snowflake is one of the most popular cloud data warehouses, and its 2020 IPO was the largest software IPO up to that point.
Elastic is an analytics database specifically built for searching through unstructured data.
MongoDB is a popular NoSQL database for applications.

How data gets moved around

Source data is rarely in the format data teams need it in, so they need to transform it into the right form and shape. This is sometimes done before moving it into the warehouse (ETL), and sometimes done after (ELT).

🚨 What you need to know

Transforming data usually gets called ETL, short for extract, transform, and load.
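
Here is a minimal sketch of the three ETL steps, assuming a made-up list of payment records as the source and an in-memory SQLite database standing in for the warehouse. The function names, field names, and the `payments` table are all hypothetical; real pipelines pull from APIs or production databases and load into an actual warehouse.

```python
import sqlite3


def extract():
    # Hypothetical raw payment records, shaped like a third-party export.
    return [
        {"id": "ch_1", "amount_cents": 1999, "email": " Ada@Example.com "},
        {"id": "ch_2", "amount_cents": 500, "email": "bo@example.com"},
    ]


def transform(records):
    # Clean up before loading: cents to dollars, normalized emails.
    return [
        (r["id"], r["amount_cents"] / 100, r["email"].strip().lower())
        for r in records
    ]


def load(conn, rows):
    # Write the cleaned rows into the (stand-in) warehouse table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id TEXT PRIMARY KEY, amount_usd REAL, email TEXT)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT id, amount_usd, email FROM payments ORDER BY id"
).fetchall()
print(loaded)
```

In an ELT setup the order of the last two steps flips: the raw records are loaded as they are, and the cleanup happens afterwards inside the warehouse, typically in SQL.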

You can go more in depth on ETL.

⌨️ Tools and products

dbt is an increasingly popular tool for transforming and organizing your warehouse data.
Kafka is a powerful tool built at LinkedIn for streaming event data in real time.
Segment helps data teams collect analytics events and send them to the tools they need to be in.
Databricks is a tool for running Spark jobs, basically ETL for big data.

How data gets used

Once cleaned, organized data is in the warehouse, you can do anything with it, from dashboards to operations to ML models.

🚨 What you need to know

Reverse ETL is the process of getting data from the warehouse to tools like Salesforce and Hubspot.
Most data teams use a special type of code notebook to explore and analyze their data.
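
As a rough illustration of reverse ETL, the sketch below reads modeled data out of a stand-in warehouse (an in-memory SQLite database) and shapes it into the kind of record updates a CRM sync might send. The `account_metrics` table, the field names, and the payload shape are all assumptions; this is not the API of Salesforce, Hubspot, or any other real tool.

```python
import sqlite3

# Stand-in warehouse with a hypothetical table of per-account metrics.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE account_metrics (account_id TEXT, weekly_active_users INTEGER)"
)
warehouse.executemany(
    "INSERT INTO account_metrics VALUES (?, ?)",
    [("acct_1", 42), ("acct_2", 3)],
)


def build_crm_updates(conn):
    # Read modeled data out of the warehouse and shape it into the
    # field updates a CRM sync would send (the field names are made up).
    rows = conn.execute(
        "SELECT account_id, weekly_active_users "
        "FROM account_metrics ORDER BY account_id"
    )
    return [
        {"external_id": account_id, "fields": {"weekly_active_users": wau}}
        for account_id, wau in rows
    ]


updates = build_crm_updates(warehouse)
for update in updates:
    # A real pipeline would call the CRM's API here; this sketch only prints.
    print(update)
```

The point of the pattern is the direction of travel: the warehouse is the source, and the business tool is the destination, so a sales rep can see product usage without leaving the CRM.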

🚧 What you should know

A language-based ML model named GPT-3 took the world by storm.
As anyone who has seen or used ChatGPT or DALL-E knows, ML and AI have been advancing quickly over the past few years.

⌨️ Tools and products

Kafka is a popular tool for streaming event data in real time.
Segment helps data teams collect analytics events and send them to the tools they need to be in.
Databricks is a tool for running Spark jobs, basically ETL for big data.

Data in Machine Learning and AI

With the rise of generative AI, chances are someone at your company is building or using models of some sort.

🚨 What you need to know

Reverse ETL is the process of getting data from the warehouse to tools like Salesforce and Hubspot.
Most data teams use a tool called a Jupyter Notebook to explore and analyze their data.

🚧 What you should know

A language-based ML model named GPT-3 took the world by storm.

You can go more in depth on how these Large Language Models like ChatGPT actually work under the hood.

As anyone who has seen or used ChatGPT or DALL-E knows, ML and AI have been advancing quickly over the past few years.
There are plenty of useful ML models that aren't made by OpenAI that you can use in your day to day.

⌨️ Tools and products

OpenAI is the most popular provider of generative AI models like GPT-4 and DALL-E.
Databricks is a tool for running Spark jobs, basically ETL for big data.

Technically learning tracks help make the world of software simple and digestible, so you can be better at your job. There are more on the way!

Ideas for other learning tracks? Ways we can improve this one? Let us know.
