
Learning Tracks: Working With Data Teams

If you work with data teams as part of your day to day, you'll need a strong technical foundation. This learning track will break down what concepts and tools you'll need to understand to be a great partner to all different types of data teams. And impress your boss.

The basics

Whether you're working with analytics, data science, or ML, there are some important basics that all data work starts with. Nail these down and you'll be ready to get into more role-specific stuff.

🚨 What you need to know

  1. What do data teams even do? Start by reading about the basic jobs to be done for data teams.
  2. At SaaS companies, product analytics is a big part of what data teams do.
  3. You can read an overview of different parts of the data stack here.

🚧 What you should know

  1. The basic language of data teams is SQL, and it's very learnable.
  2. An important role of data teams is helping measure initiatives via experimentation.
  3. A new slew of tools built around cloud data warehouses is called "The Modern Data Stack," but it's mostly a marketing gimmick.
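To give a feel for what SQL looks like in practice, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are all made up for illustration; a real data team would run a query like this against their warehouse rather than an in-memory database.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (invented for this example).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 50.0)],
)

# A typical analytics question: how much has each customer spent?
query = """
    SELECT customer, COUNT(*) AS num_orders, SUM(amount) AS total_spent
    FROM orders
    GROUP BY customer
    ORDER BY total_spent DESC
"""
rows = conn.execute(query).fetchall()

for customer, num_orders, total_spent in rows:
    print(customer, num_orders, total_spent)
```

Most of the work is in the query itself: pick columns, group rows, and aggregate. That pattern covers a surprising share of day-to-day analytics questions.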

Where data comes from

To get powerful models and nice dashboards, the data needs to come from somewhere - and it's usually a mish mosh of sources from around your business.

🚨 What you need to know

  1. Data for analytics comes from across your business: your user and app data, and third party tools like Stripe and Salesforce.
  2. Relational databases are the ABCs of backends: they're where you store the data your app needs, like your users and their settings.

🚧 What you should know

  1. NoSQL databases are another popular way to store data, with less structure and more flexibility.
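The difference between the two styles is easier to see side by side. The sketch below stores the same user twice: once in relational tables (using Python's built-in sqlite3), and once as a nested, document-style record like the ones a NoSQL database might hold. The table names, fields, and values are assumptions made up for illustration, and the plain Python dict only stands in for a real document database.

```python
import json
import sqlite3

# Relational: a fixed schema, with settings in their own table linked by user_id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE settings (user_id INTEGER REFERENCES users(id), key TEXT, value TEXT)"
)
conn.execute("INSERT INTO users (id, email) VALUES (1, 'ada@example.com')")
conn.execute("INSERT INTO settings VALUES (1, 'theme', 'dark')")

# Getting a user's theme means joining the two tables back together.
relational_row = conn.execute(
    """
    SELECT u.email, s.value
    FROM users u
    JOIN settings s ON s.user_id = u.id
    WHERE s.key = 'theme'
    """
).fetchone()

# Document-style (NoSQL-like): one nested record per user, no schema enforced.
# Adding a new field such as "tags" needs no migration.
user_document = {
    "_id": 1,
    "email": "ada@example.com",
    "settings": {"theme": "dark"},
    "tags": ["beta"],
}
document_json = json.dumps(user_document)

print(relational_row)
print(document_json)
```

The relational version enforces structure up front and relies on joins; the document version trades that structure for flexibility.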

Where data is stored

Once data teams have their source data in order, they usually store it in a special database designed specifically for analytics and data science.

🚨 What you need to know

  1. These days, most teams store their analytics data in a cloud-based data warehouse.

🚧 What you should know

  1. A popular but less organized storage format is called a Data Lake.

⌨️ Tools and products

  1. Snowflake is one of the most popular cloud data warehouses, and its 2020 IPO was the biggest software IPO ever at the time.
  2. Elastic is an analytics database specifically built for searching through unstructured data.
  3. MongoDB is a popular NoSQL database for applications.

How data gets moved around

Source data is rarely in the format data teams need it in, so they need to transform it into the right form and shape. This is sometimes done before moving it into the warehouse (ETL), and sometimes done after (ELT).

🚨 What you need to know

  1. Moving and transforming data usually gets called ETL, short for extract, transform, and load.
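The ETL versus ELT distinction mostly comes down to where the transform step runs. Here is a rough sketch of both orderings, using an in-memory SQLite database as a stand-in for a warehouse. The payment rows, table names, and the `to_dollars` helper are all hypothetical; real pipelines use dedicated tools, but the ordering is the same idea.

```python
import sqlite3

# Hypothetical raw rows from a source system: amounts arrive as strings, in cents.
raw_payments = [("p1", "1250"), ("p2", "399")]


def to_dollars(cents: str) -> float:
    """Transform step: convert a string amount in cents to dollars."""
    return int(cents) / 100


warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# ETL: transform in code first, then load the cleaned rows.
warehouse.execute("CREATE TABLE payments_etl (id TEXT, amount_usd REAL)")
warehouse.executemany(
    "INSERT INTO payments_etl VALUES (?, ?)",
    [(payment_id, to_dollars(cents)) for payment_id, cents in raw_payments],
)

# ELT: load the raw rows as-is, then transform inside the warehouse with SQL.
warehouse.execute("CREATE TABLE payments_raw (id TEXT, amount_cents TEXT)")
warehouse.executemany("INSERT INTO payments_raw VALUES (?, ?)", raw_payments)
warehouse.execute(
    """
    CREATE TABLE payments_elt AS
    SELECT id, CAST(amount_cents AS INTEGER) / 100.0 AS amount_usd
    FROM payments_raw
    """
)

etl_rows = warehouse.execute(
    "SELECT id, amount_usd FROM payments_etl ORDER BY id"
).fetchall()
elt_rows = warehouse.execute(
    "SELECT id, amount_usd FROM payments_elt ORDER BY id"
).fetchall()
print(etl_rows)
print(elt_rows)
```

Both paths end with the same cleaned table. The practical difference is that ELT keeps the raw data in the warehouse too, so the transform can be rewritten and rerun later without going back to the source.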

⌨️ Tools and products

  1. dbt is an increasingly popular tool for transforming and organizing your warehouse data.
  2. Kafka is a powerful tool built at LinkedIn for streaming event data in real time.
  3. Segment helps data teams collect analytics events and send them to the tools they need to be in.
  4. Databricks is a tool for running Spark jobs, basically ETL for big data.

How data gets used

Once cleaned, organized data is in the warehouse, you can do anything with it, from dashboards to operations to ML models.

🚨 What you need to know

  1. Reverse ETL is the process of getting data from the warehouse to tools like Salesforce and Hubspot.
  2. Most data teams use a special type of code notebook to explore and analyze their data.
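Reverse ETL is easier to picture with a small example: read a metric out of the warehouse and push it into a business tool. The sketch below uses an in-memory SQLite database as a stand-in warehouse, and `send_to_crm` is a hypothetical placeholder that only records what it would send. It is not the API of Salesforce, Hubspot, or any real product, and the table and field names are invented.

```python
import sqlite3

# Stand-in warehouse with a hypothetical per-account usage table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE account_usage (account_id TEXT, weekly_active_users INTEGER)"
)
warehouse.executemany(
    "INSERT INTO account_usage VALUES (?, ?)",
    [("acct_1", 42), ("acct_2", 3)],
)

sent_payloads = []


def send_to_crm(payload: dict) -> None:
    # Hypothetical stand-in for a real CRM API call; it only records the payload.
    sent_payloads.append(payload)


# Read from the warehouse, reshape each row, and hand it to the destination tool.
cursor = warehouse.execute(
    "SELECT account_id, weekly_active_users FROM account_usage ORDER BY account_id"
)
for account_id, weekly_active_users in cursor:
    send_to_crm(
        {"account_id": account_id, "weekly_active_users": weekly_active_users}
    )

print(sent_payloads)
```

The payoff is that a sales rep can see usage numbers right inside the CRM instead of asking the data team for a report.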

🚧 What you should know

  1. A language-based ML model named GPT-3 took the world by storm.
  2. As anyone who has seen or used ChatGPT or DALL-E knows, ML and AI have been advancing quickly over the past few years.

⌨️ Tools and products

  1. Kafka is a popular tool for streaming event data in real time.
  2. Segment helps data teams collect analytics events and send them to the tools they need to be in.
  3. Databricks is a tool for running Spark jobs, basically ETL for big data.

Data in Machine Learning and AI

With the rise of generative AI, chances are someone at your company is building or using models of some sort.

🚨 What you need to know

  1. Reverse ETL is the process of getting data from the warehouse to tools like Salesforce and Hubspot.
  2. Most data teams use a tool called a Jupyter Notebook to explore and analyze their data.
    COMING SOON

🚧 What you should know

  1. A language-based ML model named GPT-3 took the world by storm.
  2. As anyone who has seen or used ChatGPT or DALL-E knows, ML and AI have been advancing quickly over the past few years.
  3. There are plenty of useful ML models that aren't made by OpenAI that you can use in your day to day.

⌨️ Tools and products

  1. OpenAI is the most popular provider of generative AI models like GPT-4 and DALL-E.
  2. Databricks is a tool for running Spark jobs, basically ETL for big data.

Technically learning tracks help make the world of software simple and digestible, so you can be better at your job. There are more on the way!

Ideas for other learning tracks? Ways we can improve this one? Let us know.