Knowledge Base: Working With Data Teams

You'll learn: How to make positive contributions when working with analytics and data science teams

Knowledge Base information is split into 4 sections, organized by how important each bit is to know; peruse at your leisure.

The basics

Core concepts everyone should understand about data teams, including analytics, data science, and ML

What you need to know

What you should know

Where data comes from

Understanding data sources and collection across the business

What you need to know

  • Data for analytics comes from across your business: your user and app data, and third party tools like Stripe and Salesforce.
  • Relational databases are the ABCs of backends: they're where you store the data your app needs.
  • You can go more in depth on production databases.
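
To make the relational database idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical and only for illustration; a real production database would typically be something like Postgres or MySQL, not an in-memory SQLite file.

```python
import sqlite3


def build_app_db():
    """Create a tiny in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "user_id INTEGER NOT NULL REFERENCES users(id), "
        "amount_cents INTEGER NOT NULL)"
    )
    # Made-up sample data standing in for what an app would write.
    conn.executemany(
        "INSERT INTO users (id, email) VALUES (?, ?)",
        [(1, "ada@example.com"), (2, "grace@example.com")],
    )
    conn.executemany(
        "INSERT INTO orders (user_id, amount_cents) VALUES (?, ?)",
        [(1, 1200), (1, 800), (2, 500)],
    )
    conn.commit()
    return conn


def revenue_per_user(conn):
    """Join the two tables and total order amounts for each user."""
    query = (
        "SELECT u.email, SUM(o.amount_cents) "
        "FROM users u JOIN orders o ON o.user_id = u.id "
        "GROUP BY u.email ORDER BY u.email"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(revenue_per_user(build_app_db()))
```

The point of the sketch is the structure: rows live in tables with fixed columns, and tables reference each other by ID so they can be joined at query time.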

What you should know

  • NoSQL databases are another popular way to store data, with less structure and more flexibility.
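
As a rough illustration of "less structure and more flexibility," the sketch below mimics a document-style store with plain Python dictionaries. It does not use any real NoSQL database or client library; the `documents` list and the `find` helper are hypothetical stand-ins for what a document database would provide.

```python
import json

# Two "documents" in the same collection. Unlike rows in a relational table,
# they do not have to share the same fields (all values here are made up).
documents = [
    {"_id": 1, "email": "ada@example.com", "plan": "pro"},
    {
        "_id": 2,
        "email": "grace@example.com",
        "tags": ["beta", "mobile"],
        "address": {"city": "Arlington"},
    },
]


def find(docs, **criteria):
    """Return documents whose top-level fields match every given criterion."""
    return [
        doc
        for doc in docs
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


if __name__ == "__main__":
    # Documents serialize naturally to JSON, nested fields included.
    print(json.dumps(documents[1]))
    print(find(documents, plan="pro"))
```

The trade-off this hints at: adding a new field requires no schema change, but nothing guarantees that every document has the fields a query expects.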

Where data is stored

Understanding data warehouses and storage solutions

What you need to know

What you should know

  • A data lake is a popular but less organized way to store data.

Tools and products

  • Snowflake is one of the most popular cloud data warehouses.
  • Elastic is an analytics database specifically built for searching through unstructured data.
  • MongoDB is a popular NoSQL database for applications.

How data gets moved around

Understanding ETL processes and data transformation

What you need to know

  • Moving and transforming data is usually called ETL, short for extract, transform, and load.
  • You can go more in depth on ETL.
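
The three ETL steps can be sketched in a few lines of Python. This is a toy example under stated assumptions: `RAW_CHARGES` is made-up data standing in for an export from a source system, and an in-memory SQLite database stands in for a warehouse. Real pipelines usually rely on dedicated tools rather than hand-written scripts like this one.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_CHARGES = [
    {"id": "ch_1", "email": " Ada@Example.com ", "amount_cents": 1200},
    {"id": "ch_2", "email": "grace@example.com", "amount_cents": 500},
    {"id": "ch_3", "email": None, "amount_cents": 900},
]


def extract():
    """Extract: pull raw records out of the source (here, a hard-coded list)."""
    return list(RAW_CHARGES)


def transform(rows):
    """Transform: drop rows without an email, normalize emails, convert cents to dollars."""
    cleaned = []
    for row in rows:
        if not row["email"]:
            continue
        email = row["email"].strip().lower()
        cleaned.append((row["id"], email, row["amount_cents"] / 100))
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS charges "
        "(id TEXT PRIMARY KEY, email TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO charges VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run extract, transform, and load in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn


if __name__ == "__main__":
    warehouse = run_etl()
    print(warehouse.execute("SELECT * FROM charges ORDER BY id").fetchall())
```

Keeping each step as its own function makes it easy to see where data is cleaned versus where it is simply moved.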

Tools and products

  • dbt is an increasingly popular tool for transforming and organizing your warehouse data.
  • Kafka is a powerful tool built at LinkedIn for streaming event data in real time.
  • Segment helps data teams collect analytics events and route them to the tools that need them.
  • Databricks is a tool for running Spark jobs, basically ETL for big data.

How data gets used

Understanding data applications and machine learning

What you need to know

What you should know

  • A language-based ML model named GPT-3 took the world by storm.

What's nice to know
