What's Kafka and what does Confluent do?

Apache Kafka is a framework for streaming real-time data, and Confluent offers Kafka as a managed service.

[Image: data flow diagram. Caption: avoid bugs like Gregor by using Kafka]

The TL;DR

Apache Kafka is a framework for streaming data between internal systems, and Confluent offers Kafka as a managed service.

  • We’re dealing with a lot of data these days – Big Data™ – and recording, storing, and moving it around is hard and expensive
  • Kafka helps stream that data throughout your company and distribute it to the systems that want to use it
  • The Kafka architecture works through a publish-subscribe pattern
  • Kafka 101 terminology: producers, consumers, messages, and topics

Kafka is new, but it’s getting pretty popular: managed service provider Confluent, founded by the original creators of Kafka, filed their S-1 last week.

The core Confluent product: data streaming

Kafka, and thus Confluent, exists to solve two fundamental problems facing almost every data infrastructure team at every company.

  1. There’s a lot of data, and it’s all happening very quickly

As storage has gotten cheaper, we’ve been collecting more and more data. Most software companies record every single website visit and click, and some go even deeper. Once you have more than a few users interacting with your product, you’re talking about millions of different events per day. Storing and managing data at that size and velocity is hard.

  2. Data needs to move around to be valuable

Even if you’re a wiz at collecting and storing your data, there’s a problem: you’re going to need to move it around for it to be valuable. Where data gets initially collected and stored is rarely where it’s going to be useful.

Kafka solves these problems by creating a central hub for all of this data – you can think of it like one of those conveyor belt sushi places. Producers (the systems generating data) put plates on the belt, and any consumers that need to use the data (like apps, databases, or ML models) can just take the plate they need (although really, they’re just taking a copy of it, so the plate stays on the belt for everyone else). This is called a publish-subscribe model, often shortened to pub-sub.
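If the conveyor belt is easier to picture in code, here is a minimal sketch of the pub-sub idea in plain Python. To be clear, this is a toy that lives in memory, not Kafka and not Kafka's API: the `ToyBroker` class, the `page_views` topic, and the consumer names are all made up for illustration. It only tries to show the "taking a copy" part: each consumer reads from the same ordered list of messages and keeps track of its own place in it.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub broker (hypothetical, not Kafka's API)."""

    def __init__(self):
        # topic name -> ordered list of messages (the "conveyor belt")
        self.topics = defaultdict(list)
        # (consumer name, topic name) -> index of the next message to read
        self.offsets = defaultdict(int)

    def publish(self, topic, message):
        """A producer appends a message to the end of a topic."""
        self.topics[topic].append(message)

    def poll(self, consumer, topic):
        """Return copies of the messages this consumer has not read yet."""
        log = self.topics[topic]
        start = self.offsets[(consumer, topic)]
        # Remember how far this consumer has read; other consumers are unaffected.
        self.offsets[(consumer, topic)] = len(log)
        return [dict(m) for m in log[start:]]


broker = ToyBroker()

# A producer (say, the website) publishes two messages to one topic.
broker.publish("page_views", {"user": "a", "url": "/pricing"})
broker.publish("page_views", {"user": "b", "url": "/docs"})

# Two independent consumers each receive every message...
warehouse_batch = broker.poll("warehouse", "page_views")
ml_batch = broker.poll("ml_model", "page_views")

# ...and the messages are still in the topic after being read.
remaining = len(broker.topics["page_views"])
```

The design choice worth noticing is that reading does not remove anything: the broker just remembers where each consumer left off, so a new consumer can show up later and start from the beginning. Real Kafka adds a lot on top of this (persistence, partitioning, running across many machines), none of which this sketch attempts.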

I’ve got two problems, that’s it

Kafka exists to solve two fundamental problems facing almost every data infrastructure team at every company. 

1) There’s a lot of data, and it’s all happening very quickly

As st...