Everything is a pipeline.

Technical concepts are often more alike than you might think.

Last updated Jun 25, 2026 · devops

The medallion architecture describes how data is processed as it moves through the pipeline:

Bronze is data that’s been ingested, usually in its rawest state (e.g., logs or events).
Silver is the cleaned, joined, and standardized data. This is where things like deduplication, filtering, and constraints are enforced.
Gold is ready to be used by the business, in reporting or machine learning models.


At each step of the pipeline, the data gets more refined and more usable, following a set of principles. You can walk into any company following this 3-stage pipeline architecture, look at their data transformation code, and (hopefully :) make sense of what they’re doing.
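Here’s a minimal sketch of the three layers in plain Python. The sample rows, the field names, and the `to_silver` / `to_gold` helpers are all made up for illustration. A real medallion setup would run over tables in a warehouse or lakehouse, and would likely dedupe on an event ID rather than on whole rows.

```python
from collections import defaultdict

# Hypothetical raw events, as they might land in a bronze table.
BRONZE = [
    {"user": " Alice ", "event": "purchase", "amount": "30"},
    {"user": "alice", "event": "purchase", "amount": "30"},   # duplicate once cleaned
    {"user": "bob", "event": "purchase", "amount": "12.5"},
    {"user": "bob", "event": "purchase", "amount": "oops"},   # fails a constraint
    {"user": "", "event": "purchase", "amount": "5"},         # missing user
]


def to_silver(bronze_rows):
    """Standardize, enforce constraints, and deduplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        user = row["user"].strip().lower()          # standardize
        try:
            amount = float(row["amount"])           # constraint: must be numeric
        except ValueError:
            continue
        if not user or amount <= 0:                 # constraint: user and positive amount
            continue
        key = (user, row["event"], amount)
        if key in seen:                             # deduplicate on the cleaned row
            continue
        seen.add(key)
        silver.append({"user": user, "event": row["event"], "amount": amount})
    return silver


def to_gold(silver_rows):
    """Aggregate into something a report could use: total spend per user."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'alice': 30.0, 'bob': 12.5}
```

Notice that the deduplication only works because standardization happened first. That’s the kind of decision the layers are there to make obvious.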

The RAG Pipeline

I once interviewed at a vector database company that had a pretty involved take-home assignment: I had to explain a RAG (retrieval-augmented generation) pipeline.

Coming from working in data, this was just one hop over, but it still felt foreign.

RAG systems let an LLM retrieve information from external datasets, and use that information to generate responses. RAG shows up every time an LLM like Claude does a web search. It’s so seamless we don’t even think about the underpinnings of what’s happening.

Notion is a great example. When you use Notion AI, queries are embedded into vectors on the fly and searched against a vectorized version of your Notion workspace. You might be thinking, “how the hell is it so fast?” That’s part of the magic. The vector pipeline runs in the background (chunking, embedding, and storing in a vector db) so that Notion AI can run your semantic (vector) search when it needs to.


During the ingestion stage, we add metadata to our documents (like title, author, etc.), which gives them context. This metadata follows the documents through the rest of the pipeline. Chunking breaks the documents into similar-sized pieces so that they’re faster to search through. Embedding is where we transform the text (or another modality, like images or video) into a vector. Finally, we package everything together and store it in a vector database.

This is where that metadata becomes super helpful, because it can be used for cheap + fast filtering at retrieval time.
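To make the stages concrete, here’s a toy sketch of ingest, chunk, embed, store, and retrieve. Everything in it is an assumption for illustration: `embed` is a hashed bag-of-words stand-in for a real embedding model, the “vector database” is a Python list, and the documents and the `author` filter are invented. It is not how Notion or any particular vector db does it.

```python
import math
import zlib

DIM = 64  # toy vector size; real embedding models use far more dimensions


def embed(text):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def chunk(text, max_words=8):
    """Break a document into similar-sized pieces."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def ingest(documents):
    """Chunk and embed each document; metadata rides along with every chunk."""
    store = []
    for doc in documents:
        for piece in chunk(doc["text"]):
            store.append({"vector": embed(piece), "text": piece, "metadata": doc["metadata"]})
    return store


def search(store, query, author=None, top_k=2):
    """Filter on metadata first (cheap), then rank what is left by cosine similarity."""
    q = embed(query)
    candidates = [r for r in store if author is None or r["metadata"]["author"] == author]
    # Vectors are unit length, so the dot product is the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, r["vector"])), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r for _, r in scored[:top_k]]


docs = [
    {"text": "Bronze data is raw. Silver data is cleaned. Gold data is ready for reporting.",
     "metadata": {"title": "Medallion notes", "author": "ana"}},
    {"text": "Chunking splits documents into pieces before embedding and storage.",
     "metadata": {"title": "RAG notes", "author": "ben"}},
]

store = ingest(docs)
hits = search(store, "how is data cleaned", author="ana")
for hit in hits:
    print(hit["metadata"]["title"], "|", hit["text"])
```

The metadata filter runs before any similarity math, so the expensive part only touches the chunks that could possibly match.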

Every pipeline requires tradeoffs. In the data pipeline example, a set of rules like the Medallion Architecture helps us decide where to put different types of data transformation work. In a RAG pipeline, the tradeoffs we’d think about are:

Chunking size: Do you chunk per token or per document? What happens when you need to re-chunk?
Embedding models: Do you self-host an open source embedding model, or just use OpenAI? When do you switch to a new embedding model?
Latency: Search demands speed. If you have a B2C product and the search sucks, users may bail.
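The chunking tradeoff is easy to see with a small sketch. This one assumes whitespace-separated words as a rough stand-in for tokens, and `chunk_by_words` with its `size` and `overlap` parameters is a hypothetical helper, not part of any library.

```python
def chunk_by_words(text, size, overlap=0):
    """Fixed-size word windows; consecutive chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the document
    return chunks


# A made-up 100-word document.
document = " ".join(f"word{i}" for i in range(100))

for size, overlap in [(100, 0), (25, 0), (25, 5)]:
    pieces = chunk_by_words(document, size, overlap)
    print(f"size={size:>3} overlap={overlap}: {len(pieces)} chunks to embed")
```

For this 100-word document, the three settings produce 1, 4, and 5 chunks. Smaller chunks and more overlap mean more vectors to embed, store, and search. And if you change the chunk size later, every chunk has to be re-embedded, which is why re-chunking is a decision worth making early.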

If you want to go deeper, I highly recommend Notion’s blog on vector search.

The more I learned about RAG pipelines, the more they looked like the same pipeline problem.


The Sales Pipeline

I work as an AE at OpenRouter, which means I manage a sales pipeline.

My sales pipeline looks eerily similar to a data pipeline:

Bronze layer: leads in my CRM without any enrichment. They could be cold leads or warm leads, but I have no idea. Some are missing job titles, phone numbers, or (worse) names.
Silver layer: qualified leads. I know about them and they know about me too, how wonderful! I’m still qualifying them but at least they talk to me.
Gold layer: This is where the magic happens, where those companies and leads become opportunities. All that work to build a business case, it’s all paid off here in the gold layer. Now I can go and close them as customers!

I’ve simplified the hell out of this for the bit, but I think it fits.

So where do tradeoffs come into play here? They show up in the small decisions I make every day. They show up when I’m trying to protect my customer engineer’s time because the prospect sounds flaky. And they most certainly show up when I need to forecast my sales for the quarter. We need to know whether a deal is moving forward or being left behind.

The Photography Pipeline

Years ago I got into photography and started doing paid shoots on the side. I used to just take photos on my phone and edit them in the VSCO app, but once the shoots were paid I upgraded to a full-frame camera (a Sony A7) and shot everything in RAW format.

Photo editing is also a pipeline. First, I’d take lots (hundreds) of photos, way more than I’d need. Then I’d edit in layers:

Bronze layer: Filter through them and make selections.
Silver layer: Play with exposures, shadow details, and more.
Gold layer: Once the composition was right, I’d put the finishing touches on to get them delivery-ready.


What You See Is Not What You Get

I’ve always felt I had a strong knack for spotting patterns and drawing connections, but really I think it comes down to how I simplify my approach and layer in a framework. In this case, I drew a lot of connections by looking at very different processes as pipelines.

Pipelines have workflow stages, and there are inherent tradeoffs about where (in which stage) you put the work. The order of operations matters!
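Here’s a tiny sketch of why the order matters. The `pipeline`, `normalize`, and `dedupe` helpers and the sample names are hypothetical, written only to show the same two stages producing different results depending on where you put the work.

```python
from functools import reduce


def pipeline(*stages):
    """Compose stages left to right: each stage's output feeds the next."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


def normalize(rows):
    """Trim whitespace and lowercase every value."""
    return [r.strip().lower() for r in rows]


def dedupe(rows):
    """Drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(rows))


raw = ["Alice", "alice ", "Bob", "bob", "Bob"]

clean_then_dedupe = pipeline(normalize, dedupe)
dedupe_then_clean = pipeline(dedupe, normalize)

print(clean_then_dedupe(raw))  # ['alice', 'bob']
print(dedupe_then_clean(raw))  # ['alice', 'alice', 'bob', 'bob']
```

Same stages, different order, different output. Deduplicating before normalizing lets near-duplicates slip through.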

What’s fun about pipeline thinking is that you can break any process down into a pipeline, and play with where to draw lines between stages.

But a pipeline is just one mental model. What’s your favorite?

If LLMs went away tomorrow, what framework would you fall back on to describe how things work?

Everything is a Pipeline.

We’re in a major era of cognitive dissonance right now. AI is “moving so fast.” We all “feel behind.” So we use AI to deliver quick results (vibe coding, analyzing 30-page PDFs, Claude doing your taxes). But token maxxing leaves us feeling hollow, like Taco Bell after a night out, because it stops us from taking a moment to actually think or understand anything.

So let’s think for a second. Is everything technical just a pipeline? If you understand how one pipeline (like a data engineering pipeline) works, do you understand them all? Are all of these pipelines essentially the same?

Data engineering pipelines
RAG pipelines for AI products
Sales pipelines
Image processing pipelines
Frontend app compilation pipelines

Let’s explore. I promise this post will not result in you using more tokens.

The Data Pipeline

Bronze, silver, and gold. Databricks’ marketing team uses this holy trinity to describe the medallion architecture for processing data.

Other companies like dbt have used staging, intermediate, and mart. I’ve even seen raw, cleaned, and curated. They all mean the same thing.

Afzal Jasani