
What’s an inference provider?

How the rise of open source AI models is fueling the growth of a new infrastructure category.

Last updated Apr 16, 2026
Will Raphaelson
Amidst the frenzy of AI, a new category of infrastructure vendor has emerged: the inference provider. This white-hot, new(ish) vertical is producing some pretty big headlines for a category that, as we’ll see, is often overshadowed by big AI labs like OpenAI and Anthropic. TogetherAI is in talks to raise at $7.5B, Fireworks raised at $4B last year, and Modal is in talks to raise at $2.5B – valuations that seem to double every few months. NVIDIA splashed $20 billion for Groq’s inference chip technology in December, its largest acquisition to date.

So yes, inference is white hot. But what do these companies actually do? And why are they worth so much money?

Let’s dig in, shall we?


What’s inference?

AI didn’t take long to become totally ubiquitous. It answers your questions, summarizes your meeting notes, codes up your apps, and, if you're a bad friend, writes your birthday cards. All of it, every AI product you touch, boils down to two things: training and inference.

Training is how a model learns. You take a massive amount of text and use it to produce a mathematical equation that captures patterns in language: what words tend to follow what, how ideas relate to each other, what a reasonable response to a question looks like. Models like GPT and Claude were trained on enormous amounts of text from the internet, and what emerged are systems that can produce generally fluent, useful language.


Inference is when the model actually does its job. Every time you send a message to ChatGPT and it responds, that's inference. Every time Claude summarizes a document or an AI coding assistant autocompletes a function, that's inference too. It's the model taking what it learned during training and applying it to your input, in real time.


Training models has historically been the sexy part of AI, done by people in glasses, wearing turtlenecks, and holding PhDs. They are paid many dollars. Serving models, on the other hand, is plumbing. But as every company races to add AI to its app, the plumbers are doing pretty well for themselves. Dedicated inference providers have emerged as a major force in the AI application stack, offering a convenient managed inference service for developers building AI-powered features.

In their purest and most managed form, the OpenAI and Anthropic APIs (we’ll call these companies frontier labs) are AI inference providers. They have both trained the models and set them up for you to use easily. You make an API call (“write my wedding vows”) and get back an inference (“Webster’s dictionary defines love as…”). So what are we talking about here? Why would you use something special?
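Schematically, that call-and-response looks like this. The sketch below follows the common chat-completions request/response shape; the exact field names and values are illustrative, not a precise spec for any one provider:

```python
# What "make an API call, get back an inference" looks like on the wire,
# schematically. The model name and reply text are placeholders.
request = {
    "model": "some-frontier-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Write my wedding vows"},
    ],
}

response = {
    "choices": [
        {"message": {"role": "assistant",
                     "content": "Webster's dictionary defines love as..."}}
    ],
    # Providers typically bill per token, tracked in a usage block like this.
    "usage": {"prompt_tokens": 6, "completion_tokens": 9},
}

# Pulling the inference out of the response:
reply = response["choices"][0]["message"]["content"]
```

That round trip, prompt in, completion out, is a single act of inference, and it is the unit these companies sell.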

Why use an inference provider?

As someone using AI or building it into your app, you have two major choices. You can use a closed-source model like ChatGPT or Claude. These are proprietary models – nobody outside the lab knows exactly how they were trained – and the only way to use them is through the people who trained them, OpenAI and Anthropic. These labs (as well as other closed-source providers like Runway, Midjourney, and Google) are very good at training models, and generally their models will be the best on the planet, but they also cost the most.

Then there are open source models. These are built by companies or the community, and released for all to use. The weights of these models (the key to the fancy prediction equation) and sometimes even the training code are open to the public. You can download and run them yourself or even tweak them if you’d like. The most popular ones are Llama from Meta, Qwen from Alibaba, and DeepSeek.

With open-source models, you have a choice in how you actually use them. You can self-host by spinning up your own infrastructure, loading the model weights, and managing everything yourself. Or you can instead use an inference provider: a company that hosts open-source models for you and lets you access them through an API, just like you would with a closed-source lab.
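Many inference providers expose OpenAI-compatible endpoints, so in practice switching from a frontier lab to a hosted open-source model can be little more than a config change. Here's a minimal sketch using only Python's standard library; the provider URL is made up and the model name is a placeholder:

```python
import json
import urllib.request

# Hypothetical inference provider config. Swapping providers (or swapping
# back to a frontier lab) often means changing only these two lines.
BASE_URL = "https://api.example-inference-provider.com/v1"
MODEL = "meta-llama/llama-3-70b-instruct"  # an open-weights model they host

def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build a chat-completions-style request to the hosted open model."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Authorization": "Bearer YOUR_API_KEY",
                 "Content-Type": "application/json"},
    )

# req = build_chat_request("Summarize this meeting transcript: ...")
# Sending req would return a completion from the provider's hosted model.
```

The actual send is left commented out since it needs a real provider and API key; the point is that the request shape is the same one you'd send to a frontier lab.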

There are a few reasons a developer might use an inference provider over just calling the inference endpoints of the frontier labs directly, or running OSS models themselves.

Cost. The newest frontier models are expensive. If your use case doesn't actually need a state-of-the-art model like GPT-5.4 or Claude Opus, you can run an open-weights model like Llama on an inference provider for a fraction of the price. For high-volume, straightforward tasks like summarization, classification, or extraction, you probably don't need the best model on the market, and the savings add up fast.
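The arithmetic is simple enough to sketch. The per-token prices below are made up for illustration (real prices vary widely by model and provider); the ratio, not the exact figures, is the point:

```python
# Back-of-the-envelope cost math with illustrative (made-up) prices.
FRONTIER_PRICE_PER_M = 10.00   # $ per million output tokens (assumed)
OPEN_MODEL_PRICE_PER_M = 0.60  # $ per million output tokens (assumed)

# Say you classify 200,000 support tickets a day at ~150 output tokens each.
tokens_per_day = 200_000 * 150  # 30 million tokens

frontier_cost = tokens_per_day / 1_000_000 * FRONTIER_PRICE_PER_M
open_cost = tokens_per_day / 1_000_000 * OPEN_MODEL_PRICE_PER_M

print(f"frontier: ${frontier_cost:,.2f}/day vs open model: ${open_cost:,.2f}/day")
```

Under these assumed prices that's $300/day versus $18/day for the same workload, and for classification-grade tasks the cheaper model's answers may be indistinguishable.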

Speed. The big frontier labs generally optimize for model capability over latency. If you're building something where response time matters (autocomplete, real-time agents), inference providers running on optimized hardware can get you significantly faster performance.

Resilience at scale. If you're big enough that OpenAI having an outage means your product has an outage, you have a problem. Inference providers let you route across multiple models and backends through a single API, so you're not in trouble every time one provider has a bad day.
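That routing layer can be sketched as a simple fallback loop. Everything here is hypothetical: the provider names, URLs, and models are placeholders, and `send` stands in for whatever HTTP client actually makes the call. Real routing layers also handle retries, timeouts, and load balancing:

```python
# A minimal fallback router across multiple inference backends.
# Provider names, URLs, and model names are placeholders.
BACKENDS = [
    {"name": "provider-a",
     "url": "https://api.provider-a.example/v1/chat/completions",
     "model": "llama-3-70b"},
    {"name": "provider-b",
     "url": "https://api.provider-b.example/v1/chat/completions",
     "model": "qwen-72b"},
]

def complete(prompt, send):
    """Try each backend in order; return the first successful completion.

    `send(url, payload)` is assumed to POST the request and return the
    completion text, raising on failure (outage, rate limit, timeout...).
    """
    last_error = None
    for backend in BACKENDS:
        payload = {"model": backend["model"],
                   "messages": [{"role": "user", "content": prompt}]}
        try:
            return send(backend["url"], payload)
        except Exception as err:
            last_error = err  # remember the failure, try the next backend
    raise RuntimeError(f"all backends failed: {last_error}")
```

If provider-a is down, the same call transparently lands on provider-b; the caller never notices.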

Native integration with your cloud. The public cloud inference provider offerings like AWS Bedrock or Google VertexAI have an advantage in the enterprise because their models and endpoints are colocated with your other infrastructure, which can matter to security and data-residency-conscious buyers. They also tend to offer the most straightforward paths to fine-tuning and customization, which is when you take a base model and train it further on your own data so it performs better for your specific use case.

All this, at least for me, actually reverses the question - why would you call the frontier labs’ APIs when the inference providers are cheaper and faster? There are at least a few reasons:

  • You need the latest and greatest models (remember, the inference providers are often hosting the open weights models, which are older)
  • Cost and/or latency are not a concern for you (congrats, happy for you)
  • You’re a noob and didn’t know about inference providers (me, until recently)

The inference provider spectrum

Like many software categories, there is a spectrum from:


