
AI and the — em dash

Finally, an explanation for why AI models can't seem to quit them.

Published: January 29, 2026

This sentence — which I wrote from scratch without the help of AI — contains an em dash (actually two).

If you've been keeping up with the online discourse about AI writing, you may be surprised that I put an em dash in this post. That's because so many human writers are steering away from this once-common punctuation mark, which is now viewed as a hallmark of chatbot-written text.

In fact, AI bots love the em dash so much that it can be hard to get them to write content without it, even when you explicitly instruct them to leave it out. LLMs can be so funny sometimes.

Of course, this raises the question of why em dashes are all over AI-written content — and whether human writers should give up this once-beloved punctuation mark entirely, so their content isn't immediately clocked as being written by an LLM.

Why does AI love the em dash so much?

AI's love affair with em dashes seems to have a simple explanation: The data used to train large language models was full of em dashes. The AI is simply mimicking the writers that it learned from.

In fact, there's some evidence to suggest that the content AI was trained on included significantly more em dashes than you might expect. And weirdly enough, their prevalence seems to have become a deep bias that's embedded into how LLMs understand the flow and structure of writing.

AI-training material may have used an overabundance of em dashes

One theory behind AI's love of the em dash is that the later-generation AI models, which rely on it much more heavily than earlier iterations, were trained on older books that included more em dashes than most modern writers would use.

Early on, most AI models were trained on a mix of public data from the internet and content from pirated books. However, in a quest for better-quality training data as the tools evolved, AI labs started scanning older texts. Curating the massive data trove that is the internet has been a major focus of AI labs for more recent model generations, and finding quality text from books was certainly part of that.

The exact timeline for when this happened is something of a mystery, but court documents indicate that Anthropic started scanning books in 2024, and other AI labs likely made a similar move somewhere between 2022 and 2024.

If AI labs digitized mostly older books, which is a common belief largely because those books' copyrights have expired, their AI programs may have been fed writing with significantly more em dashes, especially since studies show that use of the em dash peaked in the 1860s.

It may not have been the books alone, either.

Another theory suggests that AI may also have picked up em dash use from Medium, which automatically converted two hyphens (--) into an em dash because the company's founder was a fan of typography. Since AI labs may have seen Medium as a source of high-quality writing (and therefore upweighted it in training), AI may have determined that the em dash is a key feature of high-quality prose.
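
For the curious, here is a rough idea of what that kind of auto-conversion could look like in Python. This is only a sketch, not Medium's actual code: the function name `convert_double_hyphens` and the choice to leave longer hyphen runs (like `---`) untouched are assumptions made for the example.

```python
import re

# The em dash, written as a Unicode escape (U+2014).
EM_DASH = "\u2014"


def convert_double_hyphens(text: str) -> str:
    """Replace each run of exactly two hyphens with a single em dash.

    Hypothetical helper for illustration only. The lookbehind and
    lookahead skip runs of three or more hyphens, which writers often
    use as a divider rather than as punctuation.
    """
    return re.sub(r"(?<!-)--(?!-)", EM_DASH, text)


if __name__ == "__main__":
    draft = "This sentence -- typed with plain hyphens -- gets two em dashes."
    print(convert_double_hyphens(draft))
```

If an editor applies a rule like this to every post, every writer who types two hyphens ends up publishing em dashes, whether or not they know how to type one.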

The brevity theory


Written with 💔 by Justin in Brooklyn