Comparing available LLMs for non-technical users

How do ChatGPT, Mistral, Gemini, and Llama3 stack up for common tasks like generating sales emails?

Last updated Jul 4, 2025
Justin Gage

If you’re using Large Language Models at work to help automate your job, or even just for personal use, you’ve now got the choice of several different models instead of just ChatGPT. Between Gemini (formerly Bard), Mixtral, Llama2, and ol’ reliable (ChatGPT), which is the best for the kinds of tasks you need it to do?

Existing benchmarks tend to focus on the technical aspects of these models – how fast they run, how much context they can keep in mind, etc. These are interesting, but not very useful to typical readers of Technically. Instead, the focus of this post is how well they perform on the real-world tasks that functional teams like marketing, product, and operations would actually use them for.

The TL;DR, for impatient readers:

  • Most models are roughly at parity with each other for common chat-oriented tasks
  • ChatGPT performed significantly worse than I thought it would
  • Gemini, to the surprise of everyone, was the best performing model by a decent margin
  • Overall, model responses were usable, but would need a lot of cleanup and work to use practically

The wringer we shall put these models through

I designed 3 use cases to test each model against, each meant to mimic a real-world task that you might have an LLM do in the course of your job. They’re all centered around generating text, even though some of these models are multimodal (they can work with images as well).

1) Generating social posts from an existing piece of content

This one is for marketing teams. A common (frankly tedious) task is taking an existing piece of content – say a blog post written recently – and breaking it down into smaller bits to post on social media like X or LinkedIn. For this test, I’ll ask the model to turn this Technically post about microservices into social bits. Here’s how I’ll evaluate results:

  • Does the model faithfully reproduce the important points of the original piece of content?
  • Does the generated content flow and make sense? Is it easy to read?
  • Does the model follow the given constraints for the formatting of the post? E.g. tweets of 280 characters or fewer, new lines in LinkedIn posts
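
The formatting constraints in that last criterion are the one part of this evaluation that can be checked mechanically instead of by eye. Here is a minimal Python sketch of what such a check might look like. The function names are hypothetical (they are not part of how the tests in this post were run), and it assumes a 280-character limit that a post may hit exactly. It also uses a plain character count, which is only an approximation: X's own counting treats links and some characters differently.

```python
TWEET_LIMIT = 280  # assumption: a post of exactly 280 characters is allowed


def check_tweet(text: str, limit: int = TWEET_LIMIT) -> bool:
    """Return True if the post fits within the character limit.

    len() counts Unicode code points, so this is an approximation of
    how X counts characters, not an exact replica.
    """
    return len(text.strip()) <= limit


def check_linkedin(text: str, min_paragraphs: int = 2) -> bool:
    """Return True if the post is split into paragraphs by blank lines."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    return len(paragraphs) >= min_paragraphs


if __name__ == "__main__":
    tweet = "Microservices split one big app into small services that talk over APIs."
    linkedin = "Microservices, explained.\n\nInstead of one big app, you run many small ones."

    print(check_tweet(tweet))        # True
    print(check_tweet("x" * 281))    # False
    print(check_linkedin(linkedin))  # True
```

Checks like these only cover the third criterion. Whether a post is faithful to the original and reads well still takes a human.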

2) Synthesizing customer interview notes into an internal update

This one is for product and design teams. For this test, I’ll provide the model with some bullet points from a call I did with a customer (or potential customer) about how they use Technically. I’ll ask the model to generate a full-length written update that I can share with my team and add to some internal documentation. Here’s how I’ll evaluate results:
