Comparing available LLMs for non-technical users

How do ChatGPT, Mistral, Gemini, and Llama3 stack up for common tasks like generating sales emails?

Last updated: July 4, 2025

If you’re using Large Language Models at work to help automate your job, or even just for personal use, you now have several models to choose from instead of just ChatGPT. Between Gemini (formerly Bard), Mixtral, Llama2, and ol’ reliable (ChatGPT), which is best for the kinds of tasks you need done?

Existing benchmarks tend to focus on the technical aspects of these models: how fast they run, how much context they can keep in mind, and so on. These are interesting, but not very useful to typical readers of Technically. Instead, this post focuses on how well the models perform on real-world tasks that functional teams like marketing, product, and operations would actually use them for.

The TL;DR, for impatient readers:

  • Most models are roughly at parity with each other for common chat-oriented tasks
  • ChatGPT performed significantly worse than I thought it would
  • Gemini, to everyone’s surprise, was the best-performing model by a decent margin
  • Overall, model responses were usable, but would need a lot of cleanup and editing before you could use them in practice

The wringer we shall put these models through

I designed 3 use cases to test each model against, each meant to mimic a real-world task you might have an LLM do in the course of your job. They’re all centered on generating text, even though some of these models are multimodal (they can handle images as well).

1) Generating social posts from an existing piece of content

This one is for marketing teams. A common (and frankly tedious) task is taking an existing piece of content, say a recent blog post, and breaking it into smaller bits to post on social media sites like X or LinkedIn. For this test, I’ll ask each model to turn a Technically post about microservices into social posts. Here’s how I’ll evaluate results:

  • Does the model faithfully reproduce the important points of the original piece of content?
  • Does the generated content flow and make sense? Is it easy to read?
  • Does the model follow the formatting constraints given for each post? For example, tweets of no more than 280 characters, or line breaks in LinkedIn posts
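The third criterion is the one you can check mechanically instead of by eye. The sketch below is a minimal Python example, assuming each generated post has already been split into its own string; the names `over_limit`, `missing_line_breaks`, and `TWEET_LIMIT` are hypothetical helpers for illustration, not part of any tool used in this test. Python's `len()` only approximates X's own character counter, which treats some content (links, for example) differently.

```python
from __future__ import annotations

# Hypothetical constant: X's per-post character limit for standard accounts.
TWEET_LIMIT = 280


def over_limit(posts: list[str], limit: int = TWEET_LIMIT) -> list[tuple[int, int]]:
    """Return (index, length) for every post longer than `limit` characters.

    len() counts Unicode code points, which is only a rough stand-in for
    how X counts characters, so treat this as a first pass.
    """
    return [(i, len(post)) for i, post in enumerate(posts) if len(post) > limit]


def missing_line_breaks(post: str) -> bool:
    """Flag a LinkedIn post that is one unbroken block of text."""
    return "\n" not in post.strip()


if __name__ == "__main__":
    drafts = [
        "Microservices split one big app into small, independent services.",
        "x" * 300,  # deliberately too long
    ]
    print(over_limit(drafts))  # [(1, 300)]
    print(missing_line_breaks("One long paragraph with no breaks."))  # True
```

A check like this only covers formatting; whether a post faithfully reflects the original piece and reads well still takes a human judgment.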

2) Synthesizing customer interview notes into an internal update

This one is for product and design teams. For this test, I’ll give the model some bullet-point notes from a call I had with a customer (or potential customer) about how they use Technically. I’ll ask the model to turn them into a full written update that I can share with my team and add to our internal documentation. Here’s how I’ll evaluate results:
