Comparing available LLMs for non-technical users

How do ChatGPT, Mistral, Gemini, and Llama3 stack up for common tasks like generating sales emails?

Last updated: July 4, 2025

If you’re using large language models (LLMs) at work to help automate your job, or even just for personal use, you now have your choice of several different models instead of just ChatGPT. Between Gemini (formerly Bard), Mixtral, Llama2, and ol’ reliable (ChatGPT), which is best for the kinds of tasks you need done?

Existing benchmarks tend to focus on the technical aspects of these models: how fast they run, how much context they can keep in mind, and so on. These are interesting, but not very useful to typical readers of Technically. Instead, this post focuses on how well the models perform on real-world tasks that functional teams like marketing, product, and operations would actually use them for.

The TL;DR, for impatient readers:

Most models are roughly at parity with each other for common chat-oriented tasks
ChatGPT performed significantly worse than I thought it would
Gemini, to the surprise of everyone, was the best performing model by a decent margin
Overall, model responses were usable, but would need a lot of cleanup before you could put them to work in practice

The wringer we shall put these models through

I designed 3 use cases to test each model against, each meant to mimic a real-world task you might have an LLM do for you in the course of your job. They’re all centered on generating text, even though some of these models are multimodal (they can handle images as well).

1) Generating social posts from an existing piece of content

This one is for marketing teams. A common (frankly tedious) task is taking an existing piece of content, say a recent blog post, and breaking it down into smaller bits to post on social media sites like X or LinkedIn. For this test, I’ll ask the model to turn this Technically post about microservices into social posts. Here’s how I’ll evaluate results:

Does the model faithfully reproduce the important points of the original piece of content?
Does the generated content flow and make sense? Is it easy to read?
Does the model follow the given constraints for the formatting of the post? E.g. tweets of 280 characters or fewer, new lines in LinkedIn posts
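If you’d rather check the formatting criterion mechanically than eyeball it, a few lines of Python can flag problem posts. This is a minimal sketch, not part of the original test setup: the function names are hypothetical, and Python’s `len()` counts Unicode code points, while X weights some characters differently (links, for example, count as 23), so treat it as a rough filter rather than an exact match for X’s counter.

```python
# Rough formatting checks for model-generated social posts (hypothetical helpers).
# Assumption: len() approximates X's character count; X's own weighting differs
# for links and some non-Latin characters.
TWEET_LIMIT = 280


def over_limit(tweets: list[str], limit: int = TWEET_LIMIT) -> list[tuple[int, int]]:
    """Return (index, length) for every tweet longer than the limit."""
    return [(i, len(t)) for i, t in enumerate(tweets) if len(t) > limit]


def has_line_breaks(post: str) -> bool:
    """True if a LinkedIn-style post contains at least one new line."""
    return "\n" in post


if __name__ == "__main__":
    generated = ["Microservices split one big app into small services.", "x" * 300]
    print(over_limit(generated))  # → [(1, 300)]
    print(has_line_breaks("First point.\n\nSecond point."))  # → True
```

Checking for a single new line is a deliberately loose reading of the LinkedIn constraint; tighten it (for example, require a blank line between paragraphs) if that’s what your prompt asked for.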

2) Synthesizing customer interview notes into an internal update

This one is for product and design teams. For this test, I’ll give the model some bullet points from a call I did with a customer (or potential customer) about how they use Technically. I’ll ask the model to turn them into a long-form written update that I can share with my team and add to our internal documentation. Here’s how I’ll evaluate results:
