Does ChatGPT Have a Place in Content Production?

Niki Lancaster

Head of Creative

Content Marketing

There has been much talk about ChatGPT over the past few months, and one of the hottest topics is its use for copywriting. So we decided to test it to understand its strengths and limitations, and to learn whether and how we could use ChatGPT to enhance the content production process.


Our experiments took the following steps:

  1. Assemble a list of the content types and variables to test
  2. Create four different versions of each content type:
    • Version one: Created by a human
    • Version two: Created by ChatGPT at temp 0 (according to ChatGPT, temps range from 0 to 100 on a sliding scale of predictability, so outputs at 0 are considered ‘predictable’ and outputs at 100 are considered more ‘random’)
    • Version three: Created by ChatGPT at temp 50
    • Version four: Created by ChatGPT at temp 100
  3. Have humans review the examples created by ChatGPT, including through blind testing:
    • Within our initial working group, we reviewed each piece and collected feedback on its quality.
    • We then asked a larger group outside the experiment for feedback, to assess whether any AI-generated content was of high enough quality to go live on our clients’ websites.
  4. Use AI tools to review the examples created by ChatGPT:
    • We ran the content through Grammarly, an automated copy checker we use as a standard part of our QA process
    • We ran the content through Originality to see whether any of it would be classified as plagiarized or flagged as written by AI.




Selecting the content types to test


The first step was to form a working group and pin down exactly what we wanted to test. Testing ChatGPT on all copywriting would be too broad, as the briefs we work on can be relatively niche. So we collated a list of over 18 types of content we are asked to produce, including company news blogs, press releases for campaigns, website product page copy, interviews, product buying guides, and category page content. We then factored in the differences between B2B and B2C copy and, where relevant, included variants of both in our list.


Creating the content


Once we had the list of content types, we gathered the existing human-written content for each one and built ChatGPT prompts based on the briefs we typically receive, so that the AI versions answered the same brief.

Once these were ready, we ran each prompt through ChatGPT three times, at temp 0, temp 50, and temp 100, to see how the temp setting changed the output.

This gave us four versions of each piece of content to test: one written by a human and three generated by ChatGPT at temps 0, 50, and 100.
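The temp setting corresponds, in general terms, to the temperature a language model uses when sampling its next word. As a rough illustration only, the Python sketch below shows temperature-scaled softmax sampling, the common textbook technique; it is not ChatGPT’s actual implementation. The word scores are hypothetical, the function names are our own, and the temperatures use the 0-to-2 style scale of OpenAI’s API rather than the 0-to-100 scale ChatGPT described, since how that scale maps onto real settings is not known.

```python
import math
import random


def temperature_probs(scores, temperature):
    """Turn raw model scores (logits) into word probabilities at a given temperature.

    Illustrative sketch only; not how ChatGPT is implemented internally.
    """
    if temperature == 0:
        # Temperature 0 is usually treated as greedy decoding:
        # all probability goes to the highest-scoring word.
        best = max(scores, key=scores.get)
        return {word: (1.0 if word == best else 0.0) for word in scores}
    # Dividing by the temperature sharpens (< 1) or flattens (> 1) the distribution.
    scaled = {word: s / temperature for word, s in scores.items()}
    # Subtract the largest value before exponentiating for numerical stability.
    peak = max(scaled.values())
    exps = {word: math.exp(v - peak) for word, v in scaled.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def sample_next_word(scores, temperature, rng):
    """Pick one word at random, weighted by its temperature-adjusted probability."""
    probs = temperature_probs(scores, temperature)
    words = list(probs)
    return rng.choices(words, weights=[probs[w] for w in words], k=1)[0]


# Hypothetical scores for the next word in a product description.
scores = {"head": 4.0, "mane": 2.0, "hair": 1.0}

for t in (0, 0.7, 1.5):
    probs = temperature_probs(scores, t)
    print(t, {w: round(p, 2) for w, p in probs.items()})

print(sample_next_word(scores, 1.5, random.Random(42)))
```

Lower temperatures concentrate probability on the most likely word (temperature 0 always picks it), while higher temperatures shift probability towards less likely words. That general behavior is consistent with the more random, tangent-prone copy we saw at higher temps.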


Humans vs AI: Test #1

As a working group, we reviewed ChatGPT’s initial outputs and noted their quality against the following criteria:

  • Spelling and grammar
  • Structure
  • Readability
  • Whether the piece matched the client’s brand and tone of voice
  • The information included
  • How well the piece hit the brief
  • Factual accuracy

This gave us a starting point for deciding whether we would be happy to deliver this content to clients and wider teams.


Humans vs AI: Test #2

Because we knew which content was written by humans and which by AI, and because we are biased (as part of the content team, we will always advocate for human-written content), we then blind-tested the content with a broader group. These reviewers were not part of the working group and did not know which content was AI-generated and which was human-written. They were chosen for their knowledge of our clients and their role as stakeholders in our usual content sign-off process; in effect, they are the gatekeepers who decide whether our content goes live or is sent to clients.

In addition to assessing the quality of the content, this group also considered whether they would use the content they had been sent, either by putting it live or by sending it to a client for sign-off.

Humans vs AI: Results

Although the testing group was surprised by how quickly ChatGPT produced legible content, they easily identified it as AI-generated.

What people liked about ChatGPT was the speed at which it delivered information. This made it helpful for structuring content, doing light research for ideas and tips, and suggesting points that might not otherwise have been considered. For example, it suggested adding warranty information to a product page.

Despite this, ChatGPT’s flaws were evident, and everyone in the testing group flagged the following:

  • It struggled to stick to a strict word count and would often exceed it.
  • Its spelling and grammar were not always accurate; it would often omit commas, add random capital letters, and hyphenate words inconsistently.
  • The content it generated was usually very generic and did not include stories or anecdotes specific to a company or brand. For example, when we created a blog post about Search Laboratory being awarded B Corp accreditation, it could not draw out the specific things we had done to achieve this that were unique to us as a business. As a result, the copy was uninspiring and could have been written for any business that achieved the accreditation.
  • It did not follow stricter tone-of-voice guidelines and would use wording that humans immediately recognized as off-brand for our clients.
  • Similarly, ChatGPT did not understand the legal implications of wording in copy. For example, it added guarantees to product copy that would not hold up from a legal perspective.
  • As the temp increased, the tool tended to add incorrect information or go off on irrelevant tangents. For example, when we asked for product page copy for a mug shaped like a horse’s head, the copy went on to talk about horsehair.
  • When asked for interesting facts and stats about UK air travel, ChatGPT delivered them, but when we tried to find the sources behind those facts, they were nowhere to be found. The tool can produce plausible-sounding statistics, but they are not yet reliably linked to verifiable sources.

AI vs AI

Once we had all the results from our human tests, our next step was to use AI to assess whether each piece of content was written by a human or by AI. We also wanted to know whether any content would be flagged as plagiarized.

To run these tests, we used Originality, a leading tool for detecting plagiarism and AI-generated content, which allowed us to test for both at the same time. We ran all four versions of each content type through Originality to compare its results for human-written and AI-generated content.

[Bar chart: Originality AI scores from Search Laboratory’s ChatGPT test.]


AI vs AI: Results

All AI-generated and human-written content scored zero for plagiarism, meaning none of it was flagged as plagiarized.

On average across the three temps, the AI content scored 91.9% AI-generated in Originality. The human-written content scored an average of 91.75% human-original, showing that Originality could quickly and accurately tell which content was AI-generated.

As the temp increased, Originality became slightly less certain that the content was AI-generated. Even so, it still clearly identified temp 100 content as AI-generated, with an average score of 84.9% AI-generated.


Both humans and AI could recognize AI-generated content. As the temp of the ChatGPT content increased, it became slightly harder for Originality, an online detection tool, to detect that the content was AI-generated. However, quality dropped at higher temps, which made the content easier for humans to spot.

ChatGPT is useful for research, for help with structure, and for overcoming writer’s block, all of which can make content production more time-efficient.

ChatGPT’s main flaws are inconsistent quality, factual inaccuracy, and its lack of understanding of a company’s branding and history.

What are our recommendations?

Given these flaws, ChatGPT will not replace human copywriters any time soon, and we would never recommend publishing content from the tool without human input.

There are some content briefs for which ChatGPT simply is not relevant, such as written interviews. For everything else, ChatGPT proved helpful for research, ideation, and structure, and it cut the time needed for pieces such as Digital PR tips articles. That said, any research done with ChatGPT must be taken with a large pinch of salt, as it cannot yet be relied on for facts and statistics.

We will continue to test AI for content creation and broader use across the agency. The pace of development of this technology is rapid, so we expect improvements will come very soon.


