There has been a lot of talk about ChatGPT over the past few months, and one of the hottest topics is using ChatGPT for copywriting. So, we decided to test it out to understand its strengths and limitations and to see if, and how, we could use ChatGPT to enhance our content production process.
Our experiments took the following steps:
The first step was to form a working group and nail down exactly what we wanted to test. Simply testing ChatGPT on all copywriting is too broad, as the briefs we work on can be quite niche. So, we collated a list of over 18 types of content that we are asked to produce, including company news blogs, press releases for campaigns, product page copy for websites, interviews, product buying guides, and category page content. We then factored in the different nuances of writing B2B and B2C copy and made sure the list contained variants of both where relevant.
Once we had the list of content types, we collated the human-written content and built prompts, based on the briefs we typically receive, to instruct ChatGPT to produce an AI version of each piece.
Once these were ready, we ran each prompt through ChatGPT three times: once at temperature 0, once at temperature 50, and once at temperature 100, to understand how the different temperature settings affect the output.
This gave us four variations of each piece of content to test: one created by humans and three created by ChatGPT at the three temperatures (0, 50, and 100).
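For anyone who wants to reproduce this step programmatically, a minimal sketch using OpenAI's Python client might look like the one below. The model name, the example brief, and the mapping of our 0/50/100 temperature scale onto the API's 0-2 range are illustrative assumptions rather than our exact setup.

```python
# Minimal sketch of generating the three AI variants of one brief.
# Assumptions: the model name, the brief text, and the mapping of the
# article's 0/50/100 temperature scale onto the API's 0-2 range.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

brief = (
    "Write a 300-word product page for a cordless vacuum cleaner, "
    "aimed at a B2C audience, in a friendly but informative tone."
)

# Our 0 / 50 / 100 settings, mapped onto the API's temperature range (assumed).
temperatures = {0: 0.0, 50: 1.0, 100: 2.0}

variants = {}
for label, temp in temperatures.items():
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": brief}],
        temperature=temp,
    )
    variants[label] = response.choices[0].message.content

# variants now holds one AI draft per temperature setting, ready to be
# reviewed alongside the human-written version of the same brief.
```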
As a working group, we reviewed ChatGPT's initial outputs and made notes on the quality of what it produced.
This gave us a starting point for whether we would be happy to deliver this content to clients and wider teams.
Because we knew which content was written by humans and which was written by AI, and because we hold a bias (we’re part of the content team, so we will always advocate for human-written content), we then blind-tested the content with a wider group of people who weren’t part of the working group and didn’t know which content was AI-generated and which was human-written. They were selected based on their knowledge of our clients and their involvement as stakeholders in our usual content sign-off process; in other words, they are the gatekeepers for getting our content live or sent to clients.
As well as assessing the quality of the content, this group also looked at whether they’d use the content they’d been sent (either by putting it live or sending it over for sign-off from a client).
Although the testing group was surprised by how quickly and legibly the AI-generated content was written, they found it easy to identify as AI-generated.
What people liked about ChatGPT was the speed at which it delivered information. This made it useful for structuring content, doing light research for ideas and tips, and surfacing ideas that might not otherwise have been considered; for example, it suggested adding warranty information to a product page.
Despite this, the flaws with ChatGPT were obvious, and the same issues were flagged by everyone in the testing group.
Once we had all the results from our human tests, our next test was to use AI to assess whether a piece of content was written by a human or by AI. We were also curious whether any of the content would be red-flagged as plagiarized.
To run these tests, we used Originality, a leading tool for detecting plagiarism and AI-generated content, which let us test both areas simultaneously. We ran all four versions of each type of content through Originality to see how it scored human-generated and AI-generated content.
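We ran these checks manually, but a batch scan could be scripted along the lines of the sketch below. Note that the endpoint URL, header, and response fields here are placeholders, not Originality's documented API; anyone automating this should follow the tool's own API documentation.

```python
# Hypothetical sketch of batch-checking content variants for AI and plagiarism.
# The endpoint URL, auth header, and response fields are placeholders,
# NOT Originality's documented API; consult their docs before using.
import requests

SCAN_URL = "https://api.originality.example/scan"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

def scan(text: str) -> dict:
    """Send one piece of content for AI-detection and plagiarism scoring."""
    response = requests.post(
        SCAN_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"content": text},
        timeout=30,
    )
    response.raise_for_status()
    # Assumed response shape, e.g. {"ai_score": ..., "plagiarism_score": ...}
    return response.json()

# The human original plus the three ChatGPT drafts for one brief.
variants = {"human": "...", "temp_0": "...", "temp_50": "...", "temp_100": "..."}

for label, text in variants.items():
    print(label, scan(text))
```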
All the AI-generated and human-generated content scored zero for plagiarism, meaning that none of the information was considered plagiarized.
On average, the AI content scored 91.9% for AI-generated within Originality across the three temperatures. The human-generated content scored an average of 91.75% as human-original, showing that Originality could quickly and accurately detect which content was AI-generated.
As the temperature setting increased, the likelihood that Originality could detect the AI-generated content decreased very slightly. Even so, it was still obvious to Originality that the temp 100 content was AI-generated: it scored an average of 84.9% AI-generated.
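For clarity, the averages quoted here are simply the mean AI scores across every content type at each temperature. The short snippet below illustrates that aggregation, using placeholder scores rather than our actual results.

```python
# Illustrative aggregation of per-temperature AI scores across briefs.
# The scores below are placeholders, not our actual test results.
from statistics import mean

ai_scores = {
    0:   [0.99, 0.95, 0.90],   # AI score per brief at temp 0
    50:  [0.97, 0.93, 0.88],   # ... at temp 50
    100: [0.90, 0.85, 0.80],   # ... at temp 100
}

for temp, scores in ai_scores.items():
    print(f"temp {temp}: average AI score {mean(scores) * 100:.1f}%")
```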
Both humans and AI could recognize when content was AI-generated. As the temperature of the ChatGPT content increased, detecting that it was AI became slightly harder for Originality. However, at higher temperatures the quality dropped, making the content easier for humans to spot as AI-generated.
ChatGPT is good for research, helping with structure, and overcoming writer’s block, and it can improve the time efficiency of content production.
The main flaws with ChatGPT are quality control, factual accuracy, and its lack of understanding of a company’s branding and history.
Based on its flaws, ChatGPT is not replacing human copywriters any time soon, and we would never recommend publishing any content from the tool without human input.
There are some content briefs that ChatGPT just isn’t relevant for, such as written interviews. For everything else, ChatGPT proved useful for research, ideation, and structure, and it cut down the time needed for things like PR tips pieces. That said, research from ChatGPT must be taken with a large pinch of salt, as it cannot yet be relied on for facts and statistics.
We will continue to test AI for use in content creation, and for wider use across the agency. The pace of development of this technology is rapid, so we expect improvements will come very soon.