Content readability has recently been speculated to be a ranking factor in Google's algorithm. However, the issue is complex enough that it has never been conclusively shown what impact readability has on rankings, or the extent of that impact.
Here at Search Laboratory we love putting theories to the test – such as our nofollow links and title tag length experiments – to ensure that we are always giving the most reliable recommendations to our clients. We therefore looked at a number of different readability metrics and built a test to see whether we could find any correlation between them and ranking performance.
There are a number of recognised readability scores; the ones we decided to measure against were the following (a sketch of how each is calculated appears after the list):
- Flesch-Kincaid Grade Level
- Flesch-Kincaid Reading Ease
- Gunning Fog Index
- Coleman-Liau Index
- SMOG
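For anyone who wants to reproduce these measurements, the published formulas behind all five metrics are straightforward to compute. The sketch below is a minimal Python illustration rather than the tool we used for this test: in particular, `count_syllables` is a naive heuristic written for this example, and real readability tools use more careful syllable counting, so its outputs will differ slightly from theirs.

```python
import math
import re

def count_syllables(word: str) -> int:
    """Naive syllable estimate: count vowel groups, discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def readability_scores(text: str) -> dict:
    """Compute the five readability metrics for a non-empty piece of text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = len(words)
    syllables = sum(count_syllables(w) for w in words)
    letters = sum(len(w) for w in words)
    # Gunning Fog and SMOG count 'complex'/polysyllabic words (3+ syllables);
    # the full Gunning Fog definition also excludes proper nouns and familiar
    # jargon, which this rough version ignores.
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)

    wps = n_words / sentences      # average words per sentence
    spw = syllables / n_words      # average syllables per word
    L = letters / n_words * 100    # letters per 100 words
    S = sentences / n_words * 100  # sentences per 100 words

    return {
        "Flesch-Kincaid Grade Level": 0.39 * wps + 11.8 * spw - 15.59,
        "Flesch-Kincaid Reading Ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "Gunning Fog Index": 0.4 * (wps + 100 * polysyllables / n_words),
        "Coleman-Liau Index": 0.0588 * L - 0.296 * S - 15.8,
        "SMOG": 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291,
    }

for metric, score in readability_scores(
    "The cat sat on the mat. It was a sunny day in Leeds."
).items():
    print(f"{metric}: {score:.2f}")
```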
Methodology
We set up three brand new domains with no registration history and hosted three pieces of unique content on each domain, with each piece of content competing against its counterparts on the other domains for a different target keyword. Each keyword was a nonsensical made-up word that returned zero results in a Google organic search, meaning that we could rank easily without our results being distorted by any competition.
Domain | Keyword One | Keyword Two | Keyword Three |
Domain One | Content One | Content Two | Content Three |
Domain Two | Content One | Content Two | Content Three |
Domain Three | Content One | Content Two | Content Three |
We invented a ‘concept’ for each of our three keywords, in an attempt to replicate a real-life scenario. The keyword itself would then be the name of that concept/entity. For Keyword One we invented a sport, Keyword Two was a rock band and Keyword Three was a cocktail bar in Leeds. This gave us something to write about and also allowed us to maintain consistency across each domain, reducing any content-topic influence on ranking performance.
The content that we produced for each keyword was similar but used different language and structure, in order to produce three notably different readability scores against which we could measure ranking performance. The readability scores of our content were as follows:
Keyword one
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
Domain One | 7.13 | 71.29 | 9.57 | 7.1 | 9.65 |
Domain Two | 10.9 | 53.01 | 13.21 | 9.32 | 12.8 |
Domain Three | 6.12 | 74.05 | 8.25 | 7.3 | 9.12 |
Keyword two
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
Domain One | 7.73 | 71.5 | 8.65 | 6.96 | 8.77 |
Domain Two | 9.43 | 59.89 | 10.95 | 9.2 | 11.37 |
Domain Three | 10.81 | 56.75 | 11.83 | 9.82 | 11.94 |
Keyword three
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
Domain One | 12.41 | 53.14 | 15.26 | 8.54 | 13.82 |
Domain Two | 10.08 | 62.36 | 12.84 | 7.97 | 12.17 |
Domain Three | 11.48 | 56.21 | 14.19 | 8.1 | 12.8 |
The way we produced the content meant that, across the three keywords, each domain had one piece of content with the highest readability score, one with the lowest, and one with neither the highest nor the lowest. This was done with the aim of negating any positive bias Google might have in favour of one of our domains.
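As a quick illustration of that rotation (using the Flesch-Kincaid Grade Level figures from the tables above; any of the five metrics shows the same pattern), the snippet below labels each domain's content as Lowest, Middle or Highest per keyword and confirms that every domain plays each role exactly once:

```python
# Flesch-Kincaid Grade Level scores, copied from the tables above.
scores = {
    "Keyword One":   {"Domain One": 7.13,  "Domain Two": 10.90, "Domain Three": 6.12},
    "Keyword Two":   {"Domain One": 7.73,  "Domain Two": 9.43,  "Domain Three": 10.81},
    "Keyword Three": {"Domain One": 12.41, "Domain Two": 10.08, "Domain Three": 11.48},
}

roles = {domain: [] for domain in ["Domain One", "Domain Two", "Domain Three"]}
for keyword, by_domain in scores.items():
    ranked = sorted(by_domain, key=by_domain.get)  # easiest to hardest
    for label, domain in zip(["Lowest", "Middle", "Highest"], ranked):
        roles[domain].append(label)

for domain, labels in roles.items():
    # Each domain should hold every role exactly once across the three keywords.
    balanced = sorted(labels) == ["Highest", "Lowest", "Middle"]
    print(domain, labels, "OK" if balanced else "unbalanced")
```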
After producing and publishing the content, we used the Fetch as Google tool in Google Search Console to ensure that it would be crawled and indexed.
Results
Keyword one
The rankings for this keyword were stable from the very beginning, with only one position switch between launch (26th November 2016) and the date we ended the test (11th January 2017):
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
Domain One | 7.13 | 71.29 | 9.57 | 7.1 | 9.65 | 1 |
Domain Two | 10.9 | 53.01 | 13.21 | 9.32 | 12.8 | 2 |
Domain Three | 6.12 | 74.05 | 8.25 | 7.3 | 9.12 | 3 |
For the first two weeks of the test, it appeared that the domain with the highest grade level and lowest reading ease was going to outrank the others, but the shift in rankings on 11th December meant there was no obvious correlation between the readability scores and ranking performance.
The content that ranked worst was the piece with the highest reading ease score (easiest to read); however, no corresponding pattern held between the other two pieces of content.
Keyword two
This keyword saw a bit of fluctuation in terms of rankings, with each domain even dropping out of the index briefly in the early stages:
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
Domain One | 7.73 | 71.5 | 8.65 | 6.96 | 8.77 | 3 |
Domain Two | 9.43 | 59.89 | 10.95 | 9.2 | 11.37 | 1 |
Domain Three | 10.81 | 56.75 | 11.83 | 9.82 | 11.94 | 2 |
For this keyword, the content with the highest reading ease score (easiest to read) ranked worst in position 3; however, there was no obvious correlation between a harder-to-read score and the better ranking positions. Nor were there any similarities between Keyword One and Keyword Two at this point.
Keyword three
There was also some early fluctuation with the rankings for this keyword. Domain Three took a whole week to start ranking for the keyword:
Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
Domain One | 12.41 | 53.14 | 15.26 | 8.54 | 13.82 | 3 |
Domain Two | 10.08 | 62.36 | 12.84 | 7.97 | 12.17 | 1 |
Domain Three | 11.48 | 56.21 | 14.19 | 8.1 | 12.8 | 2 |
The best-ranking piece of content for this keyword had the highest reading ease score (easiest to read), which goes against the findings from Keyword One and Keyword Two, while the piece with the lowest reading ease score ranked worst in position 3.
Results summary
The below tables group together the best- and worst-ranking pieces of content for each keyword and show any consistencies that we found across the three different data sets.
Best:
Keyword | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
Keyword One | Middle | Middle | Middle | Lowest | Middle |
Keyword Two | Middle | Middle | Middle | Middle | Middle |
Keyword Three | Middle | Highest | Lowest | Lowest | Lowest |
There appears to be no correlation between readability scores and the content in our test that ranked highest across our three keyword sets. The closest we can come to a trend is that for two out of three keywords, the content with the lowest Coleman-Liau Index ranked best. The Coleman-Liau Index approximates the US school grade level required to comprehend the text, and a low score (as in our case) indicates the easiest-to-read content of the three in its set.
Worst:
Keyword | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
Keyword One | Lowest | Highest | Lowest | Middle | Lowest |
Keyword Two | Lowest | Highest | Lowest | Lowest | Lowest |
Keyword Three | Highest | Lowest | Highest | Highest | Highest |
The readability scores of our poorest-performing content make for interesting reading. Two of our three keyword sets correlate with each other, suggesting that easier-to-read content requiring a lower grade level of comprehension will rank worse. A low Gunning Fog Index and a low SMOG score likewise indicate less required education. However, our third keyword set bucked the trend and suggested the complete opposite.
Conclusion
We are unable to pick out any conclusive trends from our data set, as in our opinion none of our results are statistically significant. Some common trends are forming, as can be seen in the Results summary section of this post, but because we did not see consistency across all three keywords, we cannot advocate any particular approach.
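To make "not statistically significant" concrete: with only three observations per keyword, even a perfect rank correlation cannot produce a meaningful p-value. As an illustrative sketch (not part of our original analysis), the snippet below runs a Spearman rank correlation between Flesch-Kincaid Reading Ease and final rank, using the figures from the results tables above:

```python
from scipy.stats import spearmanr

# (Flesch-Kincaid Reading Ease, final rank) pairs from the results tables above.
results = {
    "Keyword One":   [(71.29, 1), (53.01, 2), (74.05, 3)],
    "Keyword Two":   [(71.50, 3), (59.89, 1), (56.75, 2)],
    "Keyword Three": [(53.14, 3), (62.36, 1), (56.21, 2)],
}

for keyword, pairs in results.items():
    ease, rank = zip(*pairs)
    rho, p = spearmanr(ease, rank)
    # With only three observations per keyword, these p-values are
    # effectively uninformative, however strong the correlation looks.
    print(f"{keyword}: rho = {rho:+.2f}, p = {p:.2f}")
```

Even where a single keyword shows a strong correlation, three data points cannot distinguish signal from chance, which is why we stop short of advocating any particular approach.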
Despite our results, we still believe that readability is an important factor when writing content for webpages. It is essential that you consider the audience of your content and the kind of language that they are likely to use, read and understand. To give two polar opposite examples:
- A scientific research paper targeting professors or PhD students is likely to use more complex language, and will therefore be judged as having a higher grade level and a lower reading ease score. However, this content is likely to rank better against its target keywords as a result
- A webpage targeting young teenagers about the latest fashion trends is more likely to use slang, buzzwords and informal language, attaining a lower grade score and higher reading ease score, yet Google will consider the context and rank it accordingly.
A unique approach is required when writing these two pieces of content, and it will inevitably result in them having different readability scores, which makes it difficult to show whether any particular readability score will outperform another. A very complex piece of content attempting to compete in the second example's niche is unlikely to perform well; similarly, a very basic and informal piece of content will not rank well for complex, highly educated search queries. The approach we adopted involved writing about concepts/entities that were ‘new’ in Google’s eyes, which would hopefully level the playing field in terms of rankings.
If you have any questions about the organic performance of your website, please contact us.