Do readability scores impact organic search ranking performance? [EXPERIMENT]



Content readability has recently been the subject of speculation as a ranking factor in Google's algorithm. However, the issue is complex, and there has never been a conclusive answer as to whether readability has an impact, or how large that impact is.

Here at Search Laboratory we love putting theories to the test (as in our nofollow links and title tag length experiments) to make sure we always give our clients the most reliable recommendations. We therefore looked at a number of readability metrics and built a test to see whether any of them correlated with ranking performance.

There are a number of recognised readability scores; the ones we decided to measure against were:

  • Flesch-Kincaid Grade Level
  • Flesch-Kincaid Reading Ease
  • Gunning Fog Index
  • Coleman-Liau Index
  • SMOG
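Each of these scores is a fixed formula over simple counts taken from the text: words, sentences, syllables or letters. As a rough reference, the Python sketch below implements the commonly published form of each formula. It assumes the counts have already been taken, and the function names are our own rather than those of any particular tool. Readability tools differ in how they split sentences and count syllables (and Gunning Fog's full definition of a ‘complex’ word has exclusions that this sketch ignores), so the same text can score slightly differently from one tool to another.

```python
import math

# Commonly published versions of the five formulas. Inputs are raw counts for a
# passage of text; producing those counts (especially syllables) is left to the caller.


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Approximate US school grade level; higher means harder to read."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Roughly a 0-100 scale; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """Simplified: 'complex' words are taken to be any word of 3+ syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def coleman_liau(letters: int, words: int, sentences: int) -> float:
    """Uses letters instead of syllables; L and S are averages per 100 words."""
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8


def smog(polysyllables: int, sentences: int) -> float:
    """Polysyllabic words (3+ syllables) are normalised to a 30-sentence sample."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291


# Hypothetical counts for a 100-word, 5-sentence passage, for illustration only.
print(flesch_kincaid_grade(100, 5, 140), flesch_reading_ease(100, 5, 140))
```

Note that the grade-level scores (Flesch-Kincaid Grade Level, Gunning Fog, Coleman-Liau, SMOG) rise as text gets harder, while Reading Ease falls, which is why "highest" and "lowest" mean opposite things for the two kinds of score in the tables that follow.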

Methodology

We set up three brand-new domains with no registration history and hosted three pieces of unique content on each. Each piece of content on a domain targeted a different keyword, so for every keyword the three domains competed against each other. Each keyword was a nonsensical, made-up word that returned zero results in a Google organic search, meaning that our pages could rank easily without the results being distorted by any competition.

| Domain | Keyword One | Keyword Two | Keyword Three |
|---|---|---|---|
| Domain One | Content One | Content Two | Content Three |
| Domain Two | Content One | Content Two | Content Three |
| Domain Three | Content One | Content Two | Content Three |

We invented a ‘concept’ for each of our three keywords, in an attempt to replicate a real-life scenario. The keyword itself would then be the name of that concept/entity. For Keyword One we invented a sport, Keyword Two was a rock band and Keyword Three was a cocktail bar in Leeds. This gave us something to write about and also allowed us to keep the topic consistent across the three domains, reducing any influence of the content topic on ranking performance.

For each keyword, the three pieces of content covered similar ground but used different language and structure, producing three notably different sets of readability scores against which we could measure ranking performance. The readability scores of our content were as follows:

Keyword One

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
|---|---|---|---|---|---|
| Domain One | 7.13 | 71.29 | 9.57 | 7.1 | 9.65 |
| Domain Two | 10.9 | 53.01 | 13.21 | 9.32 | 12.8 |
| Domain Three | 6.12 | 74.05 | 8.25 | 7.3 | 9.12 |

Keyword Two

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
|---|---|---|---|---|---|
| Domain One | 7.73 | 71.5 | 8.65 | 6.96 | 8.77 |
| Domain Two | 9.43 | 59.89 | 10.95 | 9.2 | 11.37 |
| Domain Three | 10.81 | 56.75 | 11.83 | 9.82 | 11.94 |

Keyword Three

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
|---|---|---|---|---|---|
| Domain One | 12.41 | 53.14 | 15.26 | 8.54 | 13.82 |
| Domain Two | 10.08 | 62.36 | 12.84 | 7.97 | 12.17 |
| Domain Three | 11.48 | 56.21 | 14.19 | 8.1 | 12.8 |

We wrote the content so that, across the three keywords, each domain had one piece of content with the highest readability score, one with the lowest and one with neither the highest nor the lowest. This was intended to cancel out any bias that Google might have in favour of one of our domains.
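This balanced layout is essentially a Latin square: across the three keywords, every domain takes the lowest, middle and highest position exactly once. The Python sketch below checks that property for the Flesch-Kincaid Reading Ease scores in the tables above. The helper names (`position_labels`, `is_balanced`) are hypothetical, and the check covers Reading Ease only; the same function can be pointed at any of the other columns.

```python
# Flesch-Kincaid Reading Ease scores from the tables above (higher = easier to read).
reading_ease = {
    "Keyword One": {"Domain One": 71.29, "Domain Two": 53.01, "Domain Three": 74.05},
    "Keyword Two": {"Domain One": 71.5, "Domain Two": 59.89, "Domain Three": 56.75},
    "Keyword Three": {"Domain One": 53.14, "Domain Two": 62.36, "Domain Three": 56.21},
}

LABELS = ("Lowest", "Middle", "Highest")


def position_labels(scores: dict[str, float]) -> dict[str, str]:
    """Label each domain's score as Lowest, Middle or Highest within one keyword."""
    ordered = sorted(scores, key=scores.get)  # domains sorted by ascending score
    return {domain: LABELS[i] for i, domain in enumerate(ordered)}


def is_balanced(table: dict[str, dict[str, float]]) -> bool:
    """True if every domain receives each label exactly once across the keywords."""
    per_domain: dict[str, list[str]] = {}
    for scores in table.values():
        for domain, label in position_labels(scores).items():
            per_domain.setdefault(domain, []).append(label)
    return all(sorted(labels) == sorted(LABELS) for labels in per_domain.values())


print(is_balanced(reading_ease))
```

If any domain had ended up with, say, two "Highest" pieces, the check would return False, which would mean a domain-level preference in Google could masquerade as a readability effect.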

After producing and publishing the content, we used the Fetch as Google tool in Google Search Console to submit each page for crawling and indexing.

Results

Keyword One

The rankings for this keyword were stable from the very beginning, with only one position switch between launch (26th November 2016) and the end of the test (11th January 2017):

[Chart: readability test results for Keyword One]

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
|---|---|---|---|---|---|---|
| Domain One | 7.13 | 71.29 | 9.57 | 7.1 | 9.65 | 1 |
| Domain Two | 10.9 | 53.01 | 13.21 | 9.32 | 12.8 | 2 |
| Domain Three | 6.12 | 74.05 | 8.25 | 7.3 | 9.12 | 3 |

For the first two weeks of the test, it appeared that the domain with the highest grade level and lowest reading ease (Domain Two) was going to outrank the others, but after the rankings shifted on 11th December there was no obvious correlation between readability scores and ranking performance.

The content that ranked worst had the highest reading ease score (easiest to read). However, the other two pieces of content did not follow the same pattern: the hardest-to-read content ranked second, behind the content with the middle reading ease score.

Keyword Two

Rankings for this keyword fluctuated somewhat, with each domain even dropping out of the index briefly in the early stages:

[Chart: readability test results for Keyword Two]

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
|---|---|---|---|---|---|---|
| Domain One | 7.73 | 71.5 | 8.65 | 6.96 | 8.77 | 3 |
| Domain Two | 9.43 | 59.89 | 10.95 | 9.2 | 11.37 | 1 |
| Domain Three | 10.81 | 56.75 | 11.83 | 9.82 | 11.94 | 2 |

For this keyword, the content with the highest reading ease score (easiest to read) again ranked worst, in position 3. However, there was no obvious correlation between a harder-to-read score and a better ranking position: the hardest-to-read content ranked second, behind the content with the middle reading ease score. This mirrors the final ordering for Keyword One.

Keyword Three

Rankings for this keyword also fluctuated early on, and Domain Three took a whole week to start ranking for the keyword:

[Chart: readability test results for Keyword Three]

| Domain | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG | Rank |
|---|---|---|---|---|---|---|
| Domain One | 12.41 | 53.14 | 15.26 | 8.54 | 13.82 | 3 |
| Domain Two | 10.08 | 62.36 | 12.84 | 7.97 | 12.17 | 1 |
| Domain Three | 11.48 | 56.21 | 14.19 | 8.1 | 12.8 | 2 |

The best-ranking piece of content for this keyword had the highest reading ease score (easiest to read), while the content with the lowest reading ease score ranked worst, in position 3. This goes against the findings for Keyword One and Keyword Two, where the easiest-to-read content ranked worst.

Results Summary

The tables below show, for each keyword, where the best- and worst-ranking pieces of content fell (highest, middle or lowest of the three) on each readability metric, highlighting any consistencies across the three data sets.

Best-ranking content (position 1):

| Keyword | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
|---|---|---|---|---|---|
| Keyword One | Middle | Middle | Middle | Lowest | Middle |
| Keyword Two | Middle | Middle | Middle | Middle | Middle |
| Keyword Three | Lowest | Highest | Lowest | Lowest | Lowest |

There appears to be no correlation between readability scores and the best-ranking content across our three keyword sets. The closest thing to a trend is that in two of the three keywords, the content with the lowest Coleman-Liau score ranked best. The Coleman-Liau Index approximates the US school grade level required to comprehend a text, so a low score (as in our case) indicates the easiest-to-read content of the three in that set.

Worst-ranking content (position 3):

| Keyword | Flesch-Kincaid Grade Level | Flesch-Kincaid Reading Ease | Gunning Fog Index | Coleman-Liau Index | SMOG |
|---|---|---|---|---|---|
| Keyword One | Lowest | Highest | Lowest | Middle | Lowest |
| Keyword Two | Lowest | Highest | Lowest | Lowest | Lowest |
| Keyword Three | Highest | Lowest | Highest | Highest | Highest |

The readability scores of our poorest-performing content make for more interesting reading. Two of our three keyword sets agree with each other: the worst-ranking content had the lowest Flesch-Kincaid Grade Level, Gunning Fog and SMOG scores and the highest reading ease score, suggesting that easier-to-read content, which requires a lower grade level to comprehend, ranked worse. (Low Gunning Fog and SMOG scores likewise indicate that less education is needed to understand the text.) However, our third keyword set bucked the trend and suggested the complete opposite.
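The Highest/Middle/Lowest labels in the two summary tables can be derived mechanically from the score tables in the Results section. The Python sketch below does this using the reported scores and final ranks: for each metric it orders the three domains by score and reports where the best- or worst-ranking domain falls. The names `RESULTS` and `summary_row` are our own, for illustration only.

```python
METRICS = ("FK Grade Level", "FK Reading Ease", "Gunning Fog", "Coleman-Liau", "SMOG")
LABELS = ("Lowest", "Middle", "Highest")

# (scores in METRICS order, final rank) for each domain, copied from the results tables.
RESULTS = {
    "Keyword One": {
        "Domain One": ((7.13, 71.29, 9.57, 7.1, 9.65), 1),
        "Domain Two": ((10.9, 53.01, 13.21, 9.32, 12.8), 2),
        "Domain Three": ((6.12, 74.05, 8.25, 7.3, 9.12), 3),
    },
    "Keyword Two": {
        "Domain One": ((7.73, 71.5, 8.65, 6.96, 8.77), 3),
        "Domain Two": ((9.43, 59.89, 10.95, 9.2, 11.37), 1),
        "Domain Three": ((10.81, 56.75, 11.83, 9.82, 11.94), 2),
    },
    "Keyword Three": {
        "Domain One": ((12.41, 53.14, 15.26, 8.54, 13.82), 3),
        "Domain Two": ((10.08, 62.36, 12.84, 7.97, 12.17), 1),
        "Domain Three": ((11.48, 56.21, 14.19, 8.1, 12.8), 2),
    },
}


def summary_row(domains, target_rank):
    """Label each metric of the domain at target_rank as Lowest/Middle/Highest."""
    chosen = next(d for d, (_, rank) in domains.items() if rank == target_rank)
    row = []
    for i in range(len(METRICS)):
        # Order the three domains by their score on metric i (ascending).
        ordered = sorted(domains, key=lambda d: domains[d][0][i])
        row.append(LABELS[ordered.index(chosen)])
    return row


for keyword, domains in RESULTS.items():
    print(keyword, "best:", summary_row(domains, 1), "worst:", summary_row(domains, 3))
```

Because the labels are relative positions among only three pages, a small change in any one score can move a label, which is part of why these patterns are fragile.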

Conclusion

We are unable to pick out any conclusive trends from our data set, as in our opinion none of our results are statistically significant. Some common patterns are visible in the Results Summary section of this post, but because they were not consistent across all three keywords, we cannot advocate any particular approach.
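To illustrate why three pages per keyword cannot produce a statistically significant result, the Python sketch below computes Spearman's rank correlation (rho) between Flesch-Kincaid Reading Ease and final ranking position for each keyword, together with an exact permutation p-value. This analysis is not part of the original test; it simply reuses the scores and ranks reported above, and the function names are our own. With only three pages there are just six possible orderings, so even a perfect correlation (rho = 1 or -1) has a two-sided p-value of 2/6, about 0.33, well above the conventional 0.05 threshold.

```python
from itertools import permutations


def spearman_rho(x, y):
    """Spearman's rho for untied data: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def exact_p_value(x, y):
    """Two-sided p-value: share of all orderings of y with |rho| at least as large."""
    observed = abs(spearman_rho(x, y))
    perms = list(permutations(y))
    extreme = sum(1 for p in perms if abs(spearman_rho(x, p)) >= observed - 1e-12)
    return extreme / len(perms)


# Reading ease (higher = easier) and final rank (1 = best) per domain, from the results tables.
keywords = {
    "Keyword One": ([71.29, 53.01, 74.05], [1, 2, 3]),
    "Keyword Two": ([71.5, 59.89, 56.75], [3, 1, 2]),
    "Keyword Three": ([53.14, 62.36, 56.21], [3, 1, 2]),
}

for name, (ease, rank) in keywords.items():
    print(name, spearman_rho(ease, rank), exact_p_value(ease, rank))
```

Because rank 1 is the best position, a positive rho here means easier-to-read content tended to rank lower down. Keywords One and Two give rho = 0.5 and Keyword Three gives rho = -1, but none of the p-values comes close to significance.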

Despite our results, we still believe that readability is an important factor when writing content for webpages. It is essential that you consider the audience of your content and the kind of language that they are likely to use, read and understand. To give two polar opposite examples:

  • A scientific research paper targeting professors or PhD students is likely to use more complex language and will therefore receive a higher grade score and a lower reading ease score. However, this content is likely to rank better for its target keywords as a result.
  • A webpage about the latest fashion trends, targeting young teenagers, is more likely to use slang, buzzwords and informal language, attaining a lower grade score and a higher reading ease score, yet Google should consider the context and rank it accordingly.

Each of these two pieces of content requires a different approach, which will inevitably give them different readability scores; this is why it is difficult to say whether any particular readability score will outperform another. A very complex piece of content competing in the second niche is unlikely to perform well; similarly, a very basic and informal piece of content is unlikely to rank well for complex search queries from a highly educated audience. Our approach of writing about concepts/entities that were ‘new’ in Google’s eyes was intended to level the playing field in terms of rankings.

If you have any questions about the organic performance of your website, please contact us.