The grass is not greener on Jeffrey Bowers’ side of the fence: Systematic phonics belongs in evidence-based reading programs

Note: A revised version of this post has been published in The Educational and Developmental Psychologist


A rejoinder to Bowers, J. S. (2020). Reconsidering the evidence that systematic phonics is more effective than alternative methods of reading instruction. Educational Psychology Review, Online first.

There is strong agreement among reading scientists that learning the phonological connections between speech and print is an essential element of early reading acquisition. Meta-analyses of reading research have consistently found that methods of reading instruction that include systematic phonics instruction are more effective than methods that do not. This article critiques a recent paper by Jeffrey S. Bowers that attempts to challenge the robustness of the research on systematic phonics instruction. On this basis, Bowers proposes that teachers and researchers consider using alternative methods. This article finds that even with a revisionist and conservative analysis of the research literature, the strongest available evidence shows systematic phonics instruction to be more effective than any existing alternative. While it is fair to argue that researchers should investigate new practices, it is irresponsible to suggest that classroom teachers use anything other than methods based on the best evidence to date, and that evidence favours systematic phonics.

Dr Jennifer Buckingham


For some time, in seminars, conference presentations and on social media, Jeffrey Bowers has been arguing that systematic phonics instruction does not have a strong evidence base in its favour. Bowers believes that there is a strong consensus that systematic phonics instruction is effective in early reading instruction, and that this consensus is unwarranted.

Bowers’ review of research on systematic phonics was recently published in a journal article (Bowers 2020). The format and formal referencing of a journal article allow a robust critique of his conclusions.

The objective of Bowers’ review paper is this:

“… I will show that there is little or no evidence that systematic phonics is better than the main alternative methods used in schools, including whole language and balanced literacy. This should not be taken as an argument in support of these alternative methods, but rather, it should be taken as evidence that the current methods used in schools are far from idea (sic). Once this is understood, my hope is that researchers and politicians will be more motivated to consider alternative methods.” (p. 1)

Let’s assume that the word ‘idea’ in this quote is a typo and it should be ‘ideal’. The notion of an ‘ideal’ method of instruction is probably the key concept that needs to be addressed in Bowers’ paper. He sets out to make the case that the evidence base for the effectiveness of systematic phonics instruction is weaker than is often portrayed and uses this to support his argument for the use of “alternative methods”.

The main alternative methods to systematic phonics used in schools are whole language and balanced literacy approaches. As can be seen in the above quote, Bowers writes that he does not support these methods either, presumably because they also do not have strong evidence in their favour.

I agree with Bowers that researchers should never consider that they have found the ultimate solution and stop looking for better ones. Researchers should be open to the possibility that there is more to know and continue to investigate new hypotheses and develop new theories, using rigorous scientific research principles. However, it is altogether different to propose that teachers and politicians should consider using unproven ‘alternative’ methods. Teaching practice and education policy should be based on the best available evidence unless and until it is superseded by new information and new evidence.

Often the best available evidence is imperfect. Research in human sciences rarely progresses in a perfect, linear way. There will be unresolved questions and unexplained variance. Educational research is not always conducted in laboratories under pure experimental conditions. Much of it takes place in schools, where there are innumerable uncontrolled and uncontrollable factors. This is how it should be – clinical experiments provide valuable information about how the brain learns to read, but until a way is found to translate this information into real-world classroom practice it is of little practical value for teachers and students. The question, therefore, is which method or methods have the greatest weight of evidence in their favour.

In his paper, Bowers reviews major meta-analyses of studies that have looked at the effect of systematic phonics instruction on various reading outcomes, with Bowers placing a particular emphasis on the effects on reading comprehension. His key criticisms are that these studies do not directly compare systematic phonics with what he calls ‘unsystematic phonics’ and therefore do not prove the hypothesis that systematic phonics is better, and that the strength of the measured effects of systematic phonics is overstated.

I will argue below that his interpretations of the findings of these meta-analyses are skewed and inaccurate, and his criticisms of the conclusions of the studies’ authors are unwarranted. However, going to this level of detail in my response is arguably unnecessary since even Bowers’ selective, post hoc analysis of the findings leads to the conclusion that there is stronger evidence in favour of using systematic phonics in reading instruction than not using it.


What is systematic phonics?

It is very important to state here that the broad term ‘systematic phonics’ describes practices for the teaching of decoding and word reading. It neither describes, nor is it intended to be, a teacher’s entire approach to reading instruction or to literacy more generally. Evidence-based understandings of systematic phonics place it within a comprehensive program of instruction that includes four additional essential elements: phonemic awareness, fluency, vocabulary and comprehension.

Within the Simple View of Reading, systematic phonics is part of the ‘word identification’ factor. It does not supplant or contradict the need for instruction that develops language comprehension.

This is important to state, as it informs the interpretation of the evidence for systematic phonics instruction. On its own, systematic phonics is not a foolproof guarantee of reading success; its effectiveness is mediated by the quality of the rest of the literacy program. (It is not unusual to see systematic phonics programs dropped into a literacy program that works against their effectiveness: for example, schools that teach a synthetic phonics scope and sequence but then encourage children to use multi-cuing strategies during guided reading with predictable texts that are not aligned with the teaching sequence.)

Therefore, comparing the effects of systematic phonics instruction with comprehension-based programs is a false comparison. Both phonics and comprehension instruction are necessary; a finding of a positive effect of one on reading outcomes does not prove that the other is unnecessary.

A serious fault line in Bowers’ paper is his definition of systematic phonics. When an argument is based on a flawed premise it is tempting to dismiss it out of hand.

According to Bowers,

“…systematic phonics explicitly teaches children grapheme-phoneme correspondences prior to emphasising the meanings of written words in text (as in whole language or balanced literacy instruction) or the meaning of written words in isolation (as in morphological instruction).” (p. 3)

This is incorrect. It implies that systematic phonics instruction requires the entire grapho-phonemic code to be taught before attention turns to the meaning of words and to morphology, which is not the case. Systematic phonics instruction takes place alongside meaning-based instruction, including vocabulary and comprehension. Morphology is often introduced part way through the phonics sequence. Bowers’ mischaracterisation of systematic phonics permeates his paper and perhaps explains why he seems to believe that the evidence for the positive effect of non-phonics instruction presents a challenge to the conclusion that systematic phonics is effective. It is a matter of ‘and’, not ‘or’.

Bowers quotes from Castles, Rastle and Nation (2018) to bolster his description of systematic phonics as preceding any teaching on the meaning of words, including morphology, but he misrepresents their position. Systematic phonics does not preclude a focus on the meaning of words. There is no directive that learning grapheme-phoneme correspondences (GPCs) must precede all other elements of reading instruction. The criteria for systematic phonics only apply to the aspect of instruction that focuses on teaching decoding. The expectation is that children will have concurrent instruction in all of the ‘Big 5’ – phonemic awareness, phonics, fluency, vocabulary and comprehension – and that the phonics component will be systematic and explicit.

The common recommendation is that morphology instruction be introduced after a short period of systematic phonics instruction in which the phonological relationship between letters and speech sounds has been established (the optimal length of this period is yet to be determined experimentally). This recommendation is based on the well-established understanding that GPCs are the basic common sub-units of both words and morphemes. A child cannot read morphemes without being able to read graphemes, unless one is willing to make the case for teaching morphemes as logographic units, which does not fit with the evidence for phonemic awareness and orthographic mapping.

Bowers includes two types of instruction within his category of systematic phonics – synthetic and analytic. Synthetic and analytic approaches are both based on the rationale that writing emerged first as a code for speech, with spelling and writing becoming more complex over time to preserve meaning in the orthographic system.

To grasp the concept of reading, children first need to understand that writing represents speech in a systematic way. Synthetic and analytic phonics are also based on research showing that for beginning readers, meaning is activated via a phonological pathway in the brain. Being able to accurately say/hear the word they are reading, either aloud or mentally, retrieves its meaning. Over time, repeated exposure to words and the retention of specific orthographic representations in memory leads to ‘sight reading’ – the ability to read text instantaneously without the need for phonological cues, except when encountering a novel word.

Synthetic and analytic phonics instructional approaches differ in the unit of sub-word analysis. Synthetic phonics begins with phonemes – the smallest sub-word level. Children learn the associations between speech sounds (phonemes) and the letters or letter clusters that represent them in writing (graphemes), and that this is a reversible process. They learn to synthesise the phonemes and graphemes to read and spell words. Systematic synthetic phonics instruction has a defined sequence for teaching grapheme-phoneme correspondences.

In analytic phonics, the unit of analysis is a larger sub-word unit, such as the onset and rime. For example, rather than learning to read the word rat as a composition of three letters and sounds, r-a-t, children would learn that rat belongs to a ‘word family’ sharing the rime -at, such as r-at, s-at, c-at, and so on.

While both synthetic and analytic phonics can be considered systematic to some extent, synthetic phonics is the more systematic approach. It has the benefit of allowing the phonetic code to be covered in a shorter period of time, because there are far fewer GPCs than there are ‘word families’. There is also research showing that knowledge of phonemes is a stronger predictor of early reading acquisition than knowledge of rimes. For these and other reasons I will expand on later, I think that the evidence strongly favours synthetic over analytic approaches. This should be kept in mind as I examine the research on systematic phonics that fails to distinguish between the two.

The boundaries are blurred further in studies of systematic phonics that combine whole-class initial instruction with interventions, and that lack clear criteria for what constitutes high quality, evidence-informed instruction.


Bowers’ re-interpretation of the meta-analyses is neither fair nor accurate

In his review of evidence on systematic phonics, Bowers looks in detail at meta-analyses conducted by the National Reading Panel (2000) later published as Ehri (2001), as well as Torgerson, Brooks and Hall (2006), McArthur et al. (2012), Galuschka, Ise, Krick and Schulte-Korne (2014), and Suggate (2010 and 2016).

In his summary of the National Reading Panel (NRP) analysis, Bowers argues that the effect sizes are not large and do not justify the NRP’s conclusion that systematic phonics should be taught in schools. However, the effect sizes quoted by Bowers are moderate to high, particularly for synthetic phonics, and are certainly stronger than the evidence found for any other method, including whole language.

There is little practical use in pointing out that systematic phonics does not achieve the highest imaginable effect sizes when there is nothing better to replace it with. The facts are that 1) we don’t know how well the phonics programs in the NRP analysis were taught or how rigorous they were, and 2) the studies in the meta-analysis are more than 20 years old. And yet the results are still stronger than anything else to date. Subsequent analyses have added to the evidence in favour of including systematic phonics in reading instruction rather than contradicting it.

Bowers presents the findings of two re-analyses of the studies included in the NRP by Camilli, Vargas and Yurecko (2003) and Camilli, Wolfe and Smith (2006) that are alleged to dispute the NRP’s conclusions. Yet after some substantial re-engineering of the data, Camilli, Vargas and Yurecko (2003) still found that the effect of systematic over non-systematic phonics instruction was significant.

As reported by Camilli, Wolfe and Smith (2006),

“Using regression analysis to estimate simultaneous effects on reading, Camilli et al. (2003) found that programs using systematic phonics instruction outperformed programs using less systematic phonics (d = .24), and, though this effect was statistically significant, it was substantially smaller than the estimate of the NRP (d = .41).” (p. 29).

Camilli, Wolfe and Smith (2006) manoeuvred the data even further, creating a multi-level model that included language-based activities as a moderating variable. It reinforced the finding that systematic phonics was superior to no phonics but reduced the simple effect of systematic over non-systematic phonics, which Bowers incorrectly interprets to mean that “Camilli et al (2006) failed to show an advantage of systematic over unsystematic phonics” (p. 9).

This re-analysis actually showed that systematic phonics is more effective when taught along with high quality language activities, which is exactly what the Simple View of Reading would predict, and it confirmed the additive effect found in the earlier study.

“Camilli et al. (2003) found that the benefits of systematic phonics, language activities, and tutoring may be additive: their confluence may triple the effect of phonics alone.”

Overall, far from presenting a challenge to systematic phonics, the findings of Camilli et al. can be described as supporting the conclusion that some phonics instruction is better than no phonics instruction, and the more systematic the phonics instruction is, the better. The best case scenario is systematic phonics instruction paired with high quality language activities.

Bowers next looks at a meta-analysis by Torgerson, Brooks & Hall (2006) that was limited to randomised controlled trials (RCTs). After limiting the included studies to RCTs, Torgerson, Brooks and Hall (2006) found moderate to high effect sizes for systematic phonics on word reading (0.27 to 0.38) and comprehension (0.24 to 0.35), depending on whether fixed or random effects models were used. The word reading effect was statistically significant. After removing one study with a particularly high effect size, the overall result was reduced for word reading accuracy but still of moderate size and still significant.

The authors were apparently concerned that potential publication bias (the tendency for journals to be more likely to publish studies that find significant results) may have inflated the effect size estimates. Bowers explains in detail how Torgerson, Brooks and Hall (2006) came up with prima facie evidence of publication bias. However, when they went looking for unpublished studies they found only one, the inclusion of which would have made little difference to the effect size calculated in the meta-analysis.

“Despite searching exhaustively in the grey literature databases only one unpublished RCT (Skailand, 1971) was found” (Torgerson, Brooks & Hall, 2006, p. 27).

Nevertheless, Bowers seems unwilling to let the prospect of publication bias go, suggesting that “these findings likely overestimate the efficacy of systematic phonics given the evidence that bias may have inflated the estimate of effect sizes in this study”. This is purely speculation, and if Bowers is going to be a stickler for strong standards of evidence then that high standard should apply here too.

Torgerson, Brooks and Hall (2006) concluded that:

“Systematic phonics instruction within a broad literacy curriculum appears to have a greater effect on children’s progress in reading than whole language or whole word approaches. The effect size is moderate but still important.” (p. 10)

They also said:

“Since there is evidence that systematic phonics teaching benefits children’s reading accuracy, it should be part of every literacy teacher’s repertoire and a routine part of literacy teaching, in a judicious balance with other elements.” (p. 49)

Bowers takes issue with this conclusion, saying it “greatly exaggerates” the findings, and points out that the comparison is between systematic phonics and a combined category of unsystematic phonics and no phonics. His argument is that this does not show that systematic phonics had better outcomes than unsystematic phonics. However, the latter category is so nebulous that it is difficult to see how unsystematic phonics and no phonics could be easily distinguished. And it remains the case that Torgerson, Brooks and Hall’s (2006) findings support the conclusions of the NRP (2000) and Camilli et al. (2003; 2006) that including systematic phonics instruction is better than not including it.

At this point, Bowers summarises the four meta-analyses in a way that grossly understates the findings, claiming that the results found for systematic phonics instruction are weak or non-existent, despite having just presented results showing the opposite. To describe the Torgerson, Brooks and Hall (2006) study as showing “null effects” is clearly inaccurate, and to posit that there was “evidence for publication bias” despite an exhaustive search turning up only one unpublished study, appears to be a deliberate misrepresentation of the findings.

A meta-analysis of phonics interventions for children with reading difficulties by McArthur et al. (2012) was updated in 2018 but Bowers reviews only the 2012 study. He reports the moderate to high effect sizes found in the study, not all of which had sufficient numbers to be statistically significant.

McArthur et al. (2012) describe their results like this:

– a moderate effect of phonics training on word reading accuracy in poor word readers.
– a large effect of phonics training on nonword reading accuracy in poor readers.
– a moderate effect of phonics on word reading fluency in poor readers.
– a small-to-moderate negative effect of phonics on nonword reading fluency in poor readers.
– a small effect of phonics on reading comprehension in poor readers.
– a small-to-moderate effect of phonics on spelling words in poor readers.
– a small-to-moderate effect of phonics on letter-sound knowledge in poor readers.
– a small-to-moderate effect of phonics training on phonological output in poor readers.

Bowers argues that the high results for word reading accuracy were due to two studies — Levy and Lysynchuck (1997) and Levy, Bourassa and Horn (1999) — and that these studies should be excluded because they were one-to-one interventions. However, all of the studies in this meta-analysis were small group or one-to-one interventions, so there is no good reason to exclude these two particular studies just because the interventions were found to be particularly effective. Nonetheless, in Bowers’ summary he unilaterally decides to remove the Levy et al. studies and comes to the spurious and erroneous conclusion that the McArthur et al. (2012) meta-analysis found “no evidence” that systematic phonics instruction was effective.

McArthur et al. (2012) are also criticised by Bowers for not comparing systematic with unsystematic phonics instruction. Bowers writes “this analysis should not be used to make any claims that systematic phonics is better than standard alternative methods, such as whole language that do include unsystematic phonics”. Here we start to see Bowers’ distinction between unsystematic phonics and whole language unravel.

McArthur et al. (2018), not reviewed by Bowers, included several more studies in their updated meta-analysis. They report results as standardised mean differences (SMDs) and found that, for poor readers,

“Phonics training appears to be effective for improving literacy-related skills, particularly reading fluency of words and non-words, and accuracy of reading irregular words.” (p. 2)

It is worth noting the range of interventions included in the McArthur et al. meta-analyses, even though Bowers does not. The analyses combined ‘phonics only’ interventions with ‘phonics plus phonological awareness’ and ‘phonics plus sight words’ interventions. The descriptions of the ‘phonics only’ interventions indicate that they were very limited in duration and/or scope. For example, Barker (1995) used an intervention based around the teaching of short vowel sounds, and Levy and Lysynchuck (1997) and Levy, Bourassa and Horn (1999) used an intervention that taught phonemic segmentation of randomly selected single-syllable words. McArthur et al. (2015a and 2015b) were eight-week programs. Many of the ‘phonics only’ interventions are not what would now generally be considered good examples of evidence-based phonics interventions. The ‘phonics plus phonological awareness’ interventions appear to be more comprehensive, but McArthur et al. (2018) did not analyse the effects of the different types of intervention separately. Given the shortcomings of some of the interventions themselves, it is unsurprising that the effect sizes are not routinely large, especially for comprehension, but the effects on other reading outcomes are significant and important.

Galuschka, Ise, Krick and Schulte-Korne’s (2014) meta-analysis also focussed on the effect of a wide range of interventions for children with reading difficulties, including systematic phonics interventions of various kinds. Bowers reports the authors’ finding that only the phonics interventions produced a significant result, along with their conclusion that:

“This finding is consistent with those reported in previous meta-analyses… At the current state of knowledge, it is adequate to conclude that the systematic instruction of letter-sound correspondences and decoding strategies, and the application of these skills in reading and writing activities, is the most effective method for improving literacy skills of children and adolescents with reading disabilities.”

Bowers disputes this conclusion, once again raising the spectre of publication bias with little real reason to do so. This speculation about publication bias inflating the results is unpersuasive. There are many people who would be delighted to publish studies showing a null result for phonics instruction, so the idea that there are many undiscovered, unpublished studies showing null results is difficult to believe.

Bowers also attempts to dismiss Galuschka, Ise, Krick and Schulte-Korne’s (2014) conclusions by listing the comparable effect sizes of the other interventions and stating that the only reason phonics was statistically significant is because of the larger number of studies and therefore more participants than the other interventions. His implication is that, since the effect sizes for other interventions are of similar magnitude, this challenges the conclusion that phonics interventions are particularly effective.

Bowers’ reasoning here is flawed for a number of reasons. First, Galuschka, Ise, Krick and Schulte-Korne (2014) have not overstated the case for systematic phonics interventions. They explicitly frame their conclusion about the relative effectiveness of systematic phonics instruction, which rests on statistical significance, as applying “At the current state of knowledge”. This simply means that, at this point in time, we can have more confidence in this finding than in the effect sizes found for the other treatment conditions.

Second, this argument is inconsistent with his criticisms of Torgerson, Brooks and Hall (2006) and Suggate (2010), where he argues that the effects found were weak because they lacked statistical significance. One can argue that statistical significance matters or that it does not, but one cannot change position within a single paper according to a preferred interpretation of the results.
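The disagreement over Galuschka et al. turns on the difference between the size of an effect and the confidence we can have in it. As a rough illustration only, the sketch below uses a common large-sample approximation for the standard error of a standardised mean difference (Cohen’s d) between two independent groups. The effect size of 0.3 and the group sizes are hypothetical, chosen to show that an identical effect can be non-significant in a small sample and significant in a larger one; none of these numbers come from the studies discussed here.

```python
import math


def cohens_d_se(d: float, n1: int, n2: int) -> float:
    """Approximate standard error of Cohen's d for two independent groups."""
    variance = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return math.sqrt(variance)


def is_significant(d: float, n1: int, n2: int, z_crit: float = 1.96) -> bool:
    """Two-sided test at roughly the 5% level, using a normal approximation."""
    return abs(d) / cohens_d_se(d, n1, n2) > z_crit


# Hypothetical illustration: the same effect size of 0.3, measured with
# 20 children per group and with 200 children per group.
for n in (20, 200):
    se = cohens_d_se(0.3, n, n)
    low, high = 0.3 - 1.96 * se, 0.3 + 1.96 * se
    print(f"n={n} per group: 95% CI [{low:.2f}, {high:.2f}], "
          f"significant={is_significant(0.3, n, n)}")
```

With 20 children per group the confidence interval spans zero; with 200 per group it does not, although the estimated effect is unchanged. This is why a larger body of evidence can justify a firmer conclusion without implying a larger effect.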

A meta-analysis published in Suggate (2010) looked at the effects of a number of interventions on the reading outcomes of children ‘at risk of reading difficulties’. Suggate (2010) calculated effect sizes for phonological awareness, phonics, comprehension-based and ‘mixed’ interventions. Similar effect sizes were found for all of them, but with a significant interaction between age and intervention type: phonics interventions were stronger for younger children and comprehension or mixed interventions were stronger for older children.

Bowers describes comprehension and mixed interventions as “alternative” interventions, as though they were an alternative to, or in conflict with, phonics interventions. As I explained earlier, instruction in phonics and instruction in comprehension are not in competition with each other. Presumably (although this detail is not reported in the meta-analysis), children were offered interventions targeting the area of reading with which they were having difficulty. Children with decoding or word reading difficulties would have received phonics interventions, and children with poor reading comprehension but adequate word reading would benefit most from comprehension interventions. Children with an undefined or general reading difficulty would benefit from both. This is consistent with the age interaction.

The finding that code-based interventions including phonics had a stronger impact in the early years and comprehension interventions had greater effects in the upper years of primary is consistent with the well-validated model of reading that, once decoding is established, the largest variance in reading comprehension is accounted for by language comprehension. There is no challenge to the importance of phonics, or the impact of systematic phonics instruction, in these findings.

Likewise, the later Suggate (2016) meta-analysis does not contradict the 2010 study, as Bowers claims. Suggate (2016) looked at the long-term effects of phonemic awareness, phonics, fluency and comprehension interventions and, as would be expected, found them all to be effective in the short term and less so in the longer term. These are four of the five essential elements of reading instruction (along with vocabulary) that have been identified through extensive research. The study found phonics interventions to have relatively small long-term effects. This can be explained by the constrained nature of phonics: once children have mastered decoding, other aspects of reading instruction become stronger variables in their reading ability. Phonological awareness (PA) interventions had a long-lasting effect. The majority of PA instruction studies were with children considered at risk, low readers or children with reading difficulties, and PA instruction would therefore be highly important for these children.

A number of other studies are mentioned in Bowers’ paper (Adesope, Lavin, Tompson, & Ungerleider, 2011; Hammill & Swanson, 2006; Han, 2010; Sherman, 2007; Torgerson, Brooks, Gascoine, & Higgins, 2018). In each case, positive findings for systematic phonics are downplayed, irrespective of the actual findings and the conclusions of the authors. Bowers’ key criticism, beyond the relative effect sizes, is what he regards as weak evidence directly comparing systematic phonics with unsystematic phonics.

Part of the problem with this premise is that unsystematic phonics is nebulous and undefined. It involves matters of degree – an absolute example of no phonics would be difficult to find. According to Bowers himself, whole language methods can contain unsystematic phonics. This is arguably more accurately described as balanced literacy; the boundaries are blurry. Given this difficulty of defining what is unsystematic phonics and what is whole language (with or without systematic phonics) and what is balanced literacy, it seems reasonable and practical to do what almost all studies and meta-analyses have done – compare systematic phonics instruction with the absence of systematic phonics instruction.

The available evidence from multiple studies shows that reading instruction that includes systematic phonics is more effective than instruction that does not. There are many more studies showing superior outcomes when children are systematically and explicitly taught letter-sound correspondences and how to blend and segment them to read and spell words, than there are studies showing a negligible or negative effect. The range of effect sizes is due to numerous factors, including the duration, level of systematicity, intensity, age of students, beginning level of students, group size, instructional fidelity, and the quality of classroom instruction. Nevertheless, the overall effect size is invariably and significantly positive.

At some point evidence may accumulate for other approaches that rivals systematic phonics. However, right now it does not.


What are the potential “alternatives” to systematic phonics instruction?

On a purely abstract level, it is all well and good to look at the research on reading instruction as an academic exercise and conclude that since no instructional method meets the highest possible standard of evidence, then they are all equally (in)effective and none should be endorsed over any other. But teaching reading is not just an abstract, academic exercise. All children need to be taught to read, and teachers need to make choices about how they are going to do it. If systematic phonics is removed from the recommended list, it leaves a vacuum that will be filled by “alternative methods”.

What are these potentially effective alternative methods, and what is the likelihood that they will be more effective than systematic phonics? Whole language and balanced literacy, the boundaries between which are blurry, are not recommended by Bowers. That leaves a mythical, undefined and therefore unproven alternative.

Into this vacuum, Bowers casually places the idea that instruction “should focus more on the role that meaning plays in organizing spellings (via morphology) and that English spelling system (sic) makes sense once the interrelation between phonology, morphology, and etymology are considered.” (p. 23)

Jeffrey Bowers’ brother Peter Bowers has developed just such a program, called Structured Word Inquiry (Bowers & Bowers, 2008). Jeffrey Bowers has co-authored papers with Peter Bowers on the rationale for the program (Bowers & Bowers, 2017) as well as participated in evaluations of the program (Colenbrander et al., 2019). This direct connection to a specific program is not mentioned in Bowers (2020).

There is no issue with academics developing reading programs. I work for MultiLit, which produces reading programs for schools. It’s not uncommon for academics who know reading research well to get impatient with the lag between evidence and teaching practice, to become frustrated by avoidably high rates of struggling readers, and to come up with ways to accelerate the quality of reading instruction in schools more directly. It’s natural that these reading programs would be informed by the developers’ understanding of the best available evidence. It is also reasonable to place more confidence in programs that have been subjected to multiple high quality evaluations and can show evidence of effectiveness.

The problem with seeing Structured Word Inquiry (SWI) as a superior alternative to systematic phonics is firstly that there is insufficient information to assess whether it is in fact based on the best available evidence. Bowers’ dismissal of what the vast majority of reading scientists accept about the essential role of learning GPCs in early reading acquisition suggests that it is not.

According to Stanovich (2000),

“That direct instruction in alphabetic coding facilitates early reading acquisition is one of the most well established conclusions in all of behavioural science.” (p. 415)

And, more recently, Seidenberg (2017),

“For reading scientists the evidence that the phonological pathway is used in reading and especially important in beginning reading is about as close to conclusive as research on complex human behavior can get.” (p. 124)

Based on the publicly available SWI program descriptions, it is not clear whether the program includes a systematic phonics component. If it does teach GPCs, it appears to do so in a way that is more closely analogous to analytic than synthetic phonics, but in which the sub-word level analysis is based on morphemic units rather than sound units.

Furthermore, and more importantly, there are no studies showing that SWI is highly effective for reading, either with or without the sort of comparison group that Bowers (2020) says is necessary to truly prove efficacy. His own evaluations of SWI do not compare it with systematic phonics. A study conducted from 2015 to 2018 by a research group at Bristol University that included Jeffrey Bowers compared SWI with a program called Motivated Reading (Nuffield Foundation, n.d.). Children in the study were in Years 3 and 5 and had poor reading and spelling skills. A slide presentation of the study's results, which have not yet been published in detail, shows that it found:

“No evidence that Structured Word Inquiry is more effective than Motivated Reading for improving reading, spelling, vocabulary or reading comprehension”


“Motivated Reading instruction led to greater reading gains than Structured Word Inquiry for the weakest readers (also true for Year 5 spelling)” (Colenbrander et al., 2018, Slide 42).

An earlier study of SWI with Year 4 and 5 students found that it improved learning of vocabulary for new words using taught morphemes but did not transfer to words from other morphological families (Bowers & Kirby, 2010). Neither this study nor the Nuffield study looked at the effectiveness of SWI for initial reading instruction with beginning readers. This is a critical point – the studies and examples of SWI involve children who have already had a year or more of reading instruction, usually including some phonics.

To be quite clear, it is easy to see on a theoretical basis why the sort of word analysis prescribed in SWI would be helpful for reading and spelling. However, while there is some slim evidence of benefits for older children (but certainly not to the high standards Bowers holds other research to), there is no research showing it is beneficial for beginning readers.

Given Bowers’ strong stance on the importance of comparing systematic phonics with alternative methods, it is notable that when the opportunity arose to evaluate SWI, his research group chose not to compare it to a program that included systematic phonics (Ashman, 2020). Conducting the study with beginning readers would have made the comparison even more valuable, but this may not have been possible because synthetic phonics is mandatory in the early years and/or because ethics committees would not allow children to be denied systematic phonics instruction in the early years.

Irrespective of the theoretical merits or otherwise of SWI, it is illogical to argue against the use of systematic phonics on the basis that the evidence supporting it does not quite meet extraordinarily high standards, and then suggest using an alternative teaching method that has little evidence of effectiveness at all.


The effect of synthetic phonics and the Phonics Screening Check in England

Bowers (2020) posits that policy changes on early literacy instruction in England since 2007 provide a natural experiment in the effectiveness of systematic phonics. This is true, but we must remember that natural experiments are not pure experiments. The implementation of a policy will always have more diffuse effects than a controlled experiment. A government decree that schools must teach synthetic phonics does not guarantee that they will do it willingly and well. Policy changes also take place in a context in which multiple other changes are occurring, and isolating the effect of one policy is rarely exact. However, even allowing for those caveats, if the policy change does not move the targeted educational outcome in the intended direction within a reasonable period of time, then the policy needs to be reviewed and, on the basis of thorough analysis, either strengthened or replaced.

Bowers (2020) argues that the implementation of mandated synthetic phonics (a highly systematic form of phonics instruction) and the Year 1 Phonics Screening Check have not led to improvements in reading outcomes. He points to trends in scores on the English national tests and two international assessments: the Programme for International Student Assessment (PISA) and the Progress in International Reading Literacy Study (PIRLS).

Machin, McNally and Viarengo (2018) also thought that the phased introduction of systematic synthetic phonics instruction was a good opportunity to look at the impact of the policy. They compared Year 2 (age 7) and Year 6 (age 11) reading outcomes of students in three successive cohorts – a pilot (ERDp) group of schools who were the first to have teachers trained in synthetic phonics, a Phase 1 (CLLD) group of schools who were the second to have teachers trained, and the national rollout to all schools.

Machin, McNally and Viarengo (2018) wrote,

“Our empirical analysis shows that intensive training in the use of a “new pedagogy” produced strong effects for early literacy acquisition amongst young school children… The most interesting finding here is that there are long-term effects at age 11 for those with a high probability of starting their school education as struggling readers. Specifically, the results suggest that there is a persistent effect for those classified as non-native English speakers and economically disadvantaged (as measured by free school meal status).” (p. 239).

Bowers (2020) is strongly critical of this conclusion. He writes:

“For the ERDp [pilot] sample, the authors reported highly significant effect of systematic phonics on the foundation stage assessment immediately after the intervention (0.298), but the effect dissipated on key stage 1 tests (0.075), and was eliminated on the key stage 2 tests (− 0.018).” (Bowers, 2020, p. 19)

This inaccurate interpretation of the data suggests that the between-group comparisons were the same throughout the study, which is not the case. These effect sizes are for comparisons between the pilot group and the control group, which had not received synthetic phonics training at the time of the immediate post-test. My understanding is that by the time Key Stage 2 (KS2) data were collected, there was no longer a ‘control group’ per se. The national rollout had commenced and all schools had had synthetic phonics training, including the original control group. The KS2 (age 11) results compare schools that have all been part of the national phonics program, but for differing amounts of time. It looks like many of the children who had not been part of the pilot or first phase – and who had lower scores than their pilot school peers at age 7 (notably the more socioeconomically and educationally advantaged native English speaking, non-Free School Meal [FSM] students) – were able to close the gap by age 11 when their schools joined the program. However, for some groups of at-risk children, especially those from non-English speaking backgrounds and low income (Free School Meals), the effect of being in the pilot or first phase, that is, having longer exposure to phonics instruction, was persistent to age 11.

“Indeed, for the ERDp [pilot] sample, there was a tendency for more economically advantaged native English children (not in receipt of free school meals) to read more poorly in the phonics condition in the key stage 2 test (− 0.061), p < 0.1. As the authors [Machin, McNally & Viarengo 2018] write: “It is difficult to know what to make of this estimate”.” (Bowers, 2020, p. 18)

The quote used by Bowers appears on p. 233 of Machin, McNally and Viarengo (2018). The context of their quote is that this negative finding for native English-speaking, non-FSM students at age 11 was found only for the pilot group, that is, not for the other cohorts. The rest of the quote in Machin, McNally and Viarengo (2018) is “It is difficult to know what to make of this estimate, though we do not find it when we consider effects for the next cohort.” (my emphasis). Bowers’ selective reporting here is deceptive.

“Note, the long-term negative outcome [for] economically advantaged native English children in the ERDp sample was of a similar magnitude to the long-term benefits enjoyed [by] non-native [English] speakers (.068) and economically disadvantaged children (.062) in the CLLD treatment condition, and accordingly, [it] is difficult to brush this finding aside.” (Bowers 2020, p. 18)

I have added the words in square brackets to this quote so that it makes grammatical sense. Bowers insists that the finding for this one subgroup in the pilot cohort should not be “brushed aside”. However, neither should the subsequent finding for the Phase 1 group, which did not show this negative effect for the same subgroup, be “brushed aside” in the way Bowers has done by not acknowledging it.

“More importantly, this study did not include the appropriate control condition.” (Bowers, 2020, p. 18)

Here, Bowers returns to his refrain about the instructional comparisons being made. The control condition in the Machin, McNally and Viarengo (2018) study was ‘business as usual’. This is inherent to natural experiments. It would have been useful if the study had described typical classroom practice in the control schools but it is nevertheless the case that students whose teachers had training in synthetic phonics had better outcomes than students whose teachers did not.

“As was the case with most of the above meta-analyses, the conclusion the authors made was not even tested.” (Bowers, 2020, p. 18)

This is not a fair criticism. Machin, McNally and Viarengo (2018) conclude that the phonics training and coaching provided to schools in successive waves over a number of years led to improved reading scores in subsequent years. They make no assumptions about the characteristics of instruction in comparison schools. As each group of schools was provided with synthetic phonics training, the differences between them dissipated, with the exception of the most disadvantaged students, for whom a positive impact of earlier participation in the phonics program remained.

As Machin, McNally and Viarengo (2018) put it:

“We are able to provide convincing evidence of causal effects from the introduction of synthetic phonics in English primary schools because of the way in which training was staggered across different local authorities (and hence different schools). Indeed, we show similar effects from a pilot and the first phase of the national rollout which followed. Moreover, effects of the interventions become much smaller or cease completely in subsequent waves of the national rollout, suggesting that the targeted and large-scale rollout had beneficial effects on the literacy of primary age school children.” (p. 239)

Bowers (2020) next turns his attention to England’s performance in PIRLS and PISA. There is little point discussing the PISA results in great detail. The cohort of 15-year-old English students who participated in the latest PISA tests in 2018 were in Year 1 during the phased implementation of synthetic phonics policies a decade ago. They may or may not have had teachers who were part of the Phase 1 training group. At that time, the quality of phonics instruction was variable, and this remained the case for several years: the pilot of the PSC in 2011 found that only one in three students achieved the expected standard of phonic decoding, indicating that phonics instruction was not strong in all schools even at that stage (Department for Education, 2011). It was not until 2014 that, nationally, around 75% of students achieved the expected standard in the PSC. Because PISA runs on a three-year cycle, the first cohort of Year 1 students to have achieved a relatively good standard in the PSC will sit the PISA tests in 2024.

The Year 4 PIRLS results might be seen as more relevant. The students who participated in PIRLS in 2016 were in Year 1 in 2013, although at that time the PSC results still indicated that the quality of phonics instruction was not universally high. England’s PIRLS mean scores increased steadily over the cycles from 2006 (539) to 2011 (552) and 2016 (559). England has also climbed in the PIRLS country rankings, but because the number of participating countries changes from cycle to cycle, rankings are not very informative. For this reason, Bowers’ citing of England’s rank in 2001 (third of 35 countries) against its rank in 2016 (eighth of 50 countries) tells us little of value. However, the mean score in 2001 was also relatively high at 553, virtually the same as in 2011. Bowers asks how this can be explained.

One answer might be differences in the sampling of students to participate in the PIRLS tests. Questions were raised about the sample of students in the English 2001 PIRLS assessment in an article by Hilton (2006), in which she judges “the sampling and the test itself to have been advantageously organised” (p. 817). A large number of selected schools declined to participate, and the rate of exclusions and withdrawals among subgroups of children likely to be low performers was relatively high compared to other countries. Hilton conjectures that this reduced the proportion of children from the ‘long tail of underachievement’ that has been typical in England, whose inclusion would have led to a lower mean score. A report by McGrane et al. (2017) for the Oxford University Centre for Educational Assessment says that there was a “relatively large error for the average score in 2001” (p. 33), so comparisons of 2001 with 2016 should be made cautiously.

McGrane et al. (2017) offer this explanation of the high mean score in PIRLS 2016:

“The percentage of England’s pupils meeting the Intermediate and Low Benchmarks is greater than all previous PIRLS cycles. This improvement at the two lower benchmarks is largely responsible for England’s overall significant improvement in PIRLS 2016.” (p. 31)

While Bowers (2020) mentions a theory that there may have been a sampling issue with the 2016 test, the reference he provides is not to a published source. I have not found any published studies with similar analyses or concerns about PIRLS samples in any year other than 2001.

For some reason, Bowers also makes an entirely unfounded statement about phonics instruction being “less engaging”, which yet again is inconsistent with his insistence on high standards of evidence elsewhere.

Bowers points to the relatively strong performance of Northern Ireland in PIRLS assessments, claiming that the Northern Ireland Education and Library Board’s reading guidance for Key Stage 1 (KS1) does not mention systematic phonics. As no specific reference is provided for such a document, this cannot be verified. However, systematic phonics is mentioned in the current literacy strategy published by the Northern Ireland Department for Education (2011).

The Northern Ireland literacy strategy document Count, Read: Succeed states that,

“To support pupils’ development of literacy and numeracy skills the principal, in particular, must ensure that: … (j) in primary schools, there is a systematic programme of high-quality phonics.” (p. 25)


“When choosing an approach to the teaching of phonics schools should ensure that:
• the approach is consistent with the principles of the revised curriculum;
• the approach is explicit and structured;
• the approach reflects and is informed by the levels that pupils are expected to achieve by the end of each Key Stage and by the need to ensure progression;
• the phonic knowledge and understanding are applied in meaningful contexts;
• the learning is well paced, interactive and engaging for pupils;
• the approach is suitably differentiated to meet the needs of all pupils; and
• the programme is systematic and developmental in nature.” (p.27)

The PIRLS report on Northern Ireland refers to this strategy. It is hard to know for sure why Northern Ireland’s PIRLS score was comparatively high, but there is no evidence that Northern Ireland provides a counterfactual to the systematic phonics policy in England.

As mentioned above, the Year 1 Phonics Screening Check (PSC) does not escape Bowers’ attention. The PSC was introduced to monitor whether students had an adequate level of decoding ability at the end of Year 1 – their second year of school. Low average PSC scores would indicate that phonics teaching was not as effective as it should be. The aim of improving students’ decoding skills is to help them to attain accurate and fluent word reading in the early years of school, which would ideally be embedded in a rich language and literacy program to help develop language and reading comprehension, as per the Simple View of Reading and the strong recommendation of the Rose report (Rose, 2006).

After the PSC had been implemented nationally for three years, a team of researchers at the National Foundation for Educational Research (NFER) was commissioned to undertake a review of the PSC and evaluate its impacts. The review by Walker et al. (2015) found that “the national results show an improvement in performance in phonics, as measured by the Check, which would be consistent with adjustments to teaching methods reported” (p. 11).

In a report I wrote in 2016 (Buckingham, 2016), I provided graphs showing that trends in the KS1 tests show an increase in Year 2 reading and writing scores over the period following the introduction of the PSC, up until the KS1 tests changed in 2016, breaking the trend line and making further trend analysis difficult, if not impossible.

Bowers claims that the Walker et al. (2015) evaluation report contradicts this finding, but this is not the case. First, Walker et al.’s evaluation finished in 2014; the addition of 2015 data in my report makes the trend clearer. Second, Walker et al. (2015) conducted a different analysis of the data. They calculated a value-added measure from EYFSP (Early Years Foundation Stage Profile) scores at the end of Reception and KS1 reading scores (Year 2). They compared the value added in 2011 and 2012 (the years of the pilot PSC and the national PSC implementation) with the value added in 2013 and 2014 and found no difference in growth between the two periods. There are some clear methodological issues and ambiguities in this method of assessing the impact of the PSC on KS1 scores that Bowers seems willing to ignore, including those acknowledged by Walker et al. (2015), who say,

“The EYFSP points represent children’s attainment at the end of the Reception year of school. During this time they are very likely to have made a start in learning phonics; and thus it cannot be regarded as a true baseline measure in determining the subsequent impact that the PSC makes in improving children’s literacy skills.” (p. 26)

This means that the Walker et al. (2015) analysis is not comparable to my description of the KS1 scores, and therefore the two cannot be regarded as “inconsistent”.

The graphs of KS1 and KS2 scores from 2006 to 2018 in Bowers (2020) clearly show an upward trend in reading and writing from 2011 to 2015 that is greater than the upward trend for maths and science. It is not surprising that maths and science would also improve slightly as reading improves, because maths and science tests require children to be able to read the questions proficiently. There have also been significant curriculum and teaching reforms in maths and science in the period since the PSC was introduced. Neither the graphs themselves nor the descriptions of the graphs in Bowers (2020) make it clear that the drop in KS1 and KS2 results in 2016 is due to a dramatic change in the tests themselves that year. There is no mention of this in the text either, just a footnote to one of the graphs, which would give any reader not already aware of that fact the impression that there was a real drop in scores. This significantly affects the validity of any test score analysis spanning 2016, which is not acknowledged by Bowers. Even so, the graphs show that scores for reading and writing began to increase after the pilot PSC (from 2011 to 2015). How much of this is directly attributable to the PSC is unknown, but there is certainly no evidence of “a stark disconnects (sic) between PSC and SAT scores” when the relevant periods are considered.


Is Bowers right about a lack of good evidence for systematic phonics instruction?

In a word, no.

This is how Bowers views the research on systematic phonics instruction, based on his revisionist and selective evaluation of the research literature.

“Despite the widespread support for systematic phonics within the research literature, there is little or no evidence that this approach is more effective than many of the most common alternative methods used in school, including whole language. This does not mean that learning grapheme-phoneme correspondences is unimportant, but it does mean that there is little or no empirical evidence that systematic phonics leads to better reading outcomes. The “reading wars” that pitted systematic phonics against whole language is best characterized as a draw.” (p. 23)

Even Bowers’ skeptical re-interpretation of the evidence base does not support his conclusions, as I have set out above. Systematic phonics has one of the largest and most consistent evidence bases in education. Synthetic phonics, which is the most systematic form of phonics instruction, has been specifically investigated in a number of randomised controlled trials (for example, Christensen & Bowey, 2005; Hatcher, Hulme, & Snowling, 2004; Johnston, McGeown & Watson, 2011) and has been found to be a common factor in high performing schools (Joseph, 2019; Louden, 2015; Ofsted, 2010). Synthetic phonics is strongly aligned with cognitive scientific research and models of reading that have been found to be highly predictive, in particular the Dual Route Cascaded Model (of word reading) and the Simple View of Reading (for reading comprehension) (Castles, Rastle & Nation, 2018). The same cannot be said for whole language, balanced literacy, or analytic phonics.

And while there is some validity to the argument that meta-analyses provide a more accurate estimate of the true effect of an intervention, there is also a good argument to be made for giving strong consideration to the findings of individual studies with rigorous methodologies that investigate a higher quality version of the intervention of interest. The meta-analyses include interventions that are short in duration, involve small numbers of students, and are restricted in instructional scope and depth. Emphasis should also be given to the findings of larger studies with interventions that more strongly resemble what would generally be considered ideal classroom practice, such as the Clackmannanshire study (Johnston, McGeown & Watson, 2011).

Bowers goes on to say,

“The conclusion should not be that we should be satisfied with either systematic phonics or whole language, but rather teachers and researchers should consider alternative methods of reading instruction. For example, one possibility is that reading instruction in English should focus more on the role that meaning plays in organizing spellings (via morphology) and that English spelling system makes sense once the interrelation between phonology, morphology, and etymology are considered (Bowers & Bowers 2017, 2018c). Of course, other possibilities need to be considered as well, but the first step in motivating more research into alternative forms of instruction is to realize that there is a problem with the current approach.” (p. 23)

Bowers has constructed a lengthy journal article aimed at challenging the use of systematic phonics by attempting to undermine its evidence base. His entire thesis rests on the flawed argument that when held up to the highest possible standards of evidence, systematic phonics falls short. It is therefore completely illogical to then suggest using “alternative teaching methods” that have either much weaker evidence or no evidence base whatsoever.

It is one thing to say that researchers should consider investigating unproven alternative methods, but it is irresponsible to make the same recommendation for teachers. Classroom practice should use the methods with the strongest evidence base available, and at the moment that is undeniably systematic synthetic phonics.
