Tails never fails, y'all. - Jonathan Daniel
Toss a coin 144 times and you'd expect 72 heads. If heads comes up 81 times, no statistician would bat an eyelid. Why was Lovie Smith's 81-63 record any different?
One of the things that irked me about the coverage of Lovie Smith's exit from the Bears was the issue of his overall record. "Why would you fire a coach who's 18 games over .500?" people asked. He was 81-63 and won 18 games more than he lost, but .500 over 144 games is 72-72, so that's actually nine games over .500, folks. After hearing talk on a Waddle & Silvy podcast of Tom Thibodeau being "83 games above .500" I realise it must be convention, but it's still a misleading framing (and one that seems to have some fans thinking coaches' records are much better than they actually are).
Having got that off my chest, the other thing that bothered me was that Lovie's win-loss record was touted without anyone asking what should have been the key question: What was it worth? He has a winning percentage of .563 compared to what must be a baseline 50% chance of winning any game (our null hypothesis) but, in statistical terms, was that significant? For the less statistically minded among you, think of it as flips of a coin. You can flip it 10 times and get 7 heads, or flip it 1,000 times and get 700 heads; in both cases it's .700, but you'd suspect something's not right in the case of the latter. What about 70/100? Statisticians have devised formulae to test things like this, to determine if something is likely to have happened by chance.
(N.B. I should point out that coin flipping is merely used as an example of the null hypothesis and to explain the reasoning behind the calculation in non-statistical terms. No coins were flipped as part of the process. I think we can all agree that - whatever the topic - when there are two possible outcomes a split of 74-70, for example, is not much different from 72-72 and the difference between the two can come down to pure luck... but is 77-67? 79-65? How about 81-63?)
With that in mind, I ran the same statistical test as for David Taylor's results post for the WCG Pick 'Em to determine if an 81-63 record is significantly different from .500 (72-72). As we can see from the formula for Chi square, for our purposes it's nothing more than a slightly reworked ratio of a Win-Loss record weighted against the number of games coached. (Or, how does a coin-toss tally for "heads" fare when weighted against the total number of coin-tosses?)
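For reference, the calculation can be written out as below. This is a sketch that assumes the standard Chi square goodness-of-fit statistic for two outcomes with the Yates continuity correction (the 0.5 term); the symbols O, E, W, L and n are my own notation, standing for observed counts, expected counts, wins, losses and games coached.

```latex
\chi^2_{\text{Yates}}
  = \sum_{i \in \{W,\,L\}} \frac{\left(|O_i - E_i| - 0.5\right)^2}{E_i},
  \qquad E_W = E_L = \frac{n}{2}

% With equal expected counts the two terms are identical, so this reduces to
\chi^2_{\text{Yates}} = \frac{\left(|W - L| - 1\right)^2}{n}

% Worked example for an 81-63 record over n = 144 games
\chi^2_{\text{Yates}} = \frac{(18 - 1)^2}{144} = \frac{289}{144} \approx 2.007
```

The reduced form is the "reworked ratio" in question: the gap between wins and losses, squared, divided by the number of games.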
I used the Yates correction, since there is only a single degree of freedom; the critical Chi square value for 1 d.f. at p<0.05 is 3.841. An 81-63 record came to a Chi square of 2.007, well below the critical value. In other words, Lovie Smith's win-loss record in Chicago was not significantly different from .500.
What does this mean? In English, if you flip a coin 144 times, you would expect heads to come up 72 times. If it comes up heads 81 times, it's not a big enough deviation from the expected 50:50 ratio to make a statistician sit up and take notice. It's still within the parameters of chance, and there wasn't anything impacting that 50:50 ratio sufficiently to push the results beyond the realms of what we would expect from chance.
In gridiron terms Lovie Smith's 81-63 record with the Bears, although 9 games above 50:50, does NOT distinguish him from a .500 coach as the difference in their records falls entirely within the realms of chance. Or, if you will, luck.
While that sinks in, you might like to know that 700 heads out of 1,000 coin flips has a Chi square of 159.201 and thus is clearly significant. 7 out of 10 flips has a Chi square value of 0.9 and thus is not. 70/100 is also significant, at a value of 15.210.
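For anyone who wants to check the arithmetic, here is a minimal Python sketch of the test. It assumes the Yates-corrected statistic against a 50:50 expectation; the function name `yates_chi_square` and the constant `CRITICAL_1DF_P05` are my own labels, not part of any library.

```python
def yates_chi_square(wins, losses):
    """Yates-corrected Chi square for a W-L record against a 50:50 expectation."""
    games = wins + losses
    expected = games / 2  # null hypothesis: half wins, half losses
    # Continuity correction: shrink the deviation by 0.5, but never below zero.
    corrected = max(abs(wins - expected) - 0.5, 0.0)
    # Wins and losses deviate from expectation by the same amount,
    # so the two terms of the sum are equal.
    return 2 * corrected ** 2 / expected


CRITICAL_1DF_P05 = 3.841  # critical value for 1 d.f. at p < 0.05, as quoted above

if __name__ == "__main__":
    for wins, losses in [(81, 63), (7, 3), (70, 30), (700, 300)]:
        chi2 = yates_chi_square(wins, losses)
        verdict = "significant" if chi2 > CRITICAL_1DF_P05 else "not significant"
        print(f"{wins}-{losses}: Chi square = {chi2:.3f} ({verdict})")
```

The same function works for any record without ties, so you can plug in your own coach of choice.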
(N.B. Once again, this does NOT mean I'm comparing football games to coin flips. If anything, if Lovie is as good of a coach as some people proclaim, he should be able to influence a game in his favour, and thus be winning significantly more than 50% of them. He evidently wasn't able to do so often enough for his W-L record to reach significance levels. That's the whole point of this analysis.)
The astute among you will notice that this must mean that a coach whose record is 63-81 is also not significantly different from a 72-72 coach. That's true, as losing those extra nine games out of 144 is entirely within the parameters of chance. He has, in essence, merely been unlucky.
I thought I would also test the regular season records of the other Bears head coaches in franchise history to see whose results were above chance levels. The results are as follows:
Table 1. Statistical significance of regular season W-L records for Bears franchise head coaches.
|Coach||# games||W-L record||Win %||Adj. W-L||Chi square|
|George Halas||497||318-148-31||.682||333.5-163.5||57.467 ^|
|Ralph Jones||41||24-10-7||.706||27.5-13.5||4.122 ^|
|Jim Dooley||56||20-36||.357||n/a||-4.018 ^|
|Abe Gibron||42||11-30-1||.268||11.5-30.5||-7.714 ^|
|Mike Ditka||168||106-62||.631||n/a||11.006 ^|
(Statistical notes: Winning percentage appears to disregard drawn games entirely in its calculation, but Chi square would be affected by the reduction in overall number of games, so I counted each drawn game as 0.5 in each of the wins and losses columns. The Adjusted W-L column reflects that, where such a transformation in the data had to be made.
Out of curiosity I did recalculate, this time disregarding drawn games entirely, but the results remained pretty much the same; for those interested, the Chi square values for the six coaches affected were: 61.290^, 4.971^, 3.559, 3.559, 0.696, and 7.902^; none changed in significance.)
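The two ways of handling drawn games can be sketched in Python as follows. This assumes the same Yates-corrected test against a 50:50 split; `chi_square_with_ties` and its `method` argument are hypothetical names of my own, used only for illustration.

```python
def yates_chi_square(wins, losses):
    """Yates-corrected Chi square for a W-L record against a 50:50 expectation."""
    expected = (wins + losses) / 2
    corrected = max(abs(wins - expected) - 0.5, 0.0)
    return 2 * corrected ** 2 / expected


def chi_square_with_ties(wins, losses, ties, method="split"):
    """Handle drawn games before testing the record.

    method="split": count each tie as half a win and half a loss (Adj. W-L).
    method="drop":  disregard ties entirely, which shrinks the sample.
    """
    if method == "split":
        return yates_chi_square(wins + ties / 2, losses + ties / 2)
    if method == "drop":
        return yates_chi_square(wins, losses)
    raise ValueError(f"unknown method: {method!r}")


if __name__ == "__main__":
    # Records taken from Table 1: (coach, wins, losses, ties)
    for coach, w, l, t in [("Ralph Jones", 24, 10, 7), ("Abe Gibron", 11, 30, 1)]:
        split = chi_square_with_ties(w, l, t, method="split")
        drop = chi_square_with_ties(w, l, t, method="drop")
        print(f"{coach}: split ties = {split:.3f}, drop ties = {drop:.3f}")
```

Note that the function returns an unsigned value; the minus signs shown for losing records in the table are a labelling choice, not part of the statistic.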
As we can see, Jack Pardee's W-L record was as close as we have to .500 among Bears head coaches and, as such, his Chi square value is close to zero and far short of the critical value. We can say that his record clearly does not significantly differ from .500. The Chi square values of the five coaches in franchise history whose W-L records are significant are marked with a ^. Statistically speaking, there is less than a 5% probability that the coaching records of these five coaches deviated as far as they did from 50:50 by pure chance.
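To turn a Chi square value into that "probability of happening by chance", one option is the short Python sketch below. It relies on the fact that, with 1 degree of freedom, the statistic behaves like a squared standard normal value, so the tail probability can be taken from the standard library's `math.erfc`. The function name `p_value_1df` is my own, and the approach only applies to the 1 d.f. case used here.

```python
import math


def p_value_1df(chi2):
    """Upper-tail probability of a Chi square statistic with 1 degree of freedom."""
    # With 1 d.f. the statistic is a squared standard normal Z, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Unsigned values from the article: critical value, Lovie Smith, Mike Ditka
    for label, chi2 in [("critical", 3.841), ("81-63", 2.007), ("106-62", 11.006)]:
        print(f"{label}: Chi square = {chi2}, p = {p_value_1df(chi2):.4f}")
```

Pass in the unsigned value for losing records, since the negative signs in Table 1 are only there to flag which side of .500 a coach landed on.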
Ditka may only have had the fifth highest winning percentage in franchise history but, in adjusted terms, his record was the second most impressive (second least likely to have happened by chance).
Note that Dooley and Gibron both came out as significant, yet both had losing records (even though it's mathematically inaccurate, I've made their Chi square scores negative in the table to differentiate them from their more successful brethren). Quite bad losing records. So bad, in fact, that they were significantly unlikely to have happened by chance. For whatever reason, those teams had less than a 50% chance of winning games during their tenures. (Ed can tell us more about them.)
Now, it has to be pointed out that these tests only compare W-L records and don't take into account factors such as talent on the roster, injuries, and strength of schedule; we can only use the data that we have, though with a larger sample size these tend to balance out to some extent. Dave Wischnowsky did present some information on coaches' records against winning teams (notably Lovie, Mike McCarthy, and Mike Ditka) which I thought worth looking at in the following table.
Table 2. Statistical significance of regular season W-L records against winning teams for Smith, McCarthy and Ditka.
|Coach||# games||W-L record||Win %||Chi square|
|Lovie Smith||60||20-40||.333||6.017 ^|
As we can see from these results, while all three head coaches have losing records against winning teams, McCarthy's and Ditka's win-loss ratios are close enough to .500 that they are not statistically significant and we can say that, yes, within the parameters of chance they split games evenly against winning opponents. On the other hand, there is less than a 5% probability that Lovie Smith's record against winning teams happened by chance. That points to a constant, systematic trend.
Finally, I looked at the coaching records of some of the head coaches who have won the Super Bowl over the last decade or so, the ones whose teams were in the recent Championship games, plus those of two names which were hollered early and often during any head coaching search.
Table 3. Statistical significance of regular season W-L records for selected head coaches.
|Coach||# games||W-L record||Win %||Chi square|
|Mike McCarthy||112||74-38||.661||10.938 ^|
|Bill Belichick||208||151-57||.726||41.582 ^|
|Tom Coughlin (NYG only)||144||83-61||.576||3.063|
|Sean Payton||96||62-34||.646||7.594 ^|
|Bill Cowher||240||149-90-1||.623||14.017 ^|
|John Harbaugh||80||54-26||.675||9.113 ^|
|Jim Harbaugh||32||24-8||.750||7.031 ^|
As you would expect, in most cases these head coaches have won a significantly higher number of games than they have lost - yes, Jim Harbaugh has a small sample size but this is accounted for in the formula. Tom Coughlin's overall regular season record is remarkably similar to Lovie Smith's, which perhaps is reflected in the "Fire Coughlin" sentiment that seems to come up every few years, but he does have two Super Bowls and three further playoff seasons to show for his nine years in charge. Jon Gruden won a Super Bowl in his first season in Tampa Bay but was clearly unable to sustain success, not winning a single playoff game in two visits to the postseason in the next six seasons. Gary Kubiak's record doesn't look impressive either but, as any statistician will tell you, you can't ignore context. After inheriting a 2-14 team that continued to struggle early in his tenure, Kubiak's team is trending in the right direction; for his sake, he'll hope this continues.
To conclude, then, a coach with a so-so overall record can compensate for that with Super Bowl rings, or at least show that his team is moving in the right direction and can consistently get in position to compete (i.e. make the playoffs). Given all the factors that play a part in winning games, we can't pin everything on the head coach - but it's a results-driven business and ultimately such things as coaching and scheme rest on his shoulders. If coaches' win-loss records are brought up as arguments for retaining, hiring, or firing, it's only prudent that these records should also be analysed to see what they're worth. Otherwise, we might as well flip a coin...