The following is a guest post by @sunset_shazz.
Should the Carolina Panthers have fired Head Coach Ron Rivera or traded QB Cam Newton the day after they lost the Super Bowl? Scott Kacsmar at FiveThirtyEight argues they should have done either or both. Do read the whole piece; the argument is presented as follows:
- In NFL history, only 4 coaches have won their first Super Bowl after more than 5 seasons on the job with the same team;
- No team has ever started the same quarterback under the same head coach for more than 5 years and seen that duo win its first championship.
Having examined the history of prior first-time Super Bowl winners, FiveThirtyEight infers that a short tenure is conducive to winning championships. The study’s conclusion: “If championship success doesn’t come within five years, things tend to get stale, and someone eventually has to move on from their position of power.”
Can you spot the flaw in this reasoning?
How about if I used the same exact logic, using a more emotionally salient characteristic:
- In NFL history, only 4 minority head coaches have won Super Bowls. Therefore you shouldn’t hire minority head coaches. 
Does that framing device make the flaw in reasoning clearer?
FiveThirtyEight’s study suffers from the confusion of the inverse, a statistical fallacy that undergraduates are commonly taught to avoid. One of the best recent treatments of this problem was a brilliant piece by Katherine Hobson on the lab-testing startup Theranos (also, funnily enough, at FiveThirtyEight). Chapter 8 of Nate Silver’s excellent The Signal and the Noise provides a lucid discussion on this topic, in the context of Bayes’s theorem.
Here is the issue: the fraction of Super Bowl winners that possess a certain characteristic, by itself, tells you nothing about the probability that those who possess that characteristic will win a Super Bowl. A better way to estimate the latter would be to go back and examine the historical success rate of coaches who possess the characteristic you’d like to study.
I compiled every season coached since the 1970 merger, then excluded the seasons after a coach had won his first Super Bowl. Coaches who were tenured 5 years or fewer with their teams won 24 first Super Bowls in 1009 opportunities, for a success rate of 2.38%. Coaches tenured 6 or more years won 4 first Super Bowls in 176 opportunities, a 2.27% success rate. Using a technique previously used in the Duck Bias study, I applied the cumulative distribution function of the binomial distribution to test whether the two success rates differ to a statistically significant degree. The P-value of 0.592 indicates no statistically significant difference.
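For readers who want to check the arithmetic, here is a minimal sketch of one plausible reading of that test. It assumes the short-tenure success rate serves as the null hypothesis and asks how likely 4 or fewer wins would be in 176 long-tenure seasons. The function and variable names are illustrative, not taken from the original analysis.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Counts quoted in the text (first Super Bowl wins / coaching seasons).
short_wins, short_seasons = 24, 1009   # tenure of 5 years or fewer
long_wins, long_seasons = 4, 176       # tenure of 6 or more years

short_rate = short_wins / short_seasons
long_rate = long_wins / long_seasons

# Assumption: the null hypothesis is that long-tenured coaches win at the
# short-tenure rate. The P-value is then the chance of 4 or fewer wins.
p_value = binom_cdf(long_wins, long_seasons, short_rate)

print(f"short: {short_rate:.2%}  long: {long_rate:.2%}  P-value: {p_value:.3f}")
```

Under these assumptions the result should land close to the 0.592 reported above, far from any conventional significance threshold.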
However, Super Bowl wins are a noisy, sparse measure, because so few championships are awarded. An alternative measure of coaching success which enjoys the advantage of more data is the frequency with which a coach makes the playoffs. I compiled the playoff rate for every coach in the dataset, and compared this with the base rate of success for each year. The data shows that coaches with longer tenure are actually more likely (47.7%) to make the playoffs than both shorter tenured coaches (31.1%) and the base rate (38.0%); both of these differences are statistically significant.
Obviously, this data doesn’t tell you anything about causation. There is likely a survivorship bias / selection effect: those coaches who are kept by their team after 5 years without a championship are likely of higher quality than average, which is probably why their subsequent success rate is higher.
For Coach-QB pairings, 28 Super Bowls were won in 1137 opportunities for the short tenured pairs, a 2.46% success rate. There were only 48 seasons where a Coach-QB pairing lasted more than 5 years without having won a Super Bowl. With so few seasons, the zero success rate is, statistically speaking, consistent with randomness rather than a measured effect. In terms of playoff success rate, once again the longer tenured pairings had a higher success rate, though this effect was not found to be statistically significant.
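A quick back-of-the-envelope check shows why zero wins in 48 seasons is unremarkable. This sketch assumes each long-tenured season is an independent trial with the same success rate as the short tenured pairs. That is a simplification, and the variable names are illustrative.

```python
# Counts quoted in the text for Coach-QB pairings.
pair_wins, pair_seasons = 28, 1137     # short tenured pairs
long_pair_seasons = 48                 # long tenured pairs, zero wins

rate = pair_wins / pair_seasons

# Expected number of wins in 48 seasons at the short-tenure rate.
expected_wins = rate * long_pair_seasons

# Probability of seeing no wins at all in 48 independent seasons.
p_zero = (1 - rate) ** long_pair_seasons

print(f"expected wins: {expected_wins:.2f}, P(zero wins): {p_zero:.2f}")
```

At that rate one would expect only about one win in 48 seasons, and a shutout would occur roughly three times in ten by chance alone.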
The data is pretty clear: you shouldn’t fire your coach, or your QB, just because he has not won a Super Bowl after an arbitrary number of years. The only reason short tenured coaches seem to have been historically more successful is that they vastly outnumber the long tenured ones. FiveThirtyEight’s model was fooled by the confusion of the inverse.
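The confusion of the inverse can be made concrete with the coach counts quoted earlier. The sketch below, with illustrative variable names, compares the share of first-time winners who were short tenured with the share of all coaching seasons that were short tenured, then uses Bayes’s theorem to recover the probability that actually matters.

```python
# First Super Bowl wins and coaching seasons quoted in the text.
short_wins, short_seasons = 24, 1009   # tenure of 5 years or fewer
long_wins, long_seasons = 4, 176       # tenure of 6 or more years

total_wins = short_wins + long_wins
total_seasons = short_seasons + long_seasons

# What the "history of winners" framing measures: P(short tenure | win).
p_short_given_win = short_wins / total_wins

# How common short tenures are overall: P(short tenure).
p_short = short_seasons / total_seasons

# What a team deciding on its coach needs: P(win | tenure).
p_win_given_short = short_wins / short_seasons
p_win_given_long = long_wins / long_seasons

# Bayes's theorem ties the two together.
p_win = total_wins / total_seasons
p_win_given_short_bayes = p_short_given_win * p_win / p_short

print(f"P(short | win) = {p_short_given_win:.1%}, P(short) = {p_short:.1%}")
print(f"P(win | short) = {p_win_given_short:.2%}, P(win | long) = {p_win_given_long:.2%}")
```

Short tenured coaches account for about 86% of first-time winners, but also for about 85% of all opportunities, which is why the two conditional win rates end up nearly identical.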
But given that we’re in the midst of a coaching carousel accompanied by a Rooney Rule kerfuffle: what about the reductio ad absurdum argument I cheekily proffered above? What does the data say about minority head coaches?
I used Wikipedia’s Rooney Rule page to code every minority head coach in the dataset, presented below. This data comprises every coaching season between 1970 and 2016, including all seasons for coaches who won multiple Super Bowls.
Minority coaches won Super Bowls in 3.31% of their opportunities, which is statistically indistinguishable from the base rate of success of 3.22% (note that minorities have disproportionately coached in more recent years, after the league has expanded, which lowers the base rate of championship success). Interestingly, minority coaches made the playoffs 58 times in 121 opportunities (47.9%) which is 11.6 more times than one would expect given the base rate, a difference that is statistically significant (p=0.02). This is a noteworthy result: historically, the presence of a minority head coach is associated with a 25% greater rate of making the playoffs.
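The playoff result can be roughly reproduced with a one-sided binomial test. The sketch below makes a simplifying assumption: it collapses the year-by-year base rates into a single average rate, back-calculated from the stated excess of 11.6 playoff appearances. The original analysis adjusted for each season’s league size and playoff format, so treat this as an approximation. Function and variable names are illustrative.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Figures quoted in the text.
playoff_seasons, seasons = 58, 121
excess = 11.6                          # appearances above the base-rate expectation

# Assumption: a single average base rate stands in for the yearly rates.
expected = playoff_seasons - excess    # 46.4 expected appearances
avg_base_rate = expected / seasons     # roughly 38%

# One-sided test: chance of 58 or more appearances at the base rate.
p_value = binom_upper_tail(playoff_seasons, seasons, avg_base_rate)

# Relative improvement over the expectation.
relative_lift = playoff_seasons / expected - 1

print(f"base rate: {avg_base_rate:.1%}, lift: {relative_lift:.0%}, P-value: {p_value:.3f}")
```

With these inputs the lift works out to 25%, and the one-sided P-value should fall in the neighborhood of the 0.02 reported above.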
Again, one shouldn’t make causal inference claims from historical data. I’m not arguing that minorities are inherently better coaches. In this situation, there is no survivorship bias. Might there be a selection effect? The late Nobel Laureate Gary Becker argued in 1957 that employment discrimination (racial or otherwise) is inefficient. Not only does the victim of discrimination bear a cost, but so does the discriminating employer (through lower productivity per unit labor cost). Axiomatically, to the extent that some employers exhibit an unfounded bias, an employer who doesn’t discriminate can capture a portion (but not all) of the forgone surplus. Moreover, selecting from a pool of employee candidates who are the victims of racial discrimination will yield supernormal productivity. Becker’s theory of discrimination is one plausible explanation for the effect shown by the data.
What does this mean for NFL teams today? The data is unequivocal that Ron Rivera shouldn’t be fired solely because he hasn’t yet won a Super Bowl. The data also shows that Al Davis, who got many things wrong, got a few things very right. An NFL team should examine the pool of minority head coach candidates very carefully, and should strongly consider hiring from this pool.
Not merely because it’s the right thing to do, but because the data suggests it helps you Just Win, Baby.
 To be clear, this is not argued by FiveThirtyEight. I employed this reductio ad absurdum to permit the reader to more easily intuit the confusion of the inverse.
 The P-value is the probability that, conditional on the null hypothesis being correct (i.e. no effect), one would observe data at least as extreme as the data in question by chance. Though subject to recent debate, the conventional standard in social science is to treat results with P-values greater than 0.05 as not statistically significant.
 The base rate of success is # of playoff teams / # of total teams in the league, both of which have changed over time. I accounted for the league’s expansion of teams in 1976, 1995, 1999 and 2002, as well as the evolution of the playoff format from 8 to 12 teams in 1978 and 1990, and the 16 team playoff that occurred during the strike-shortened 1982 season.