The Kids Are Alright — McNabb or Kolb

The following is a guest post by @sunset_shazz.

Carson Wentz’s start to the 2017 season has garnered national plaudits for his stewardship of the Eagles’ league-leading offense. But it being 2017, there lurks a coterie of skeptics who claim his underlying ability is “horrendous” like Blake Bortles or merely pedestrian like Andy Dalton. Even more emphatically, poor Jared Goff was confidently pronounced a bust after one season.

Is it fair to judge a quarterback solely on his rookie year? What about after the first nine weeks of his second season in the league? And how might one systematically evaluate a developing quarterback, relative to historical data?

Let us consider some advanced metrics that are used to evaluate quarterbacks:

Adjusted Net Yards / Attempt (ANY/A) was developed by the great Chase Stuart, and accounts for sack yards, while providing a bonus for touchdowns and a penalty for interceptions. Both Stuart and Topher Doll have shown that ANY/A predicts wins. Danny Tuccitto has brilliantly used confirmatory factor analysis to show that ANY/A is a stable indicator of QB quality.
Defense-adjusted Value over Average (DVOA), the brainchild of Aaron Schatz at Football Outsiders, is a success-based, opponent-adjusted per-play efficiency metric intended both to correlate with non-opponent-adjusted wins (descriptive) and to predict future opponent-adjusted wins (predictive).
Defense-adjusted Yards above Replacement (DYAR) uses similar success-rate inputs to DVOA, in order to compute an aggregate value for a player (combining volume and efficiency).
Total QBR is ESPN Stats & Information’s proprietary efficiency metric that combines both passing and running contributions, adjusted for game situation, with charting to assign responsibility to a quarterback’s receivers and blockers.
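Of these four metrics, ANY/A is the only one simple enough to compute by hand. As a minimal sketch, the commonly published form of the formula adds 20 yards per touchdown pass, subtracts 45 yards per interception and the yards lost to sacks, and divides by pass attempts plus sacks. The function name and the stat line below are made up for illustration; they do not describe any real quarterback.

```python
def any_per_attempt(pass_yards, pass_tds, interceptions, sacks, sack_yards, attempts):
    """Adjusted Net Yards per Attempt (ANY/A), in its commonly published form.

    Touchdowns earn a 20-yard bonus, interceptions cost 45 yards,
    and sacks count both as lost yardage and as extra dropbacks.
    """
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("ANY/A is undefined with zero dropbacks")
    adjusted_net_yards = pass_yards + 20 * pass_tds - 45 * interceptions - sack_yards
    return adjusted_net_yards / dropbacks


# Hypothetical nine-game stat line (illustrative only, not a real player)
example = any_per_attempt(
    pass_yards=2000, pass_tds=15, interceptions=5,
    sacks=20, sack_yards=130, attempts=280,
)
print(round(example, 2))
```

DVOA, DYAR and Total QBR depend on play-by-play charting and proprietary adjustments, so they cannot be reproduced in a few lines like this.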

Through nine weeks, the 2017 sophomore class is playing at an extraordinarily high level, as measured by each of these advanced stats:

Please note that nothing herein intends to argue for any of these quarterbacks to the detriment of the others. Though the data presented above is insufficiently precise to draw ordinal rankings, it is unequivocal:

Wentz is good. Goff is good. Prescott is good. All three of these things can simultaneously be true, pace internet trolls.

Some epistemic humility is in order: the first-nine-week sample size is obviously noisy, with varying degrees of luck, opponent quality, team injuries, coaching quality and supporting casts influencing the statistical performance of each QB. Danny Tuccitto warns us that ANY/A stabilizes at 326 dropbacks, and even at that sample size, 50% of the observation represents randomness/luck. Nonetheless, the broad takeaway should be that each sophomore QB has thus far performed at a top-quartile level, judged by a variety of different metrics. Is this good? And how confident can we be that such performance will continue?
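One common way to act on a stabilization point is to shrink an observed rate toward the league average in proportion to sample size. The sketch below assumes that standard regression-to-the-mean reading of the 326-dropback figure; it is not necessarily the procedure Tuccitto used, and the function names and the sample numbers are hypothetical.

```python
STABILIZATION_DROPBACKS = 326  # figure quoted in the text


def reliability(dropbacks, k=STABILIZATION_DROPBACKS):
    """Share of the observation treated as signal; equals 0.5 when dropbacks == k."""
    return dropbacks / (dropbacks + k)


def regressed_estimate(observed, league_mean, dropbacks, k=STABILIZATION_DROPBACKS):
    """Shrink an observed ANY/A toward the league mean (assumed shrinkage model)."""
    r = reliability(dropbacks, k)
    return r * observed + (1 - r) * league_mean


# Hypothetical: 8.0 ANY/A over 326 dropbacks in a league averaging 6.0
print(regressed_estimate(8.0, 6.0, 326))  # halfway between 8.0 and 6.0
```

Under this reading, a hot nine-week stretch still counts, but only for about half its face value at that sample size.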

Recently, Chase Stuart noted that three sophomores from the same class have not played this well since at least the NFL-AFL merger. Though ANY/A is less context-specific than the other measures, it has the advantage of being transparent and easy to calculate, permitting historical analysis. Stuart compared the first 8 weeks of 2017 for Goff, Prescott and Wentz to full seasons of prior second-year QBs. Comparing partial to full seasons isn’t quite neutral, due to the disparity in number of games sampled; we should expect some mean reversion of our reference QBs as sample size increases. Using Pro-Football-Reference’s excellent query engine, I examined the first 9 weeks for each sophomore quarterback from 1999 through 2017. Historical comparisons need to be adjusted for era, due to the enormous change in average NFL passing efficiency over time. To account for this, I divided each quarterback’s ANY/A by the league average for that year. [1]
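The era adjustment described here is a simple ratio. A minimal sketch follows, with a hypothetical function name and made-up inputs chosen only to show why the adjustment matters: a lower raw ANY/A can rank higher once the league environment is taken into account.

```python
def relative_any_a(qb_any_a, league_any_a):
    """QB's ANY/A divided by the league-average ANY/A for the same season."""
    if league_any_a <= 0:
        raise ValueError("league average must be positive")
    return qb_any_a / league_any_a


# Made-up values, not real seasons:
# a 7.5 ANY/A in a league averaging 6.0, versus a 6.5 in a league averaging 5.0
modern_qb = relative_any_a(7.5, 6.0)
older_qb = relative_any_a(6.5, 5.0)
print(modern_qb, older_qb)  # the older, lower raw figure is better relative to its era
```

A value of 1.0 means exactly league average for that year; anything above 1.0 is better than average.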

Top ANY/A vs Average since 1999, sophomore QBs, weeks 1-9

The 76 QB sample set in this study is itself a product of survivorship bias: only those QBs who were successful enough to throw 100 passes in the first 9 weeks of their second year in the league are included. On the other side of the distribution, successful QBs who rode the pine for their first few years (like Aaron Rodgers, Tony Romo or Philip Rivers) are not in this sample. The average age of the sample is 24, similar to our reference QBs.

The three 2017 sophomores are, as Stuart observed, performing extraordinarily well relative to their peer set (all are in the top quartile of the sample). Relative to their era, they are passing with greater efficiency than Tom Brady, Drew Brees, Matt Ryan or Andrew Luck did in their second seasons.

You will also note that the top ranked sophomore QBs include many future hits (Big Ben, Kurt Warner, P. Manning) and a few notable misses (Nick Foles, Derek Anderson). The last column I included is the Career Approximate Value (CAV), which is a (very) rough method developed by Doug Drinen that puts a single number on a player’s total career, encompassing both longevity and performance.

Below, I plotted log Career Approximate Value against ANY/A relative to league average for the first 9 weeks for second year QBs from 1999-2015 (I excluded QBs from 2016-2017 because recent QBs have not yet had sufficient time to accumulate CAV points).

The positive relationship shown above indicates that ANY/A relative to league average over the first 9 weeks of a sophomore season explains about 37% of the variance in a QB’s log CAV. Do note that the correlation is sensitive to a few outliers. The odious Ryan Leaf and Akili Smith are on the bottom left, whereas Foles and Anderson are on the bottom right. I don’t want to ascribe an illusion of precision to this rough analysis – don’t fixate on the exact R-squared number, or the model coefficients. Both sample size and the extremely imprecise nature of CAV make me hesitant to draw definitive conclusions from the data. What is interesting to me is that the same plot using a QB’s full rookie season yields an R-squared of 0.224 – in other words, the first 9 weeks of a QB’s sophomore season tell you roughly 70% more about his future career than his entire rookie season does. Extending this analysis to full seasons since 1970, the R-squared is 0.083 for rookie years and 0.2348 for sophomore years (n=155 and n=204, respectively). My interpretation of this data: though rookie and second-year passing efficiency each predict only a small fraction of a quarterback’s career value, the sophomore year deserves 2.8x as much weight as the rookie year, in terms of confidence about predictive power. Rookie performance, in particular, is extremely noisy. One would have been wise to heavily discount the dreadful rookie seasons of Troy Aikman, Donovan McNabb and Terry Bradshaw. Rams fans should take note.
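For readers who want to see what an R-squared of this kind measures, here is a minimal sketch of a one-variable least-squares fit of log CAV on era-relative ANY/A. The toy values are invented purely to exercise the function and are not the data behind the plot; how the original analysis handled details such as players with zero CAV is not stated, so none of that is modeled here.

```python
import math


def r_squared(xs, ys):
    """R-squared of a simple least-squares fit of ys on xs.

    With a single predictor this equals the squared Pearson correlation.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)


# Made-up toy values, NOT real quarterback data
toy_relative_any_a = [0.70, 0.90, 1.00, 1.10, 1.30]
toy_cav = [5, 20, 40, 35, 120]
toy_log_cav = [math.log(v) for v in toy_cav]
toy_r2 = r_squared(toy_relative_any_a, toy_log_cav)

# Ratio of the full-season R-squared values quoted in the text
weight_ratio = 0.2348 / 0.083
print(round(weight_ratio, 1))  # the "2.8x" figure
```

The ratio of two R-squared values is a rough way to compare how much each season tells you; it says nothing about whether either relationship is causal.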

Relatedly, I didn’t find any predictive power when measuring the degree of era-adjusted-ANY/A improvement from rookie to sophomore season. This echoes Vincent Verhei’s study of second year improvement using DVOA. In hypothesis testing, a negative result can be an interesting result.

Quantitative analysis is not the only tool in an NFL researcher’s kit. Film study (though not my sphere of competence) is also valuable. Though Nick Foles had a magical sophomore season, the film showed reason for concern, as my friend Derek Sarley noted. I don’t personally see similar issues with Wentz – both his pre-snap adjustments and post-snap play appear to pass the “eye test”. No, he’s not perfect. Yes, he has flaws he needs to address. But so do all second year quarterbacks.

Moreover, our penchant for treating quarterbacks as static vessels of talent/ability shortchanges the importance of coaching and development. The installation of a new coaching regime in Los Angeles appears to be an interesting natural experiment, in terms of Goff’s maturation. Similarly, we can view Ezekiel Elliott’s probable(?) suspension as an instrumental variable when evaluating Prescott.

All inductive statements are, by their very nature, revisable. We don’t know the future; we can only use informed judgment to hazard a prediction. The false-positive rate for the top 20 QBs in table 2 above is 25% by my count [2], so let’s take that as the “base rate” of failure for the 2017 sophomore QBs. It is therefore reasonable to expect that two – perhaps all three – of the 2017 sophomores will enjoy successful careers as NFL starters.
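The arithmetic behind that expectation can be sketched with a binomial model. This assumes each of the three reference QBs independently has a 75% chance of success (one minus the 25% base rate), which is a simplification introduced here for illustration, not a claim made in the analysis above.

```python
from math import comb


def prob_at_least(min_successes, n, p_success):
    """P(at least min_successes out of n), assuming independent trials."""
    return sum(
        comb(n, k) * p_success**k * (1 - p_success) ** (n - k)
        for k in range(min_successes, n + 1)
    )


P_SUCCESS = 1 - 0.25  # base rate of failure taken from the text
N_QBS = 3

expected_successes = N_QBS * P_SUCCESS           # 2.25
p_all_three = prob_at_least(3, N_QBS, P_SUCCESS)  # about 0.42
p_two_or_more = prob_at_least(2, N_QBS, P_SUCCESS)  # about 0.84
print(expected_successes, p_all_three, p_two_or_more)
```

Under these assumptions, at least two successes is the likely outcome, while a clean sweep is a bit less than a coin flip.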

Finally, in these impatient times, let us remind ourselves that transcendent quarterbacks do not emerge, fully formed, from the forehead of Zeus. Each of these young, relatively inexperienced quarterbacks is playing the most technically and cognitively demanding position in sports at a very high level. Adjusted for experience and era, their achievements are even more astounding. The evidence suggests that the future of quarterback play is bright. Football fans, rejoice.

Thanks to Eagles fan / Data Scientist Sean J. Taylor for his insightful discussion on methodology. Any errors are mine alone.

[1] PFR’s partial season engine shows results from 1999 onward. Full season results go back before the merger, and also generate an era-adjusted ANY/A+ which uses a “Z-score” methodology, expressed in standard deviations above or below the population mean. My method is less sophisticated, though nonetheless robust.

[2] I excluded the reference QBs, as well as Marcus Mariota.

@sunset_shazz is an Eagles fan who lives in Marin County, California. He previously wrote about 4th down decisions.