Abstract
Using a dataset that compares each team’s offense with its opponent’s defense and its special teams with the opponent’s special teams, adjusts for opponent strength, and translates standard yards-per-game statistics into percentages over average, statistical learning methods such as support vector machines and neural networks can correctly predict over 65% of games. The Las Vegas spread, however, remains the best predictor in terms of correlation with the actual scoring margin. In addition, the dataset used in these experiments contains variables with many high-order relationships that are difficult to capture. All methods of predicting the final scoring margin of football games, including the spread, struggle to come within 10 points of the actual margin on average. They are also susceptible to natural variation in the proportion of games won by the “better” team, which was unusually high in 2005 and unusually low in 2006.
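The abstract judges predictors in three ways: how often they pick the winner, how well predicted margins correlate with actual margins, and how far off the predicted margin is on average. The Python sketch below computes those three measures. It is an illustration under stated assumptions, not the experiments’ code: the margin convention (home score minus away score), counting ties as misses, the function names, and the four sample games are all hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate(predicted, actual):
    """Score predicted scoring margins against actual margins.

    Margins are assumed to be home score minus away score.
    Returns (share of winners picked correctly, Pearson r, mean absolute error).
    """
    # A pick is correct when predicted and actual margins share a sign;
    # a zero on either side (a tie or a pick'em) counts as a miss here.
    hits = sum(1 for p, a in zip(predicted, actual) if p * a > 0)
    accuracy = hits / len(actual)
    r = pearson(predicted, actual)
    mae = mean(abs(p - a) for p, a in zip(predicted, actual))
    return accuracy, r, mae


# Hypothetical margins for four games (not real data).
predicted = [3.0, -6.5, 7.0, 1.0]
actual = [10, -3, -4, 14]
accuracy, r, mae = evaluate(predicted, actual)
print(f"accuracy={accuracy:.2f}  r={r:.2f}  MAE={mae:.3f}")
```

In this toy example three of four winners are picked correctly, yet the mean absolute error is 8.625 points and the correlation is only about 0.11, which shows how a predictor can pick winners reasonably well while still missing the margin badly.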
Problem Statement
Can the outcomes of NFL games be predicted accurately using statistics adjusted for opponent strength and a direct comparison of the units on each team that actually face each other (e.g., the home team’s run defense vs. the away team’s run offense)?
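The problem statement combines two ideas: each statistic is adjusted for the quality of the opponents it was compiled against, and each unit is compared directly with the opposing unit it will face. This section does not spell out the exact adjustment, so the Python sketch below uses one hypothetical scheme. Each game’s yardage is expressed as a percentage over what that opponent usually allows (or gains), those percentages are averaged, and the matchup feature is simply the sum of the two ratings. The function names, the additive combination, and the sample numbers are illustrative assumptions.

```python
from statistics import mean


def pct_over_average(value, baseline):
    """Fraction above (+) or below (-) a baseline: 120 vs. 100 -> +0.20."""
    return value / baseline - 1.0


def offense_rating(games):
    """Opponent-adjusted offensive rating.

    games: (yards_gained, opponent_avg_yards_allowed) pairs, one per game.
    Each game is judged against what that defense usually allows, so 120
    yards against a defense that allows 100 counts as +20%.
    """
    return mean(pct_over_average(y, allowed) for y, allowed in games)


def defense_rating(games):
    """Opponent-adjusted defensive rating; positive means worse than usual.

    games: (yards_allowed, opponent_avg_yards_gained) pairs, one per game.
    """
    return mean(pct_over_average(y, gained) for y, gained in games)


def matchup(offense, defense):
    """Hypothetical unit-vs-unit feature: adding the ratings makes a strong
    offense facing a weak defense score high."""
    return offense + defense


# Hypothetical rushing numbers (not real data).
home_run_offense = offense_rating([(150, 120), (90, 100)])   # +7.5%
away_run_defense = defense_rating([(130, 110), (100, 125)])  # about -0.9%
feature = matchup(home_run_offense, away_run_defense)
print(f"home run offense vs. away run defense: {feature:+.1%}")
```

Judging each game against the specific opponent means 90 yards against a stingy defense can count for more than 150 yards against a porous one. Raw yards-per-game totals cannot express that distinction.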
Background and Motivation
In the 2006 season, ESPN’s 8 football experts, all of whom played in the National Football League and some of whom were on championship teams, correctly predicted only 56-63% of the outcomes of NFL games.1 Half of them failed to reach 60%. That year, the home team won 53.125% of the games.2 The knowledge each expert took years to accumulate was therefore worth at most 2 correct picks per week more than a rudimentary predictor that always picks the home team. Why are NFL games so difficult to predict?
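The “at most 2 games per week” figure can be checked with a few lines of arithmetic. The Python sketch below assumes details the text does not state: the 2006 regular season had 256 games over 17 weeks, the rudimentary predictor always picks the home team, and playoff games are ignored. With those assumptions, 53.125% works out to exactly 136 home wins.

```python
# Assumptions not stated in the text: the 2006 regular season had 256 games
# spread over 17 weeks, and the "rudimentary predictor" always picks the
# home team. Playoff games are ignored.
GAMES = 256
WEEKS = 17
HOME_WIN_RATE = 0.53125  # share of games won by the home team (from the text)


def extra_picks_per_week(expert_rate):
    """Correct picks per week beyond the always-pick-the-home-team baseline."""
    baseline_correct = HOME_WIN_RATE * GAMES  # 136 games
    return (expert_rate * GAMES - baseline_correct) / WEEKS


for rate in (0.56, 0.60, 0.63):  # worst expert, the 60% mark, best expert
    print(f"{rate:.0%} accuracy: {extra_picks_per_week(rate):+.2f} games per week")
```

Under these assumptions the best expert gained about 1.5 correct picks per week over the home-team baseline and the weakest about 0.4, consistent with the “at most 2 games” estimate.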
One possibility is that the statistics by which people judge teams are faulty. In football, offenses and defenses are usually ranked by the average number of yards they gain and allow per game, respectively. These statistics are adequate for identifying the very best and the very worst teams. In 2006, the widespread consensus was that the Colts’ rushing defense was historically terrible: it allowed 173 rushing yards per game, almost 30 yards more than the next-worst rushing defense.3 San Diego, however, ranked 7th against the run in yards per game but only 21st in yards per attempt. So how good was San Diego’s rushing defense in reality? With these basic statistics, the lines between good, mediocre, and bad become blurred. According to Football Outsiders, judging pass defenses on total yardage is “phenomenally stupid.”4 San Diego’s offense was very good in 2006. In many games, the Chargers likely built a lead early enough that their opponents were forced to abandon the time-consuming running game, which reduced the number of rushing attempts against San Diego’s defense and, with it, the rushing yards allowed per game. Similarly, if a team gives up few passing yards per game but many rushing yards, it may be falling behind so early that its opponents spend most of the game literally running out the clock. The yards-per-game statistics, then, may reflect not only the quality of a team’s offense or defense but also the impact of the team’s other unit and the play-calling of opposing coaches. Statistical learning methods might capture these complex relationships and produce more accurate predictions.
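Rushing yards allowed per game is the product of rushing attempts faced per game and yards allowed per attempt. A defense that faces few runs, because its opponents are trailing and throwing, can therefore look strong per game while being weak per attempt. The Python sketch below uses three hypothetical defenses with invented numbers, chosen only to show how the two rankings can reverse, as San Diego’s 7th-versus-21st ranking did.

```python
# Hypothetical rushing defenses (invented numbers, not 2006 data):
# name -> (rushing attempts faced per game, rushing yards allowed per game)
DEFENSES = {
    "Team A": (20.0, 90.0),   # faces few runs, e.g. opponents trailing and passing
    "Team B": (30.0, 120.0),
    "Team C": (34.0, 125.0),  # faces many runs, e.g. opponents running the clock
}


def yards_per_game(team):
    return DEFENSES[team][1]


def yards_per_attempt(team):
    attempts, yards = DEFENSES[team]
    return yards / attempts


# Lower is better for a defense under either measure.
by_game = sorted(DEFENSES, key=yards_per_game)
by_attempt = sorted(DEFENSES, key=yards_per_attempt)
print("best to worst per game:   ", by_game)
print("best to worst per attempt:", by_attempt)
```

Team A allows the fewest rushing yards per game but the most per attempt (4.5 yards versus about 3.7 for Team C), so the ranking depends entirely on which denominator is used.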
Another possibility is that people do not trust the statistics enough. Experts and fans alike often cite intangibles such as “momentum,” “clutchness,” “mental toughness,” and “team chemistry” as reasons for picking one team over another. Since we cannot know what is going on in players’ minds, these explanations are often conjecture. Peyton Manning was considered a “choker” in playoff games despite having won several of them. Then, in the 2006 season, he led a historic comeback against his nemesis, the Patriots, and went on to win the Super Bowl. John Elway, despite leading comebacks in two of the greatest games of all time, was not considered “clutch” because of his poor Super Bowl performances until he ended his career with two Super Bowl rings. Statistics reflect what has occurred on the field, so if a player performs poorly against good opponents, whether because of a mental deficiency or a gap in physical ability, his statistics are likely inflated by games against poor opponents and should be adjusted accordingly.
Finally, some games just come down to luck. In football, unusual plays happen all the time. In the 2006 playoffs alone, a botched field goal attempt that should have secured an easy win turned into a loss for the Cowboys. An interception that should have sealed the game was fumbled by the Chargers, who subsequently lost. A wide receiver with no defender within 10 yards dropped a sure touchdown pass, costing the Patriots another shot at the Super Bowl. Ninety-nine times out of one hundred these events would not have occurred, and the games could easily have gone the other way. Sometimes teams simply don’t perform at their normal level. These things simply aren’t predictable.
Hypothesis
The dataset described here is designed to capture many complex relationships. The rushing and passing yards-per-game statistics reflect offensive and defensive quality, but only to a certain extent. The punt return and coverage statistics are intended to encapsulate the relationship between field position and the likelihood of scoring, given the team’s offense and the opponent’s defense. The home-field variables are relatively meaningless by themselves, but together they can be very useful for predicting games. Simple statistical learning methods such as linear regression will not be able to capture these complex relationships, but a multi-layered, back-propagation neural network or a support vector machine with a non-linear kernel should be able to. These more complex methods should improve on the accuracy of the Las Vegas favorite baseline predictor (65%) and possibly achieve the 75% accuracy of the Kahn system over multiple seasons rather than 2 weeks.
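The contrast between 2 weeks and multiple seasons matters because short records are noisy. The same noise explains why the share of games won by the “better” team can swing from season to season. Under a simple binomial model, if each pick is independently correct with probability p, the observed accuracy over n games has standard deviation sqrt(p(1 - p)/n). The Python sketch below assumes independent games, a true accuracy equal to the 65% favorite baseline, and 16 games per week, so that 2 weeks is about 32 games and a season is 256. These are simplifications, not properties of the dataset.

```python
from math import sqrt

BASELINE = 0.65  # Las Vegas favorite accuracy cited in the text
TARGET = 0.75    # Kahn system accuracy cited in the text


def accuracy_sd(p, n):
    """Standard deviation of observed accuracy over n independent picks,
    each correct with probability p (binomial model)."""
    return sqrt(p * (1 - p) / n)


def z_score(n):
    """How many standard deviations TARGET lies above BASELINE after n games."""
    return (TARGET - BASELINE) / accuracy_sd(BASELINE, n)


# Assumed sample sizes: 16 games per week, 256 regular-season games per season.
for label, n in (("2 weeks", 32), ("1 season", 256), ("3 seasons", 768)):
    print(f"{label:>9}: sd = {accuracy_sd(BASELINE, n):.3f}, z = {z_score(n):.1f}")
```

Under these assumptions, 75% sits only about 1.2 standard deviations above 65% over 32 games, a gap that chance alone could plausibly produce. Sustaining 75% over a full season would sit more than 3 standard deviations above the baseline.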
Sources