NFL: Predicting 2018 Win Totals with Data Science

With the Super Bowl just behind us, it’s time to predict wins for the 2018 NFL season.  At the start of the playoffs, we looked at a model that predicted how many games NFL teams should have won in 2017 and compared our results to Football Outsiders’ Pythagorean Win Expectancy.  We were able to improve on Pythagorean Win Expectancy for last year’s results, i.e., how many games a team should have won, but our backwards-looking models were unable to beat Pythagorean Win Expectancy in predicting next year’s wins.  Today, we will build models aimed specifically at predicting how many games teams will win next year.

If you simply want to know how many games your team will win in 2018, strictly for recreational purposes of course, you can skim to the end or check out our Spotfire Template.  But, for Football Outsiders fans, those interested in what makes up wins and losses, or those interested in the Data Science process, read on.

NFL Pythagorean Win Totals

Pythagorean Win Expectancy has been shown to predict wins in subsequent seasons better than win-loss totals do.  For those unfamiliar with the concept of Pythagorean Win Totals, don’t get too bogged down in the Pythagorean part.  Just think of it as a formula for approximating true wins based solely on Points For and Against a team: one created for baseball by Bill James, then adapted for basketball by Daryl Morey, and finally adapted for football by Football Outsiders.

In addition to looking backwards and approximating “true wins”, the Pythagorean total does a better job than current wins and losses of predicting future wins.  A team will play more like its Pythagorean Win total than its real-life total in future games and even the next year.  An eleven Pythagorean Win team with 10 real wins is more likely to win the Super Bowl than a team with those figures reversed.
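As an illustrative sketch of the basic (unadjusted) formula, here is a small Python function using the exponent 2.37 commonly cited for the NFL adaptation.  Note this is a hedged example, not the exact formula behind the numbers below: Football Outsiders’ adjusted version, which this article’s comparisons actually use, varies the exponent with the scoring environment.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Approximate 'true' win percentage from points scored and allowed.

    2.37 is the fixed exponent commonly cited for the NFL adaptation of
    Bill James' formula; Football Outsiders' adjusted version varies the
    exponent with the scoring environment, so treat this as a baseline.
    """
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)

# A team that outscores its opponents 450-350 projects to roughly a .645
# win percentage, about 10.3 wins over a 16-game season, regardless of
# what its actual record was.
```

The key property is that the output depends only on points scored and allowed, never on the team’s actual wins and losses.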

With that framework, we set out to create a better model for predicting how many games a team should have won.  While the model was successful, our assumption that it would also do better at predicting win percentage for the next season did not hold.  Why did it do worse looking forward than Pythagorean Win Totals?  First, we ignored issues of multicollinearity (correlation between predictors), which can lead to model instability.  More importantly, our models were built to predict this season’s wins, not next season’s.  The Pythagorean Win Total model represents a more generalized approach and was therefore less prone to overfitting.

So, for this incarnation, we will address issues of multicollinearity and focus our target prediction on next year’s win percentage.

Model Building

Model 1:

Our goal for this exercise is to predict a team’s wins next year using this year’s data for our predictor variables.  So, we are using variables like total yards in 2015 to predict wins (or win %) in 2016.  We used data from 2000 to 2016, which we split into 80/20 train/test sets.

Armed with 69 variables, our first task was to cull the predictor variables.  We used backwards selection, and the following plot shows the adjusted r squared for the best model at each model size:

Best Adjusted R Squared for Models of Each Variable Length 


Above, we see the best 17 variable model has the second highest adjusted r squared (.1532).  The 19 variable model is slightly higher, but at the cost of two extra variables, so we chose the 17 variable model as a starting point.  We will call this model PredNxtYrNaive, naïve because we haven’t removed multicollinearity yet.  The variables the model uses are as follows:

o.Rank + Tot.Yds + Ply + Y/P + Tot.1stD + Pass.Att + Pass.1stD + Y/A + Pen + d.Tot.Yds + d.Ply + d.Y/P + d.FL + d.Tot.1stD + d.Pass.Yds + d.Pass.1stD + d.Rush.TD + Tie

A glossary for the variables can be found at Football Reference.
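The backwards-selection procedure above can be sketched in a few lines of Python.  This is purely illustrative, not the code behind the article: `r2_of` here is a stand-in scoring function with hypothetical per-variable contributions, whereas the real procedure refits the regression on the training data for every candidate subset.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for a model with p predictors fit on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def backward_select(variables, r2_of, n):
    """Greedy backwards selection: start from the full variable set and, at
    each step, drop the variable whose removal yields the best adjusted
    R-squared, recording the best model found at every size."""
    best_by_size = {}
    current = list(variables)
    while current:
        p = len(current)
        best_by_size[p] = (adjusted_r2(r2_of(current), n, p), tuple(current))
        if p == 1:
            break
        # Score every subset that is one variable smaller; keep the best.
        scored = [(adjusted_r2(r2_of(s), n, len(s)), s)
                  for s in ([v for v in current if v != drop] for drop in current)]
        current = max(scored)[1]
    return best_by_size

# Stand-in scorer with made-up contributions; a real r2_of would refit
# the regression on the training data for each candidate subset.
contrib = {"Tot.Yds": 0.08, "Y/A": 0.05, "Pen": 0.01, "Tie": 0.001}
r2_of = lambda subset: sum(contrib[v] for v in subset)
best = backward_select(list(contrib), r2_of, n=400)
```

Note how the penalty term in `adjusted_r2` means a near-useless variable can actually lower the adjusted r squared, which is exactly why the plot above peaks at an intermediate model size rather than at all 69 variables.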

Model 2:

Next, we set about removing multicollinearity using variance inflation factors (VIF), a statistic measuring how much a predictor contributes to multicollinearity among the predictor variables.  Values above 5 or 10 are generally considered concerning.
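A VIF comes from regressing each predictor on all the others; a minimal sketch of the formula:

```python
def vif(r2_j):
    """Variance inflation factor for predictor j, where r2_j is the
    R-squared from regressing predictor j on all the other predictors."""
    return 1.0 / (1.0 - r2_j)

# An R-squared of 0.8 against the other predictors gives VIF = 5, and
# 0.9 gives VIF = 10, hence the usual rule-of-thumb cutoffs.
```

Inverting the formula, the Tot.Yds VIF of 274.4 reported below implies the other predictors explain roughly 99.6% of its variance.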

Tot.Yds had an extraordinarily high VIF of 274.4.  Because o.Rank is determined by Total Yards, they are by definition multicollinear.  Using a QQ Plot, we can visualize that relationship:

QQ Plot for o.Rank vs Tot.Yds

When the points hug close to the QQ line, the variables are highly correlated, and here we see them fit very snugly.

We removed the variables with the highest VIF values, one by one, until all were under ten.  This must be done one at a time because each removal changes the VIF of every remaining variable.  Tot.Yds, d.Y/P, d.Pass.1stD, and Pass.1stD were removed, leaving us with a 13 variable model, which we will call PredNxtYrRemoveHighVIF.  Our 13 variables are as follows:

o.Rank + Ply + Y/P + Tot.1stD + Pass.Att + Y/A + Pen + d.Tot.Yds + d.Ply + d.FL + d.Tot.1stD + d.Pass.Yds + d.Rush.TD

The adjusted r squared for the new model was .1223, as opposed to .1532 for the naïve model.  So, we do lose some adjusted r squared (higher is better), but the model should be much more stable when applied to new data.
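The one-at-a-time removal is a simple loop.  Below is an illustrative sketch: `toy_vif` is a made-up stand-in, since computing real VIFs requires refitting a regression of each predictor on the others after every removal.

```python
def remove_high_vif(variables, vif_of, threshold=10.0):
    """Drop the highest-VIF predictor, one at a time, until every VIF is
    under the threshold.  Removal must be iterative because dropping one
    variable changes the VIF of every remaining variable."""
    current = list(variables)
    while True:
        vifs = {v: vif_of(v, current) for v in current}
        worst = max(vifs, key=vifs.get)
        if vifs[worst] < threshold:
            return current
        current.remove(worst)

# Toy stand-in: o.Rank and Tot.Yds are redundant with each other, so each
# carries a huge VIF only while the other is still in the model.
def toy_vif(v, present):
    if v in ("o.Rank", "Tot.Yds") and "o.Rank" in present and "Tot.Yds" in present:
        return 274.4 if v == "Tot.Yds" else 120.0
    return 2.0

kept = remove_high_vif(["o.Rank", "Tot.Yds", "Ply"], toy_vif)
```

In the toy example, removing Tot.Yds alone is enough: once it is gone, the remaining VIFs all drop under the threshold, mirroring how the real procedure stopped after four removals.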

Model 3:

Previously, we were trying to predict wins, losses, and ties in the current season (how many games a team should have won), so we couldn’t use wins, losses, and ties as predictors.  Since we are now predicting next year’s win percentage, this year’s wins, losses, and ties can be used.  Remember, Pythagorean Win Expectancy shows that Points For and Against are more predictive than wins and losses.  However, with other variables involved, perhaps wins, losses, and ties can add value.  To investigate, we included Wins and Ties (but not Losses, since those can be gleaned from Wins and Ties), then repeated the same process of backwards selection followed by removing high-VIF variables.

This process led us to a 14 variable model, which we will call PredNxtYrMod2.  The variables the model uses are as follows:

Tie + o.Rank + Ply + Y/P + Tot.1stD + Pass.Att + Y/A + Pen + d.Ply + d.Y/P + d.FL + d.Tot.1stD + d.Pass.Yds + d.Rush.TD

So, the model uses Ties and d.Y/P but not d.Tot.Yds.  Interestingly, the model still did not use wins but did use ties; perhaps wins are already accounted for by the other variables (which are the components that make up wins).  Also interesting: no model discussed has used Points For or Against, probably for the same reason.

Model Results

Model 3 had an adjusted r squared of .1232.  Below, we see the models and their corresponding adjusted r squared values:

Model Adjusted R Squared
PredNxtYrNaive .1532
PredNxtYrRemoveHighVIF .1223
PredNxtYrMod2 .1232

The Naïve model clearly has the best adjusted r squared.  The other two models are very close, which is not terribly surprising given their similarities.

So, how do our models actually do at predicting the following year’s wins, and how do they compare to Pythagorean Win Expectancy?  To evaluate, we used Root Mean Squared Error (RMSE), which measures how far predicted values fall from actual values.  As opposed to adjusted r squared, a lower RMSE is better, and the values only hold significance in comparison to each other.  We also want to evaluate both in sample and out of sample (on the train and test sets) to make sure the difference for a given model is not too big: a large gap between in-sample and out-of-sample RMSE can mean the model is overfitting.
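Both evaluation metrics are a one-liner each; a minimal sketch, with the % change column following the same convention as the table below (negative meaning worse out of sample):

```python
from math import sqrt

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted win percentages."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def pct_change(in_sample, out_sample):
    """Percent change from in-sample to out-of-sample RMSE; negative means
    the model did worse on data it had never seen."""
    return (in_sample - out_sample) / in_sample * 100
```

Plugging the Pythagorean row’s RMSE values into `pct_change` reproduces its 2.5% figure, and the Naïve row’s its -3.14%.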

Model In Sample RMSE Out of Sample RMSE % Change
Pyth Win % 0.20923 0.20401 2.5
PredNxtYrNaive 0.17427 0.17974 -3.14
PredNxtYrRemoveHighVIF 0.17833 0.17588 1.37
PredNxtYrMod2 0.17801 0.17579 1.25

We see all three of our models perform significantly better in out-of-sample RMSE than Pythagorean Win %, with PredNxtYrMod2 the best.  Yet, once again, the PredNxtYrRemoveHighVIF and PredNxtYrMod2 metrics are almost indistinguishable, differing only in the fourth decimal place.

PredNxtYrMod2 has an ever-so-slightly better out-of-sample RMSE and % change.  However, the inclusion of Ties gives me pause.  Why would a tie, a neutral winning-percentage occurrence, be informative about the following season’s winning percentage?  Further, ties are so rare that the model could be misinterpreting and overfitting a relatively small sample of positive occurrences.  Therefore, since the metrics were so close, I would choose the PredNxtYrRemoveHighVIF model.

Overall, we’d have to consider our experiment a success.  All three models outperform Pythagorean Win Expectancy.  However, we have one more aspect we can evaluate.

When we started the process, the NFL season was not yet over, so we couldn’t test our model on 2017 results.  Now that the season has ended, how did our models do in actually predicting win % in 2017?


2017 RMSE

Pyth Win % 0.18273
PredNxtYrNaive 0.17347
PredNxtYrRemoveHighVIF 0.18132
PredNxtYrMod2 0.18641

We see Pythagorean Win % rally back to the pack a bit.  And our Naïve model actually fared the best, even though we felt it would be the most unstable on new data.  Our chosen model, PredNxtYrRemoveHighVIF, was second best, and none really separated themselves.  We must offer the huge caveat that we are dealing with a small sample of 32 team seasons.  We would still feel comfortable with our chosen model.

2018 NFL Win Predictions

So, the question we’ve all been waiting for, and the one many of you already skipped forward to: how many games will your NFL team win in 2018?  Since you asked, here are our chosen model’s predictions:

2018 Predictions

Team Predicted Win % Predicted Wins
New England Patriots 0.65 10.47
Minnesota Vikings 0.62 9.96
New Orleans Saints 0.61 9.75
Baltimore Ravens 0.60 9.66
Jacksonville Jaguars 0.60 9.54
Philadelphia Eagles 0.60 9.53
Los Angeles Rams 0.59 9.43
Pittsburgh Steelers 0.58 9.34
Carolina Panthers 0.57 9.13
Atlanta Falcons 0.57 9.09
Dallas Cowboys 0.56 9.02
Tennessee Titans 0.56 8.95
Kansas City Chiefs 0.55 8.81
Detroit Lions 0.54 8.68
Cincinnati Bengals 0.54 8.59
Seattle Seahawks 0.54 8.58
Los Angeles Chargers 0.53 8.46
Washington Redskins 0.52 8.33
Green Bay Packers 0.51 8.15
Arizona Cardinals 0.49 7.92
Houston Texans 0.49 7.89
San Francisco 49ers 0.47 7.58
Buffalo Bills 0.47 7.45
Denver Broncos 0.45 7.22
Oakland Raiders 0.45 7.16
Cleveland Browns 0.44 7.05
Tampa Bay Buccaneers 0.44 6.98
New York Jets 0.44 6.96
Miami Dolphins 0.43 6.91
Chicago Bears 0.42 6.67
Indianapolis Colts 0.40 6.35
New York Giants 0.39 6.25
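For what it’s worth, the Predicted Wins column appears to be the predicted win percentage times a 16-game schedule; the percentages shown are rounded to two decimals, which is why 0.65 × 16 = 10.40 rather than the 10.47 listed for the Patriots.  A trivial sketch of the conversion:

```python
def predicted_wins(win_pct, games=16):
    """Convert a predicted win percentage into a predicted win total."""
    return win_pct * games
```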

So, there you have it.  We shouldn’t be very surprised that our model predicts the Patriots once again to have the most wins.  Six of the eight top predicted teams reached the final eight of this year’s playoffs, the notable and surprising exception being the Ravens ranked fourth, who missed the playoffs entirely (the Rams, the other miss, at least reached the wild-card round).

As a Houston-centric blog, we must note that the Texans’ predicted win total of 7.89 doesn’t take into account Deshaun Watson’s injury last season.

Thoughts?  I would love any feedback on the process, included or excluded variables, or the predictions themselves.

Check out our Spotfire Template to adjust the models or predict future seasons down the road.

4 thoughts on “NFL: Predicting 2018 Win Totals with Data Science”

  1. Thanks ever so much for this, James, it’s great work. Have you looked at the Adjusted Pythag that Ben Leiblich has brought in, which eliminates garbage time from pythag?

    There is also a very highly correlated stat, ANY/A (adjusted net yards per pass attempt, which Pro Football Reference uses), that has an R^2 of 0.789 with true point differential since 2011 and might be worth considering.

    I’ll keep close attention to the results this season!

    1. Hi Roy,

      I actually did use the Adjusted Pythagorean Win Total formula since it is the most updated formula in use, so any reference to Pythagorean Wins is referencing that. I will clear that up in future posts.

      As far as ANY/A goes, I used the team offense table from Pro Football Reference, but ANY/A is only listed in the team passing table. I will consider it in the future.

      Thanks for reading,

  2. Maybe I missed this in a previous post, but using your chosen model, can we see results from previous seasons (maybe 3-5 seasons back)?

    It would be curious to see how well the model did in previous years.

    1. Hi Joe,

      The test and train sets were indeed taken from previous years–every year since 2000–but I didn’t list previous years’ results in either article. If you have Spotfire, the linked template provides the ability to run model predictions for any year.

