Draft Kings and Data Science: Double your Money, Double your Fun

If any of you have ever sat and watched a live NFL game this year, you are well aware that the new big thing is daily fantasy drafts. Who has time to sit around for a full season anymore just to gloat to your friends about how little they know about pigskin? Not us, that’s who! So, I logged in to DraftKings.

Immediately, I was stunned by the numbers that appeared in the game lobby: 383k contestants were entering in daily contests, and each contestant had to pony up $3 per entry. I quickly did the math: 383k entries at $3 apiece is roughly $1.15M in entry fees against a $1M prize pool, so DraftKings clears something like $150k on a contest that size. Bravo, gentlemen and gentle ladies. What makes it more interesting is that they make that much regardless of entry performance. That’s the genius of running a marketplace.

Introduction

While it probably got most people excited to throw in their hat to see if they could turn $3 into $100,000, I saw an awesome opportunity to pit metal against mind. Based on the sheer volume of entries, contests, and changing player rosters, this was a prime opportunity to build an algorithm to do the picking for me.
Now, what I was aiming to do would be different from most other machine learning efforts in fantasy sports, because those focus on projections. In the projections problem, you want to predict each player’s performance as accurately as possible – because if you do, you can put together the highest-scoring roster. Several companies, individuals, and psychics have worked on this and can predict player performance with decent accuracy (most of the pack hovers around an R² of 0.5; see the 2015 Projection Analysis).
In DraftKings, a lineup is 9 players, with at least 30 potential picks for each slot (Defense and QB have the fewest). Running backs and wide receivers have many more options and can pick up some serious points. So, the number of potential lineups can be calculated by sampling without replacement within each position group, which comes out to roughly 160 million billion (1.6e17) combinations:

QB * DST * 2 RB * TE * 3 WR * FLEX

Choose(30,1) * Choose(30,1) * Choose(122,2) * Choose(96,1) * Choose(157,3) * Choose(400,1)
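You can sanity-check that figure in R with one line:

choose(30,1) * choose(30,1) * choose(122,2) * choose(96,1) * choose(157,3) * choose(400,1)
# 1.614e+17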

DraftKings imposes a salary cap, which reduces the number of possible combinations but increases the complexity of finding a good solution. Even without the salary cap, this is such a large space that 380,000 entries could explore only a few parts per trillion of it.

If you’re familiar with computational theory, this is basically a multiple 0-1 knapsack problem. In the knapsack problem, you are trying to fill a knapsack with items that have weight and value, and your goal is to maximize value while staying under a weight limit. Imagine you are in a treasure room that is starting to collapse and you need to determine which gold artifacts go in your knapsack and which ones stay out. There are several ways to solve knapsack problems, including dynamic programming and approximation algorithms. Unfortunately for us, the multidimensional nature of this problem makes it NP-complete.
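Concretely, the 0-1 knapsack problem is: maximize Σᵢ vᵢxᵢ subject to Σᵢ wᵢxᵢ ≤ W, with each xᵢ ∈ {0, 1}. In our case, vᵢ is player i’s projected points, wᵢ is his salary, and W is the $50,000 cap; the positional requirements (1 QB, 1 DST, and so on) are the extra constraints that make it “multiple”.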

It’s also important to point out that projected points, even the best ones, are fuzzy numbers. There’s a reason we all enjoy watching the NFL: it really is an “any given Sunday” kind of sport.

So, I rolled up my sleeves and took a crack at it.

I decided to participate in the NFL $1M Play-Action contest, since it had 25% paying spots, allowed for multiple entries, and wouldn’t break the bank at $3/entry. Plus, I thought it would pit me against a lot of good human players. I think there’s a science to picking a good contest, but that’s for a later post.

Methods

In this experiment I downloaded data from ESPN and DraftKings. I then took three approaches to building lineups: Human, Machine, and Random. Human represents a human expert, Machine is a genetic algorithm, and Random serves as our null model. All lineups were hand-entered into the DraftKings website but otherwise chosen entirely by their respective methods.

Data

To participate in a DraftKings contest, you need to first select the appropriate draft group, which corresponds to a start and end time. Because different games are played on different days, you have only a certain set of players eligible for accruing fantasy points.
Luckily, DraftKings allows you to download a .csv file of all the eligible players (EXPORT TO CSV). This csv gives you the player, team, upcoming match, this season’s average fantasy points, and their salary.
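Loading the export into R is simple (assuming you kept the default file name, DKSalaries.csv – adjust the path if not):

dk <- read.csv("DKSalaries.csv", stringsAsFactors=FALSE)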

[Figure 1: sample of the DraftKings player export]

If we take a look at salary, we would expect it to correlate with the expected fantasy points for each player.

A perfect correlation between Salary and FPPG (Fantasy Points Per Game) would mean that each player provides the same “bang for the buck” when it comes to purchasing lineup FPPG. For Week 3, the correlation coefficient between salary and FPPG is 0.421.
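You can verify this from the export (the column name below assumes the header DraftKings uses for season-average points; adjust it if yours differs):

cor(dk$Salary, dk$AvgPointsPerGame)
# 0.421 for Week 3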

[Figure 2: Salary vs. FPPG]

Notice there’s a lot of scatter in the more interesting (higher-points) part of the graph. As a first step, it would be better to use weekly projections instead of past performance. To do this, I used the readHTMLTable function from R’s XML package (see http://www.inside-r.org/packages/cran/XML/docs/readHTMLTable) to scrape the ESPN weekly projections. I chose ESPN because they provide easy weekly projection numbers and, per the projection analysis linked above, the most error associated with their projections.

From time to time, I’m impressed by what you can do with one line of code these days. Here’s a one-liner using the XML package in R that grabs all the ESPN weekly projections and puts them in a data frame:

library(XML)  # provides readHTMLTable; without it you'll get "could not find function"
do.call('rbind', lapply(paste0("http://games.espn.go.com/ffl/tools/projections?&startIndex=", seq(0, 960, 40)),
  function(x){ readHTMLTable(x, as.data.frame=TRUE, stringsAsFactors=FALSE)$playertable_0 }))

If we look at the correlation between salary and the ESPN projections (rescored under DraftKings rules), there is a much stronger correlation of 0.68. In fact, it looks like they could be using some sort of logistic regression model under the hood to calculate salaries from industry projections.

It’s important to note that while these expected points correlate well with salary, in practice they are known to be off by about 50%. At the end of the day, these points should be interpreted as much qualitatively as quantitatively.

The other thing we get with the ESPN data is the status of the player – whether they’re out, probable, or questionable – which lets us make sure we don’t draft “Big Os”.

[Figure 3: Salary vs. ESPN weekly projections]

[Figure 4: ESPN player status]

Human

I loaded all the data into TIBCO Spotfire to make some solid data-driven decisions. If you’re not familiar, Spotfire is a data-agnostic BI tool that supports dynamic visualization and filtering across multiple data tables; it also has a powerful embedded R engine called TERR. As probably most of you would do, I created a points/$ metric and ranked each class of player. I then went through and manually picked players that made good use of my $50,000 salary cap and were playing in matchups where I thought they’d perform well.
This process took me some time, and I ended up selecting 5 lineups using this method. Interestingly, this is basically the greedy approximation algorithm for the knapsack problem, because I’m “greedily” packing my lineup with the best points-per-dollar players. Unfortunately, this is known to give poor results in the 0-1 case.
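For the curious, the value ranking boils down to a couple of lines of R (assuming a merged data frame named players with proj and salary columns):

players$value <- players$proj / players$salary  # points per dollar
head(players[order(-players$value), ], 20)      # top 20 value plays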
We may assume that all other entries into the DraftKings contest were human entries, although I’m not entirely sure this is true. For this case, it was important to put in some good ol’ fashioned human (expert) guesses.

Machine

In the next phase, I wanted to have an algorithm select 20 lineups that maximized points and stayed under the salary cap. I dabbled around with several approaches until I landed on using a genetic algorithm.

Initially I wanted to use particle swarm optimization, but that performs much better in a continuous space where gradients and hill climbing make sense. Because we are picking discrete players, our solution space is high-dimensional and binary (1 if a player is in the lineup, 0 if he is out). This rules out most gradient-based solvers.

I could have approached the problem from an ILP perspective, which is still on my list, but I didn’t want to write all the ILP transformation code. To handle the “fuzzy” nature of the fantasy points projections, I probably would have bootstrapped the ILP with samples from a “projection distribution”. This would give us an optimal solution for one “instance” of NFL performance. We could then repeat this many times to rank lineups in terms of expected fantasy points.
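For the curious, here is a rough sketch of what that ILP formulation might look like using the lpSolve package – a sketch under assumed column names (proj, salary, pos), not what I actually ran:

library(lpSolve)

obj <- players$proj                          # maximize projected points
con <- rbind(
  salary = players$salary,                   # stay under the cap
  qb  = as.numeric(players$pos == "QB"),     # exactly 1 QB
  dst = as.numeric(players$pos == "DST"),    # exactly 1 DST
  rb  = as.numeric(players$pos == "RB"),     # at least 2 RB
  wr  = as.numeric(players$pos == "WR"),     # at least 3 WR
  te  = as.numeric(players$pos == "TE"),     # at least 1 TE
  n   = rep(1, nrow(players))                # exactly 9 players total
)
dir <- c("<=", "=", "=", ">=", ">=", ">=", "=")
rhs <- c(50000, 1, 1, 2, 3, 1, 9)

sol <- lp("max", obj, con, dir, rhs, all.bin=TRUE)
players[sol$solution == 1, ]                 # the optimal lineup

The FLEX spot falls out naturally: with QB and DST fixed at one apiece and 9 players total, the ninth roster spot has to go to an extra RB, WR, or TE.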

I settled on using a genetic algorithm for a few reasons: one, it is easy to implement; two, it inherently handles the vagaries expected of NFL performance; and three, it is fun to say you used evolution to solve fantasy sports problems.

Ultimately, I wanted the algorithm to provide a set of “good” lineups. It’s unlikely that we actually select a winning or top 1% solution – instead we want lineups that fall mostly in the top 25% (since these are the paying spots).

So, I generated 20 lineups using the GA package for R, which was relatively straightforward to use. I adopted a 9-dimensional parameter vector (one dimension per lineup slot), each with a minimum of 0 and a maximum of the number of players at that position. I penalized repeated players, salary-cap busts, and minimum-team violations in the fitness function. I generated each lineup with a population size of 500 and a generation count of 200. 10% of the lineups produced by the genetic algorithm weren’t viable, so I didn’t enter them. Otherwise, I used all the default parameters for the “real-valued” parameter type.

library(GA)

# maxdim = vector of the number of eligible players at each of the 9 slots
ga(type="real-valued", fitness=ffscore2, min=rep(0, 9), max=maxdim,
   popSize=500, maxiter=200, keepBest=TRUE, parallel=FALSE)
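The fitness function ffscore2 isn’t shown above, so here is a minimal sketch of what it might look like – the names (slots, proj, salary, name, team) are my own placeholders, not the original code. Each gene indexes a player within its slot’s candidate list (flex being the combined RB/WR/TE pool):

slots <- list(qb, dst, rb, rb, te, wr, wr, wr, flex)  # one data frame per slot

ffscore2 <- function(x) {
  # map each real-valued gene to a player index in its slot
  idx <- pmin(floor(x) + 1, sapply(slots, nrow))
  lineup <- do.call(rbind, Map(function(s, i) s[i, ], slots, idx))
  # penalize invalid lineups: duplicates, cap busts, single-team rosters
  if (any(duplicated(lineup$name)) || sum(lineup$salary) > 50000 ||
      length(unique(lineup$team)) < 2) return(-1e6)
  sum(lineup$proj)  # fitness = total projected points
}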

I tried using parallel to speed up the fitness calculations, which helped out when I experimented with larger populations. I decided on 500/200 since I didn’t see much improvement in fitness values with additional population size or iterations – it was better to restart the calculation. There are other parameters we could play with, like mutation and crossover rate, which I didn’t include in this experiment.

Random

For the Random case, I generated 20 viable lineups. These were mostly horrible, and I cringed as I submitted my entries into DraftKings. Oh, in the name of science…
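A minimal sketch of how those random entries could be generated – simple rejection sampling against the same validity checks, reusing the slots list from the fitness sketch above:

random_lineup <- function() {
  repeat {
    idx <- sapply(slots, function(s) sample(nrow(s), 1))
    lineup <- do.call(rbind, Map(function(s, i) s[i, ], slots, idx))
    ok <- !any(duplicated(lineup$name)) &&
      sum(lineup$salary) <= 50000 &&
      length(unique(lineup$team)) >= 2
    if (ok) return(lineup)  # keep only lineups that pass every check
  }
}
random_lineups <- replicate(20, random_lineup(), simplify=FALSE)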

Results

[Figure 5: contest results]

[Figure 6: earnings by method]

The results were quite surprising. I watched anxiously as the genetic algorithm jumped out to an early lead and maintained it. In the end, the genetic algorithm (Machine) placed 8 lineups in the top 25% and earned $6.28/entry. Since each entry cost $3, it more than doubled what was put in! Random actually managed to land one placed entry and earned $5, which goes to show that a blind bat, or that annoying friend, can randomly choose a lineup that wins. Lastly, the Human didn’t manage to place any lineups. Since other humans actually won (and took first place) in the contest, this may say more about my poor performance as a knowledgeable gambler. In retrospect, I should have put in more lineups to keep the categories comparable. Unfortunately for my lineup-picking abilities, I’m statistically closer to Random than Machine.

Discussion

Although I descend from a strong line of professional gamblers, I drew the evolutionary short straw on the family tree. For better or worse, I replaced this genetic deficiency with a genetic algorithm that came up big. I managed to double my money using a data science approach that optimized lineup picking, rather than projections, to create an advantage over competitors.
I was honestly surprised to see the machine learning approach outcompete a large body of decision makers who were willing to put $3 behind their bets. What’s also interesting is that the competition pool actually includes good guesses (or at least ones significantly better than random), so we are dealing with an intelligent population.
What set the machine learning algorithm apart, I think, was its ability to make many good, unbiased lineups. As a human decision maker, I am slow, biased, and lose quality quickly, which means I can only really produce a handful of good lineups. The machine learning approach can generate thousands of lineups in a short amount of time that compete well against humans, since, after all, who really knows what’s going to happen on Sunday?

Also, this is an important lesson in volume. So long as I can put a lot of chips on the table, I can “hedge my bets” by building a diverse lineup portfolio. Because it is generating optimized lineups in a random way, the computer isn’t biased by prejudices, teams, or rivalries. It just picks based on the best data it sees.

Another interesting point is that I used the worst projections (ESPN) to make the lineups. This leads me to believe that projections are very noisy things and they really just give us a qualitative idea about how well a player is going to do.

Will this repeat itself? I’ll give you an update after Week 3 is done. In the meantime, you can get access to the Spotfire analysis that grabs the ESPN projections, merges with DraftKings and runs the genetic algorithm at Exchange.ai.

8 thoughts on “Draft Kings and Data Science: Double your Money, Double your Fun”

  1. Hey Troy, great article! Some great insight here. I had one question….When I used the one liner in R using the XML package I received the following error:
    “Error in FUN(X[[i]], …) : could not find function “readHTMLTable”

    I tried looking up a solution but couldn’t find much.

    Thanks

  2. Why didn’t you enter equal numbers of entries for each type? How can you base any conclusions on 5 entries? I think you proved the machine learning is better than average human, but I think the human and random parts of the test seem kind of pointless to me..

    1. 5 entries is pretty low, I admit, but the Human still ended up having a statistically lower mean (Student’s t) compared to the Machine. As for “random” – since we don’t know the underlying “distribution” of fantasy football scores/placements in DraftKings, we can use a random technique to build a “null hypothesis”. This gives us a distribution to test against, and lets us make sure that our machine and humans are doing better than chance. We could use a Wilcoxon rank-sum test or Student’s t to see whether human and machine have different population means than random.
