How To Use Machine Learning to Beat Your Friends in an NFL Confidence Pool

A CUNY School of Professional Studies adjunct professor shows how to increase your odds of winning

10 December 2015

It’s game day and the stakes are high. You bet on your favorite football team to win by 3 points, and this time you’ve risked more than usual. Perhaps you went with gut instinct, or relied on what the experts said. Either way, you may not have weighed all the factors that help determine which team will win. That’s where machine learning can come into play.

Machine learning is the construction of algorithms that can learn from and make predictions on data. Amit Bhattacharyya, an adjunct professor in the data analytics department at the CUNY School of Professional Studies, in New York City, has tested machine learning methods for making picks in a U.S. National Football League confidence pool.

In a confidence pool, participants pick the outright winner of each game that week and assign a confidence value to each pick, from 16 (highest) down to 1 (lowest). If a pick is correct, you earn that number of points for the game; otherwise you get zero. Participants can rank their picks in any order they choose. At the end of the season, the person with the most accumulated points wins the league.
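The scoring rule can be sketched in a few lines of Python. This is only an illustration: the function name, the game labels, and the team names are hypothetical, and it assumes a week with N games uses each confidence value from 1 to N exactly once.

```python
def score_week(picks, winners):
    """Return the confidence-pool points earned in one week.

    picks:   dict mapping a game id to (picked_team, confidence_points)
    winners: dict mapping the same game id to the team that actually won
    """
    confidences = sorted(conf for _, conf in picks.values())
    # Assumption: each confidence value 1..N is used exactly once.
    if confidences != list(range(1, len(picks) + 1)):
        raise ValueError("confidence values must be 1..N, each used once")
    # A correct pick earns its confidence value; a wrong pick earns zero.
    return sum(conf for game, (team, conf) in picks.items()
               if winners.get(game) == team)


# Hypothetical three-game week (a full 16-game week would use 1 to 16).
picks = {
    "game_a": ("Home A", 3),
    "game_b": ("Away B", 2),
    "game_c": ("Home C", 1),
}
winners = {"game_a": "Home A", "game_b": "Home B", "game_c": "Home C"}
week_points = score_week(picks, winners)  # 3 + 0 + 1
```

Because a wrong pick costs the full confidence value assigned to it, the order of the ranking matters as much as the picks themselves.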

Enter machine learning. Bhattacharyya sees machine-learning algorithms as a clever way to automate the picking process. “It takes the emotion out of the equation,” he says. He estimates that if you make picks using the spread method, that is, by going with the team the point spread favors, you will win about 50 percent of the time. Machine learning, however, can help to increase those odds. He presented his findings on 21 October in a webinar hosted by the CUNY School of Professional Studies.

OUTSMARTING THE SPREAD

To do better than the point spread method, participants need to incorporate variables that the spread may overlook or underweight, Bhattacharyya says. For example, personal experience led him to add a variable that accounts for division games played at home. In the NFL, teams are grouped into divisions of four, and each team plays each of its division opponents twice a season, once at home and once away. He has observed that the home team has a bigger advantage in such matchups. How much weight to put on this variable, however, is left to his algorithm to decide.

Other relevant factors he includes are overall win-loss records, the previous season’s win-loss record, weekly point spread, and the specific week of the football season.
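One way these factors might be turned into a row of numbers for a model is sketched below. Everything here is an assumption made for illustration rather than Bhattacharyya’s actual encoding: the `Game` fields, the +1/0/-1 coding of the division-home variable from the favorite’s point of view, the use of win-percentage differences, and the made-up records in the example. The division table is deliberately partial.

```python
from dataclasses import dataclass

# Partial, illustrative division table; a real one would list all 32 teams.
DIVISION = {
    "Patriots": "AFC East", "Bills": "AFC East",
    "Dolphins": "AFC East", "Jets": "AFC East",
    "Packers": "NFC North", "Bears": "NFC North",
    "Vikings": "NFC North", "Lions": "NFC North",
}


@dataclass
class Game:
    week: int
    favorite: str
    underdog: str
    favorite_is_home: bool
    spread: float                 # points the favorite is expected to win by
    favorite_record: tuple        # (wins, losses) so far this season
    underdog_record: tuple
    favorite_last_season: tuple   # (wins, losses) in the previous season
    underdog_last_season: tuple


def win_pct(record):
    wins, losses = record
    games = wins + losses
    return wins / games if games else 0.5  # no games yet: neutral value


def to_features(g):
    """Encode one game as a list of numbers, from the favorite's side."""
    fav_div = DIVISION.get(g.favorite)
    same_division = fav_div is not None and fav_div == DIVISION.get(g.underdog)
    # +1: favorite hosts a division rival, -1: underdog hosts one, 0: otherwise.
    division_home = (1 if g.favorite_is_home else -1) if same_division else 0
    return [
        g.spread,
        division_home,
        win_pct(g.favorite_record) - win_pct(g.underdog_record),
        win_pct(g.favorite_last_season) - win_pct(g.underdog_last_season),
        g.week,
    ]


# Made-up records, for illustration only.
example = Game(14, "Patriots", "Jets", True, 3.0, (10, 2), (7, 5), (12, 4), (4, 12))
features = to_features(example)
```

Each game becomes one such row, and the label that goes with it is whether the favorite actually won.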

Bhattacharyya uses logistic regression, which classifies a binary outcome (0 or 1), as one method for making predictions because it maps naturally onto the two possible results: a win or a loss. Moreover, since the model returns a number between 0 and 1, that number can be interpreted as the probability that the team wins. These probabilities can be used to rank the picks from highest to lowest. This simple form of logistic regression takes a number of inputs called features (the factors listed above) and produces a single output for the binary classifier: whether the team will win.
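To make the idea concrete, the sketch below shows how a fitted logistic regression turns a game’s features into a probability and how those probabilities give a ranking. The weights, the bias, and the two features per game (point spread and a division-home flag) are hypothetical values chosen for illustration, not numbers from Bhattacharyya’s model.

```python
import math


def win_probability(features, weights, bias):
    """Logistic regression output: a probability strictly between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights for [point_spread, division_home_flag].
weights = [0.15, 0.30]
bias = 0.0

# Hypothetical games, keyed by the team the spread favors.
games = {
    "Favorite A": [7.0, 1],
    "Favorite B": [3.0, 0],
    "Favorite C": [1.0, -1],
}

probs = {team: win_probability(x, weights, bias) for team, x in games.items()}

# Rank the favorites from most to least likely to win.
ranked = sorted(probs, key=probs.get, reverse=True)
```

With these made-up weights, a narrow favorite playing a division rival on the road (Favorite C) comes out below 50 percent, which is the kind of case where a model and the spread can disagree.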

While it may seem that machine learning is only for highly technical types, he says a variety of tools and platforms are available to help beginners and experienced programmers alike analyze this data. Once the data has been properly formatted, constructing and fitting a model takes only a few lines of code, where X holds the features and Y holds the outcome being classified.
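A minimal sketch of that fit-and-predict step, using scikit-learn (one of the tools named in this article), might look like the following. It is not the code from the presentation: the three features are an assumed subset, and the training data is synthetic, generated from a made-up rule purely so the example runs.

```python
import random

from sklearn.linear_model import LogisticRegression

rng = random.Random(0)  # fixed seed so the synthetic data is repeatable

# Synthetic "history", invented for illustration only.
# Each row of X: [point_spread, division_home_flag, week]
# Each entry of Y: 1 if the favorite won, 0 if it lost
X, Y = [], []
for _ in range(400):
    spread = rng.choice([1.0, 2.5, 3.0, 4.5, 7.0, 10.0])
    division_home = rng.choice([-1, 0, 0, 1])
    week = rng.randint(1, 17)
    # Made-up rule: bigger spreads make a favorite more likely to win.
    p_true = min(0.95, 0.5 + 0.04 * spread + 0.05 * division_home)
    X.append([spread, division_home, week])
    Y.append(1 if rng.random() < p_true else 0)

# Construct and fit the model.
model = LogisticRegression(max_iter=1000)
model.fit(X, Y)

# Two hypothetical games from the current week.
this_week = [[3.0, 1, 14], [7.0, 0, 14]]

# Column 1 of predict_proba is the probability of class 1 (favorite wins).
favorite_win_prob = model.predict_proba(this_week)[:, 1]
```

With real data, X and Y would come from past seasons’ games rather than from a random generator.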

The key to any type of analysis is to carefully identify the problem that you are solving and make sure that your data is right for that problem, Bhattacharyya says. His algorithm is designed to quantify the probability that the favored team wins.

Platforms to use include:

  • BigML: An easy-to-use machine learning tool that allows users to input their data and displays the results visually.
  • Scikit-learn: An open-source machine learning library designed for the more experienced programmer who is familiar with Python.

And then there is Bing Predictions, which uses its own algorithm to predict probabilities of a team winning.

Once the data is processed, the weekly picks are generated simply by updating the current week’s spreads for each game. When the model gives the team favored by the spread less than a 50 percent chance of winning, the underdog is picked to win that week instead. Bhattacharyya has been testing this method for several years now and says it has provided him with consistent outcomes. So far this season, the logistic regression algorithm has a slight lead on the spread method.
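The pick-generation step described above could be sketched as follows. The function name and the sample probabilities are hypothetical, and ranking an underdog pick by one minus the favorite’s probability is an assumption about how such picks would be ordered.

```python
def make_picks(favorite_probs):
    """Turn model output into ranked confidence-pool picks.

    favorite_probs: list of (favorite, underdog, P(favorite wins))
    Returns a list of (picked_team, confidence_points), highest first.
    """
    picks = []
    for favorite, underdog, p in favorite_probs:
        if p < 0.5:
            # The model disagrees with the spread: take the underdog.
            picks.append((underdog, 1.0 - p))
        else:
            picks.append((favorite, p))
    # Most likely winner gets the most confidence points.
    picks.sort(key=lambda pick: pick[1], reverse=True)
    n = len(picks)
    return [(team, n - i) for i, (team, _) in enumerate(picks)]


# Hypothetical probabilities for a three-game week.
model_output = [
    ("Team A", "Team B", 0.79),
    ("Team C", "Team D", 0.46),
    ("Team E", "Team F", 0.61),
]
weekly_picks = make_picks(model_output)
```

Here the model overrides the spread in one game, and because that pick is close to a coin flip, it receives the fewest confidence points.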

To learn more, watch the full presentation below.
