Data-driven decision making. We all aspire to it, whether in customer experience management, targeted marketing, deciding which candidates to hire, choosing a new car, or picking the teams in your March Madness office pool.
One of the challenges in disseminating best practices in data-driven decision making is that so many decisions are made in private, or at least with private data. I can’t entertain you with the details of my clients’ data-driven successes because many of the details, and especially the data, are privileged information.
But I’ve recently completed a project I can talk about, and I can share both the process and the data. That’s because the project uses publicly available data and open source software to help pick optimal brackets for the NCAA Division I Men’s Basketball Tournament, also known as March Madness. Using this project as an example, I’d like to share a framework I use when approaching data-driven decision-making problems.
The data-driven decision-making process can be summarized in four stages: defining the problem, gathering the data, analyzing the data, and making the decision.
Defining which problem to solve is (a) critically important and (b) ignored more often than some would like to admit. In the case of March Madness, what is our problem? Seems pretty simple, right? Just pick the best bracket you can. But what does “best” mean? Does it mean picking the favorites, the teams most likely to win? Or does it mean maximizing the likelihood of winning your pool?
It turns out this difference matters. While the favorite is the team most likely to win, the favorite is also likely to be picked in a lot of other brackets. That creates an opportunity to exploit the gap between how likely a team is to win and how likely it is to be picked. Essentially, to help us win our pool, we avoid favorites that are too popular.
So now we have our actual problem: how do we pick an optimal bracket, knowing that some teams are too popular and others are too unpopular?
Now that we have our problem defined, we have to gather data. In this case we need two sources of data: (1) win probabilities and (2) popularity probabilities. For the win probabilities, we turn to FiveThirtyEight, which publishes best-in-class probabilities of each team making it to each round of the tournament. For the popularity probabilities, we turn to ESPN, which separately aggregates and publishes the percentage of submitted brackets that picked each team to make it to each round.
By combining these two data sources, we can find cases where likely winners are overly popular and where unlikely winners are overly unpopular. As is frequently the case, we can greatly improve our data-driven decision making by combining unrelated data sources.
With the win probabilities and the pick percentages combined, we can identify teams with a large gap between how likely they are to win and how likely they are to have been picked. For example, in the 2017 tournament, North Carolina (UNC) was the most popular team, with 15% of brackets picking them to win it all, but they were far from the most likely to win according to FiveThirtyEight, with a win probability of only 7%.
Contrast UNC with Gonzaga, which was picked as champion in just 8.5% of brackets, but which had a 14% chance to win the tournament.
Because North Carolina is so over-popular, we probably shouldn’t pick them, and because Gonzaga is so under-popular, we should heavily consider picking them.
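One simple way to put a number on that gap is to divide a team's win probability by its pick share. The sketch below does this for the two teams using only the figures quoted above; the ratio itself and the names `win_prob`, `pick_share`, and `value_ratio` are illustrative assumptions, not necessarily the measure used in the actual project.

```python
# Champion-round figures quoted in the article for the 2017 tournament.
win_prob = {"North Carolina": 0.07, "Gonzaga": 0.14}     # FiveThirtyEight win probability
pick_share = {"North Carolina": 0.15, "Gonzaga": 0.085}  # ESPN share of brackets

def value_ratio(team):
    """Win probability divided by pick share (a hypothetical measure).

    Above 1 means the team is picked less often than it wins;
    below 1 means it is picked more often than it wins.
    """
    return win_prob[team] / pick_share[team]

ratios = {team: value_ratio(team) for team in win_prob}
for team, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    label = "under-popular" if ratio > 1 else "over-popular"
    print(f"{team}: ratio {ratio:.2f} ({label})")
```

A ratio well above 1 flags a team worth a closer look, and a ratio well below 1 flags a favorite the rest of the pool is crowding into.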
Using this logic, and the probabilities for the entire tournament, we can simulate tournaments and pools to find the optimal tradeoff between likely winners and popularity.
When we run the 2017 probabilities through the simulator for a pool of 50 brackets, we find that the way to maximize the probability of winning is to adjust the initial win probabilities just a little. By starting with favorites, then slightly punishing over-popular picks and slightly boosting under-popular picks, we end up increasing our probability of winning from 2% to almost 9%.
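To give a feel for how such a simulation works, here is a deliberately tiny sketch. It is not the simulator used for this project: it assumes a toy pool decided only by the champion pick, where we can win only if our champion wins and any tie with rivals who made the same pick is split evenly. The function name `pool_win_chance` and its parameters are hypothetical, and because the real simulation scores full brackets, the numbers this prints are not comparable to the 2% and 9% above.

```python
import random

def pool_win_chance(win_prob, pick_share, pool_size=50, trials=20_000, seed=0):
    """Estimate our chance of winning a champion-only toy pool by simulation."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    rivals = pool_size - 1
    total = 0.0
    for _ in range(trials):
        if rng.random() >= win_prob:
            continue  # our champion lost; this toy pool only credits a correct pick
        # Count rivals who picked the same champion; the prize is split evenly.
        same_pick = sum(rng.random() < pick_share for _ in range(rivals))
        total += 1.0 / (1 + same_pick)
    return total / trials

# Win probabilities and pick shares quoted in the article for 2017.
unc = pool_win_chance(win_prob=0.07, pick_share=0.15)
gonzaga = pool_win_chance(win_prob=0.14, pick_share=0.085)
print(f"North Carolina as champion: {unc:.3f}")
print(f"Gonzaga as champion: {gonzaga:.3f}")
```

Even in this stripped-down setting, the under-popular pick comes out ahead for two reasons: it wins more often, and when it wins there are fewer rivals to share the prize with.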
Probability of Winning Bracket Pool
With the results of the simulation in hand, we pick our bracket.
By carefully redefining our problem to match our desired outcome, combining the right data from multiple sources, applying rigorous analytic tools, and using the results to make a different decision than our competition, we can greatly improve our effectiveness. In this case we increased our probability of winning by over a factor of four, meaning that we can expect to win a bracket tournament roughly once every dozen or so years, rather than once in a lifetime.
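The arithmetic behind those figures can be checked in a few lines. This is a back-of-the-envelope sketch that assumes you enter one pool per year and treats each year as independent, so the expected wait for a win is one divided by the yearly win probability. It also rounds "almost 9%" up to 9%, and the variable names are illustrative.

```python
baseline = 0.02   # win probability in a 50-bracket pool before adjusting picks
adjusted = 0.09   # "almost 9%" after adjusting for popularity (rounded up)

improvement = adjusted / baseline   # factor by which the win chance grows
years_baseline = 1 / baseline       # expected years between wins, one pool per year
years_adjusted = 1 / adjusted

print(f"improvement: {improvement:.1f}x")
print(f"expected wait: {years_baseline:.0f} years vs {years_adjusted:.0f} years")
```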
This decision-making framework applies to all types of customer experience decisions—from identifying which customer churn should be prevented, to managing strategic investments, to maximizing customer experience gains, to determining which customer feedback to respond to—and four-fold increases are not atypical.