Mean Confidence Intervals vs Median Confidence Intervals
Welcome! Fantasy Football is coming, and it’s time to prepare for our draft. As you may know, Average Draft Position (ADP) is an important statistic that helps users prepare for their draft.
What is Fantasy Football?
Fantasy Football is a game where participants manage a virtual team of NFL players. The points accrued by each NFL player depend on how they perform in their respective games. The Fantasy Football (virtual) “managers” need to fill their rosters with the players they believe will perform best. Each week, a team’s total points are compared against another virtual team’s total to decide who wins that week.
1. Intro to ADP
2. Confidence Intervals with Means
3. Confidence Intervals with Medians
Intro to ADP
The NFL players are initially acquired by the Fantasy Football participants through a draft before the season starts. In the most common type of draft, the snake draft, each participant waits for their turn to “draft” an NFL player, and the pick order reverses every round. For example, if I had the 4th pick in a 10-man snake draft, I would pick 4th, 17th, 24th, 37th, and so on.
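The pick numbers in that example follow from the order reversing each round. A minimal sketch of the arithmetic is below; the function name `snake_picks` and its parameters are made up for illustration and are not part of any fantasy football site or library.

```python
def snake_picks(slot, teams, rounds):
    """Return the overall pick numbers for one draft slot in a snake draft.

    slot   : position in the first round (1-based)
    teams  : number of participants in the league
    rounds : how many rounds to list
    """
    picks = []
    for rnd in range(1, rounds + 1):
        picks_before_round = (rnd - 1) * teams
        if rnd % 2 == 1:
            # Odd rounds run in the original order.
            picks.append(picks_before_round + slot)
        else:
            # Even rounds run in reverse order.
            picks.append(picks_before_round + teams - slot + 1)
    return picks


print(snake_picks(slot=4, teams=10, rounds=4))  # [4, 17, 24, 37]
```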
Continuing from the previous example, if I have the 4th pick in a 10-man snake draft, it would be helpful to know which players may be available at each of my picks (#4, #17, #24, #37, and so on). Many fantasy football participants, including myself, use ADP to get a general idea of when certain NFL players will be drafted. http://www.fantasypros.com is a common site where people can look up such ADP numbers.
Furthermore, the website also includes a list of expert rankings for each player. For example, the page for ‘Christian McCaffrey’ lists where each expert ranks him. The number of rows can range from 90 to 110 depending on the player.
EUREKA! For each NFL player, I have a “population” mean (the ADP) and a sample (the expert rankings). I can find some confidence intervals and do some one-sample t-tests! And so… I did.
Confidence Intervals with Means
For Christian McCaffrey, we are 95% confident the population mean will be between -0.43608 and 2.91314. Great!… But…
This information is not only incorrect (because I assumed the population distribution to be normal) but, moreover, kind of useless to the average fantasy football player. There is no pick number with a decimal point (or a negative sign). I realized that I had made the mistake of treating the ranks as a continuous variable. Although the data points are integers, these variables are categorical, and more specifically ordinal (ranked).
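For reference, a t-based interval and one-sample t-test of the kind described above can be sketched with scipy.stats as follows. The rank values and the ADP in this sketch are invented purely for illustration; they are not the FantasyPros data, and the result will not reproduce the interval quoted above.

```python
import numpy as np
from scipy import stats

# Hypothetical expert ranks for one player (NOT real FantasyPros data).
ranks = np.array([1, 1, 2, 1, 3, 1, 2, 1, 1, 2, 4, 1])
hypothetical_adp = 1.5  # assumed "population" mean, for illustration only

n = len(ranks)
sample_mean = ranks.mean()
std_err = stats.sem(ranks)  # sample standard deviation / sqrt(n)

# 95% t-interval for the mean. This assumes roughly normal data,
# which is exactly the assumption that ranked data tends to violate.
low, high = stats.t.interval(0.95, n - 1, loc=sample_mean, scale=std_err)

# One-sample t-test of the sample against the assumed ADP.
t_stat, p_value = stats.ttest_1samp(ranks, popmean=hypothetical_adp)

print(f"mean = {sample_mean:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

Note that the interval endpoints come out as decimals, which is the practical problem described above: nobody drafts at pick 1.3.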
Confidence Intervals with Medians
To deal with discrete variables that are ranked, I can find the confidence interval of the population median. A median interval does not need to satisfy any of the parametric assumptions (such as normality) that a mean interval does.
The sample median is calculated by first sorting the array in ascending order and then retrieving the middle value. To calculate the confidence interval for a median, I need to review the mechanics of the binomial distribution.
I’ll use the distribution of a fair coin (p = 0.5) to continue the explanation of median confidence intervals. On a fair coin flip, I have a 50% chance of the coin landing heads and a 50% chance of it landing tails. It is possible, although unlikely, that I will land heads 10 times in a row. To calculate that probability, we can use the pmf function from the scipy.stats package.
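A minimal sketch of that calculation, assuming scipy is installed (the variable name is my own):

```python
from scipy.stats import binom

# Probability of exactly 10 heads in 10 flips of a fair coin.
p_ten_heads = binom.pmf(k=10, n=10, p=0.5)

print(f"{p_ten_heads:.6f}")  # 0.000977, i.e. roughly 0.1%
```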
Given a fair coin, I have a ~0.1% chance of getting heads 10 times in a row. Symmetrically, I have a ~0.1% chance of getting tails 10 times in a row. We can also use the cdf function to find the total probability of flipping heads at most 9 times, as well as at most 2 times.
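The two cumulative probabilities can be sketched like this, again with scipy.stats (variable names are my own):

```python
from scipy.stats import binom

# P(heads <= 9) in 10 fair flips: everything except "10 heads" (1023/1024).
cdf_9 = binom.cdf(k=9, n=10, p=0.5)

# P(heads <= 2) in 10 fair flips: 0, 1, or 2 heads (56/1024).
cdf_2 = binom.cdf(k=2, n=10, p=0.5)

print(f"P(heads <= 9) = {cdf_9:.4f}")
print(f"P(heads <= 2) = {cdf_2:.4f}")
```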
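If we subtract cdf(k=2…) from cdf(k=9…), we get about 94.43%. In other words, given a fair coin, there is a 94.43% chance of flipping heads between 3 and 9 times out of 10 flips (the subtraction removes the outcomes with 0, 1, or 2 heads). The confidence interval for a population median works very similarly, because each sampled value falls below the population median with probability 0.5, just like a coin flip. For example, with a sorted sample of size 10, the 2nd and 9th values fail to bracket the population median only if fewer than 2 or more than 8 of the values fall below it. That happens with probability 1 - (cdf(k=8…) - cdf(k=1…)), so we can be about 97.85% confident that the population median lies between the 2nd and 9th values of the sample.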
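The order-statistic reasoning above can be turned into a small helper. This is a sketch under the usual assumption of independent observations; the function name `median_ci_exact` is hypothetical, and the sample is just the numbers 1 to 10. It picks the narrowest symmetric pair of ranks whose coverage is at least the requested confidence level.

```python
import numpy as np
from scipy.stats import binom


def median_ci_exact(sample, confidence=0.95):
    """Distribution-free confidence interval for the median.

    Returns (lower value, upper value, actual coverage), using the
    j-th smallest and j-th largest values of the sorted sample.
    """
    x = np.sort(np.asarray(sample))
    n = len(x)
    alpha = 1 - confidence

    # B ~ Binomial(n, 0.5) counts the values that fall below the median.
    # Grow j while the lower tail P(B <= j - 1) stays within alpha / 2.
    j = 0
    while binom.cdf(j, n, 0.5) <= alpha / 2:
        j += 1
    if j == 0:
        raise ValueError("sample too small for this confidence level")

    k = n - j + 1  # matching rank counted from the top
    coverage = 1 - 2 * binom.cdf(j - 1, n, 0.5)
    return x[j - 1], x[k - 1], coverage  # -1 converts ranks to 0-based indexes


low, high, coverage = median_ci_exact(range(1, 11))
print(low, high, f"{coverage:.4f}")  # 2 9 0.9785
```

Because the binomial distribution is discrete, the achieved coverage (97.85% here) is usually somewhat higher than the 95% requested.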
When the sample size is large (for example, greater than 80), we can use the formula written below.
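The formula itself is not reproduced here. One commonly used large-sample version, which I am assuming is the kind intended, approximates the binomial with a normal distribution and takes the lower rank as n/2 - z·sqrt(n)/2 and the upper rank as 1 + n/2 + z·sqrt(n)/2, rounded to whole ranks. In the sketch below, the function name is hypothetical and n = 108 is an assumed sample size within the 90 to 110 range mentioned earlier, not a confirmed count.

```python
import math
from scipy.stats import norm


def median_ci_ranks_approx(n, confidence=0.95):
    """Approximate 1-based ranks bounding the median (normal approximation)."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(n) / 2
    lower_rank = round(n / 2 - half_width)
    upper_rank = round(1 + n / 2 + half_width)
    return lower_rank, upper_rank


# n = 108 is an assumption for illustration only.
lower_rank, upper_rank = median_ci_ranks_approx(108)
print(lower_rank, upper_rank)  # 44 65
```

If the formula in question is this approximation, small differences from an exact scipy.stats binomial calculation are expected: the normal curve only approximates the binomial, and the ranks are rounded to whole numbers.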
The answers from the formula are a little different from those from scipy.stats, but I’m okay with that. If anyone knows why the two calculations differ, please let me know!
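Continuing with the sample ranks for Christian McCaffrey, we can be 95% confident the population median will be between cmc[44-1] and cmc[65-1], that is, between the 44th and 65th values of the sorted sample (the -1 accounts for zero-based indexing).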
Calculating mean intervals can be tricky, but calculating median intervals cuts through all the preliminary complexity. Use the median intervals.