How to Talk About Analytics: Intro to Measurement and Sample Size

The beginning of the season always starts my counter for iterations on the following statements: “two games of the season is a small sample size, but [PLAYER] looks vastly [IMPROVED/WORSE] compared to last season” and “[PLAYER] is averaging 0.5 more [STAT] and 0.75 more [STAT] per game - it’s obvious that cutting [FOOD] out of their diet made a huge difference”. It is one of my least favorite tropes because, in most (but not all) cases, the person making the statement is either using “sample size” in an entirely colloquial and arbitrary manner or vastly underrating their ability to actually discern a difference. My goal with this piece is to explain a dry and sometimes confusing (although important) statistical concept using data we all know and understand.

Just as an FYI, for the rest of this article I am going to be presenting arguments that I do not actually believe; they are here purely for statistical comparison purposes. Again, this is not advocating for (or against) any of these arguments, just walking through the process. Lastly, I am intentionally leaving out any complexity or context based on shot types, locations, etc. If I shoot 35% from three exclusively on wide open corner threes, and another player shoots 35% from three exclusively on off-the-dribble, top-of-the-key pullups, it is almost certainly the case that the other player is a “better” shooter, even though we both make the same percentage of our attempts. That information is not included in these toy examples and obviously should be taken into consideration when making a real argument.

Note: I am glossing over important statistical concepts, assumptions, use cases, etc. I am aware that I did not get into the nuances of every possible permutation and application of the tests and data.

Fake Statistical Argument #1

Hypothesis: Robert Covington’s clear improvement from 2016-2017 to 2017-2018 is evidence that growing his hair out definitely made him a better shooter.

2016-2017: 137/412 for 33.3% (67 games)

2017-2018: 203/550 for 36.9% (80 games)

I personally like using three point shooting for these concepts because I believe that unlike two point shooting, it is almost entirely determined by your skill at shooting, and not by chances created by others or garbage points. Let’s take a look at the data set we will be using for this exercise.

Robert Covington Three Point Shooting Box Scores (2015-2018)

FG3M FG3A FG3_PCT PLUS_MINUS Season
0 5 0 -25 2015-2016
0 4 0 9 2015-2016
0 5 0 -19 2015-2016
0 4 0 -10 2015-2016
3 6 0.5 9 2015-2016
3 9 0.333 5 2015-2016
2 8 0.25 -4 2015-2016
6 9 0.667 1 2015-2016
2 4 0.5 -7 2015-2016
5 11 0.455 18 2015-2016
2 7 0.286 -12 2015-2016
5 12 0.417 -4 2015-2016
2 5 0.4 -22 2015-2016
0 6 0 -3 2015-2016
3 9 0.333 -4 2015-2016
5 12 0.417 -2 2015-2016
3 11 0.273 -26 2015-2016
1 2 0.5 -25 2015-2016
1 4 0.25 -8 2015-2016
1 6 0.167 -12 2015-2016
0 4 0 -11 2015-2016
2 5 0.4 -8 2015-2016
0 3 0 -2 2015-2016
1 5 0.2 16 2015-2016
2 3 0.667 -9 2015-2016
2 4 0.5 -1 2015-2016
0 1 0 1 2015-2016
0 1 0 0 2015-2016
2 5 0.4 -13 2015-2016
2 3 0.667 -6 2015-2016
1 5 0.2 2 2015-2016
6 10 0.6 -10 2015-2016
4 10 0.4 17 2015-2016
3 7 0.429 -8 2015-2016
2 7 0.286 -2 2015-2016
6 13 0.462 -13 2015-2016
5 12 0.417 10 2015-2016
1 3 0.333 -18 2015-2016
2 7 0.286 8 2015-2016
0 4 0 -15 2015-2016
0 4 0 -19 2015-2016
2 6 0.333 9 2015-2016
0 6 0 1 2015-2016
7 11 0.636 -5 2015-2016
3 7 0.429 12 2015-2016
4 7 0.571 -10 2015-2016
2 5 0.4 3 2015-2016
2 9 0.222 -16 2015-2016
3 7 0.429 3 2015-2016
1 5 0.2 -4 2015-2016
1 7 0.143 -17 2015-2016
4 8 0.5 -13 2015-2016
4 11 0.364 -10 2015-2016
1 4 0.25 -23 2015-2016
3 8 0.375 0 2015-2016
4 6 0.667 11 2015-2016
3 10 0.3 4 2015-2016
1 9 0.111 6 2015-2016
4 7 0.571 5 2015-2016
6 14 0.429 -6 2015-2016
2 7 0.286 -1 2015-2016
3 10 0.3 -10 2015-2016
1 10 0.1 4 2015-2016
7 13 0.538 9 2015-2016
5 17 0.294 2 2015-2016
6 10 0.6 6 2015-2016
6 13 0.462 -18 2015-2016
2 4 0.5 4 2016-2017
0 5 0 -18 2016-2017
0 4 0 -6 2016-2017
2 8 0.25 -17 2016-2017
1 9 0.111 -5 2016-2017
3 4 0.75 -13 2016-2017
5 9 0.556 -3 2016-2017
2 9 0.222 12 2016-2017
3 7 0.429 -10 2016-2017
1 6 0.167 -16 2016-2017
0 5 0 -4 2016-2017
0 7 0 -17 2016-2017
1 4 0.25 4 2016-2017
1 5 0.2 8 2016-2017
4 7 0.571 0 2016-2017
0 3 0 -28 2016-2017
2 6 0.333 12 2016-2017
6 9 0.667 -25 2016-2017
4 11 0.364 11 2016-2017
3 7 0.429 0 2016-2017
1 5 0.2 12 2016-2017
2 4 0.5 13 2016-2017
3 7 0.429 -7 2016-2017
0 6 0 -13 2016-2017
1 4 0.25 -10 2016-2017
2 6 0.333 -10 2016-2017
1 6 0.167 -4 2016-2017
0 2 0 -6 2016-2017
0 7 0 -18 2016-2017
3 6 0.5 1 2016-2017
1 9 0.111 -2 2016-2017
2 3 0.667 -6 2016-2017
1 5 0.2 14 2016-2017
3 6 0.5 11 2016-2017
3 6 0.5 7 2016-2017
1 4 0.25 -10 2016-2017
2 6 0.333 15 2016-2017
1 2 0.5 2 2016-2017
5 12 0.417 6 2016-2017
3 6 0.5 3 2016-2017
2 8 0.25 13 2016-2017
1 5 0.2 16 2016-2017
0 2 0 -4 2016-2017
4 7 0.571 9 2016-2017
2 5 0.4 -20 2016-2017
2 7 0.286 -7 2016-2017
0 3 0 -13 2016-2017
5 6 0.833 11 2016-2017
3 10 0.3 5 2016-2017
4 6 0.667 -1 2016-2017
5 9 0.556 5 2016-2017
2 11 0.182 -4 2016-2017
2 7 0.286 2 2016-2017
3 7 0.429 -19 2016-2017
2 5 0.4 3 2016-2017
2 5 0.4 -25 2016-2017
3 7 0.429 -17 2016-2017
4 8 0.5 -10 2016-2017
2 5 0.4 0 2016-2017
1 4 0.25 3 2016-2017
3 6 0.5 31 2016-2017
2 7 0.286 16 2016-2017
3 9 0.333 1 2016-2017
1 3 0.333 -15 2016-2017
0 4 0 -2 2016-2017
3 9 0.333 -24 2016-2017
1 6 0.167 4 2016-2017
7 11 0.636 10 2017-2018
2 6 0.333 -11 2017-2018
1 1 1 -11 2017-2018
3 5 0.6 8 2017-2018
4 12 0.333 9 2017-2018
3 6 0.5 14 2017-2018
1 4 0.25 5 2017-2018
6 11 0.545 12 2017-2018
5 9 0.556 21 2017-2018
3 5 0.6 13 2017-2018
6 12 0.5 5 2017-2018
2 6 0.333 -27 2017-2018
5 8 0.625 10 2017-2018
2 5 0.4 16 2017-2018
5 12 0.417 5 2017-2018
2 7 0.286 16 2017-2018
2 8 0.25 28 2017-2018
2 4 0.5 3 2017-2018
0 9 0 -7 2017-2018
1 5 0.2 8 2017-2018
3 10 0.3 -8 2017-2018
6 13 0.462 8 2017-2018
2 10 0.2 -10 2017-2018
4 6 0.667 -2 2017-2018
5 7 0.714 10 2017-2018
3 15 0.2 3 2017-2018
2 4 0.5 -10 2017-2018
2 13 0.154 2 2017-2018
5 12 0.417 4 2017-2018
1 4 0.25 -4 2017-2018
1 4 0.25 -4 2017-2018
1 1 1 18 2017-2018
1 6 0.167 5 2017-2018
2 5 0.4 11 2017-2018
1 4 0.25 -2 2017-2018
3 5 0.6 22 2017-2018
1 5 0.2 -25 2017-2018
1 5 0.2 14 2017-2018
1 4 0.25 9 2017-2018
3 6 0.5 28 2017-2018
4 11 0.364 -6 2017-2018
4 7 0.571 -1 2017-2018
1 5 0.2 13 2017-2018
3 9 0.333 0 2017-2018
1 8 0.125 -9 2017-2018
1 1 1 -10 2017-2018
3 7 0.429 -1 2017-2018
3 7 0.429 7 2017-2018
2 7 0.286 18 2017-2018
0 4 0 24 2017-2018
1 6 0.167 11 2017-2018
2 4 0.5 8 2017-2018
1 8 0.125 1 2017-2018
2 7 0.286 13 2017-2018
4 9 0.444 4 2017-2018
1 5 0.2 -13 2017-2018
1 4 0.25 1 2017-2018
2 8 0.25 9 2017-2018
2 4 0.5 19 2017-2018
3 7 0.429 -10 2017-2018
5 9 0.556 2 2017-2018
0 5 0 -18 2017-2018
3 7 0.429 13 2017-2018
2 7 0.286 15 2017-2018
5 7 0.714 11 2017-2018
2 3 0.667 3 2017-2018
4 9 0.444 28 2017-2018
3 4 0.75 20 2017-2018
3 7 0.429 21 2017-2018
3 9 0.333 23 2017-2018
4 8 0.5 23 2017-2018
2 8 0.25 10 2017-2018
2 9 0.222 19 2017-2018
1 9 0.111 9 2017-2018
4 8 0.5 22 2017-2018
2 7 0.286 -5 2017-2018
2 6 0.333 4 2017-2018
3 6 0.5 23 2017-2018
1 5 0.2 8 2017-2018
1 4 0.25 17 2017-2018

What we have is just Robert Covington’s three point shooting box scores for each of his games played in 2015-2016, 2016-2017, and 2017-2018 - nothing crazy. Each row is one game with the three point makes, three point attempts, three point percentage, plus/minus, and season. Next, the stats are aggregated by season.

Robert Covington Three Point Shooting Overall by Season

Season Avg. 3PM Avg. 3PA Total Avg. 3P% Games
2015-2016 2.54 7.19 35.3% 67
2016-2017 2.05 6.15 33.3% 67
2017-2018 2.54 6.88 36.9% 80
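
For anyone who wants to reproduce the aggregation step, here is a minimal sketch in Python. It assumes the box scores have been loaded as (makes, attempts, season) tuples; the `rows` list below holds only the first three games of each season from the table above to keep it short, and the function name `summarize` is my own. Note that the season percentage is total makes divided by total attempts (137/412 for 2016-2017), not the average of the per-game percentages.

```python
from collections import defaultdict

# (FG3M, FG3A, season): only the first three games of each season from the
# table above, so the printed numbers will not match the full-season table.
rows = [
    (0, 5, "2015-2016"), (0, 4, "2015-2016"), (0, 5, "2015-2016"),
    (2, 4, "2016-2017"), (0, 5, "2016-2017"), (0, 4, "2016-2017"),
    (7, 11, "2017-2018"), (2, 6, "2017-2018"), (1, 1, "2017-2018"),
]


def summarize(rows):
    """Aggregate per-game three point box scores into per-season numbers."""
    totals = defaultdict(lambda: {"makes": 0, "attempts": 0, "games": 0})
    for makes, attempts, season in rows:
        totals[season]["makes"] += makes
        totals[season]["attempts"] += attempts
        totals[season]["games"] += 1

    summary = {}
    for season, t in totals.items():
        summary[season] = {
            "avg_3pm": t["makes"] / t["games"],
            "avg_3pa": t["attempts"] / t["games"],
            # Season percentage = total makes / total attempts.
            "pct": t["makes"] / t["attempts"] if t["attempts"] else 0.0,
            "games": t["games"],
        }
    return summary


summary = summarize(rows)
for season in sorted(summary):
    s = summary[season]
    print(season, round(s["avg_3pm"], 2), round(s["avg_3pa"], 2),
          f"{s['pct']:.1%}", s["games"])
```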

Bad Way to Argue This Point

“He improved by 3.6%, going from below average to above average so therefore he should never cut his hair. This is all the proof I need”.

Better Way

Let’s do a little math to figure out how many more threes he would have needed to make in 2016-2017 to have an equal percentage as 2017-2018.

x/412 = 0.369 -> x = 412 * 0.369 -> x = 152

152 (adjusted) - 137 (actual) = 15

15 extra threes made/67 games = 0.22 extra made threes per game
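
The same arithmetic as a few lines of Python, in case you want to swap in a different player or season. The variable names are mine; the numbers are the ones used above.

```python
made_1617, attempts_1617, games_1617 = 137, 412, 67
target_pct = 0.369  # the 2017-2018 percentage

# Makes needed in 2016-2017 to match the 2017-2018 percentage.
needed_makes = round(attempts_1617 * target_pct)
extra_makes = needed_makes - made_1617
extra_per_game = extra_makes / games_1617

print(needed_makes, extra_makes, round(extra_per_game, 2))
```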

So. The difference in percentage between the two seasons is 15 threes over the course of 67 games for a player who averaged 2.05 makes per game and 6.15 attempts per game. Is that something you are super confident making a statement about? Your views may vary.

Even Better Way

Shooting threes is a nice way to play around with this methodology because it’s just a series of binary trials. Shots either go in or they do not, resulting in a successes/total proportion (3PT%). Luckily, there is a handy statistical test for exactly this situation (the two sample test for equality of proportions), and we can use it. Without delving too deep into a statistical primer, the results from the test indicate that we cannot reject the hypothesis that the two proportions are the same. The wording there is very important: failing to reject is not the same thing as showing that the two seasons are identical.
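
For the curious, here is a rough sketch of one common version of that test: a pooled two-proportion z-test, written with only the Python standard library. The article does not depend on this exact implementation, and this version applies no continuity correction, so the numbers may differ slightly from what a stats package reports. The function name `two_proportion_z_test` is my own.

```python
import math


def two_proportion_z_test(makes_a, attempts_a, makes_b, attempts_b):
    """Pooled two-sided z-test for equality of two proportions.

    Returns (z, p_value). No continuity correction is applied.
    """
    p_a = makes_a / attempts_a
    p_b = makes_b / attempts_b
    # Under the "same proportion" hypothesis, pool both samples.
    pooled = (makes_a + makes_b) / (attempts_a + attempts_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


z, p_value = two_proportion_z_test(137, 412, 203, 550)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With the real 2016-2017 and 2017-2018 numbers, the p-value comes out well above the usual 0.05 cutoff, which is what "cannot reject" means here.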

I did a little more noodling to see how many more shots at the same percentage we would need in each year before we could reject the idea that the two proportions are the same, and the answer is almost exactly three times the makes and attempts per season.

Adjusted 2016-2017: 411/1236 for 33.3%

Adjusted 2017-2018: 609/1650 for 36.9%

In order to say with standard levels of statistical confidence that we can reject the premise that 2016-2017 and 2017-2018 are identical in terms of 3PT%, we would need three times as much data.
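
Here is a minimal sketch of that noodling, under the assumption that the test is the pooled two-proportion z-test without continuity correction and that "standard levels" means a two-sided 0.05 cutoff. It multiplies both seasons' makes and attempts by 1, 2, 3, ... and stops at the first multiplier where the test rejects. The helper names are hypothetical.

```python
import math


def p_value_for(makes_a, attempts_a, makes_b, attempts_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (makes_a + makes_b) / (attempts_a + attempts_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (makes_b / attempts_b - makes_a / attempts_a) / se
    return math.erfc(abs(z) / math.sqrt(2))


def smallest_multiplier(makes_a, attempts_a, makes_b, attempts_b,
                        alpha=0.05, max_k=20):
    """Smallest whole-number multiple of the data at which the test rejects."""
    for k in range(1, max_k + 1):
        p = p_value_for(k * makes_a, k * attempts_a, k * makes_b, k * attempts_b)
        if p < alpha:
            return k
    return None  # never rejected within max_k multiples


k = smallest_multiplier(137, 412, 203, 550)
print(k)  # 3
```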

Once more into the breach. What if we realized that we did the math wrong when counting up the stats, and that the data actually looked like this?

Real 2016-2017: 137/412 for 33.3%

Fake 2017-2018: 258/550 for 46.9%

Using that same test, we are able to reject the idea that the two seasons are identical. Now, how many fewer shots at the same percentage would we need before we were unable to reject? It turns out we would need about 75% less data before we got to the point where we could no longer reject.

Adjusted Real 2016-2017: 34/103 for 33%

Adjusted Fake 2017-2018: 65/138 for 47%
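
A quick way to sanity check the shrinking exercise: if both seasons keep the same percentages and you scale the shot counts by some fraction, the z statistic of a pooled two-proportion test scales with the square root of that fraction. The sketch below uses that fact to estimate the break-even point. It assumes no continuity correction and a two-sided 0.05 cutoff, so treat the result as a ballpark that lands in the same neighborhood as the "about 75%" figure, not an exact match.

```python
import math
from statistics import NormalDist


def z_statistic(makes_a, attempts_a, makes_b, attempts_b):
    """z statistic of a pooled two-proportion test (no continuity correction)."""
    pooled = (makes_a + makes_b) / (attempts_a + attempts_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    return (makes_b / attempts_b - makes_a / attempts_a) / se


z_full = z_statistic(137, 412, 258, 550)   # real 2016-2017 vs fake 2017-2018
z_crit = NormalDist().inv_cdf(0.975)       # two-sided 0.05 cutoff

# z shrinks like sqrt(fraction of data kept), so solve
# z_full * sqrt(fraction) = z_crit for the break-even fraction.
break_even_fraction = (z_crit / z_full) ** 2

print(f"z on the full data: {z_full:.2f}")
print(f"keep roughly {break_even_fraction:.0%} of the shots to sit at the cutoff")
```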

So, we see that our ability to make statistical differentiation (using this specific test and in general) depends on both the size of the data and the size of the difference we are trying to measure. This is also why you only need like 5-10 games to be able to add statistical evidence to the eye test of one game that Steph Curry is a better three point shooter than Joel Embiid, and only one measurement to figure out that Joel is taller than Steph.

Fake Statistical Argument #2

Hypothesis: Dario Saric and Robert Covington are the same level of volume scorer

Let’s skip the bad examples and just go to a quick look at how to figure this out. Figure 1 is a histogram of their game point totals.

Eh, they look about the same, and the descriptive statistics indicate that Dario’s mean is 13.7 (sd = 6.8) and Roco’s mean is 12.8 (sd = 6.45). Instead of manually playing around with the data, we can just calculate the sample size needed - and it ends up being about 550 games per player to be able to reject this hypothesis using this particular methodology.
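
As a rough illustration of how a sample size calculation like this works, here is a sketch using the standard normal-approximation formula for comparing two means. The significance level (two-sided 0.05) and power (80%) are my assumptions; the figure quoted above does not come with its settings, and the answer is quite sensitive to them, so do not expect this sketch to reproduce it exactly. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def games_needed_per_player(mean_a, sd_a, mean_b, sd_b, alpha=0.05, power=0.80):
    """Approximate games per player needed to detect the observed gap in means.

    Normal approximation for a two-sided, two-sample comparison of means
    with equal sample sizes: n = (z_alpha + z_power)^2 * (sd_a^2 + sd_b^2) / delta^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    delta = mean_a - mean_b
    n = (z_alpha + z_power) ** 2 * (sd_a ** 2 + sd_b ** 2) / delta ** 2
    return math.ceil(n)


# Dario: mean 13.7, sd 6.8; Roco: mean 12.8, sd 6.45 (from the text above).
n_80 = games_needed_per_player(13.7, 6.8, 12.8, 6.45)
n_60 = games_needed_per_player(13.7, 6.8, 12.8, 6.45, power=0.60)
print(n_80, n_60)
```

Either way the takeaway is the same: a gap of less than one point per game, against a spread of six to seven points, takes many seasons of games to pin down.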

Why Does This Matter?

While this section is sure to be a little hand wavy when it comes to the nuts and bolts of statistics, the reason we care about the sample size and difference in comparison values when working with these two examples is down to two concepts - Type I and Type II errors.

We can use two more fake examples to illustrate this final concept.

Hypothesis 1: Joel Embiid is more than six feet tall

Hypothesis 2: Landry Shamet weighs more than 300 pounds

The ability of your basic statistical test to determine this mathematically is in part determined by what is known as the significance level, or the probability of committing what is known as a Type I error. This would be erroneously rejecting a true hypothesis. In our example, we set the significance level to limit the probability of our test on Hypothesis 1 returning “Actually, Joel is not more than six feet tall”. The other kind of error is a Type II, which is not rejecting a false hypothesis, which would be for Hypothesis 2, “We cannot reject the possibility that Landry weighs more than 300 pounds”.
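
To tie the two error types back to the shooting example, here is a small simulation sketch. It invents "true" shooting percentages (an assumption purely for illustration), simulates many pairs of seasons with 412 and 550 attempts, and counts how often a pooled two-proportion z-test rejects. When the true percentages are identical, every rejection is a Type I error. When they really differ, every failure to reject is a Type II error.

```python
import math
import random


def rejects(makes_a, attempts_a, makes_b, attempts_b, z_crit=1.96):
    """True if a pooled two-proportion z-test rejects at roughly the 0.05 level."""
    pooled = (makes_a + makes_b) / (attempts_a + attempts_b)
    if pooled in (0.0, 1.0):
        return False  # no variation at all, nothing to test
    se = math.sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (makes_b / attempts_b - makes_a / attempts_a) / se
    return abs(z) > z_crit


def simulate_makes(rng, true_pct, attempts):
    """Simulate one season of independent shots at a fixed true percentage."""
    return sum(rng.random() < true_pct for _ in range(attempts))


def rejection_rate(true_pct_a, true_pct_b, attempts_a=412, attempts_b=550,
                   trials=1000, seed=0):
    """Share of simulated season pairs where the test rejects equality."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        makes_a = simulate_makes(rng, true_pct_a, attempts_a)
        makes_b = simulate_makes(rng, true_pct_b, attempts_b)
        hits += rejects(makes_a, attempts_a, makes_b, attempts_b)
    return hits / trials


# Same true skill in both seasons: rejections here are Type I errors.
type_1_rate = rejection_rate(0.35, 0.35)
# Truly different skill (hypothetical 33.3% vs 36.9%): misses are Type II errors.
type_2_rate = 1 - rejection_rate(0.333, 0.369)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should hover near the chosen significance level, while the Type II rate at these shot counts is large, which is another way of seeing why one season of data was not enough in the Covington example.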

What we want for a rigorous statistical analysis where we test one idea against another (although not for all disciplines/settings/etc.) is a good balance between a low probability of rejecting a true hypothesis and a low probability of failing to reject a false one. The examples presented here are intentionally simplistic. For complex scenarios these calculations can be terrible, and people make entire careers out of working on problems like that.

Two final thoughts - 1) please do not start tweeting at good analytics writers about this, they already know; 2) I have spent about 10 years seriously and heavily involved in collegiate athletics, and I am far more “watch the game, nerds” than you might think, so I entirely understand the limitations of a purely numerical approach.