The beginning of the season always starts my counter for iterations on the following statements: “two games of the season is a small sample size, but [PLAYER] looks vastly [IMPROVED/WORSE] compared to last season” and “[PLAYER] is averaging 0.5 more [STAT] and 0.75 more [STAT] per game - it’s obvious that cutting [FOOD] out of their diet made a huge difference”. It is one of my least favorite tropes, because in most (but not all) cases, the person making the statement is either using “sample size” in an entirely colloquial and arbitrary manner or vastly underrating their ability to actually discern a difference. My goal with this piece is to explain a dry and sometimes confusing (although important) statistical concept using data we all know and understand.

Just as an FYI, for the rest of this article I am going to be presenting arguments I do not actually believe; they are here purely for statistical comparison purposes. Again, this is not advocating for (or against) any of these arguments, just walking through the process. Lastly, I am intentionally leaving out any sort of complexity or context based on shot types, locations, etc. If I shoot 35% from three exclusively on wide-open corner threes, and another player shoots 35% from three exclusively on off-the-dribble, top-of-the-key pull-ups, it is almost certainly the case that the other player is a “better” shooter, even though we both make the same percentage of our attempts. That information is not included in these toy examples and obviously should be taken into consideration when making a real argument.

Note: I am glossing over important statistical concepts, assumptions, use cases, etc. I am aware that I did not get into the nuances of every possible permutation and application of the tests and data.

**Fake Statistical Argument #1**

*Hypothesis: Robert Covington’s clear improvement from 2016-2017 to 2017-2018 is evidence that growing his hair out definitely made him a better shooter.*

2016-2017: 137/412 for 33.3% (67 games)

2017-2018: 203/550 for 36.9% (80 games)

I personally like using three point shooting for these concepts because I believe that unlike two point shooting, it is almost entirely determined by your skill at shooting, and not by chances created by others or garbage points. Let’s take a look at the data set we will be using for this exercise.

### Robert Covington Three Point Shooting Box Scores (2015-2018)

FG3M | FG3A | FG3_PCT | PLUS_MINUS | Season |
---|---|---|---|---|
0 | 5 | 0 | -25 | 2015-2016 |
0 | 4 | 0 | 9 | 2015-2016 |
0 | 5 | 0 | -19 | 2015-2016 |
0 | 4 | 0 | -10 | 2015-2016 |
3 | 6 | 0.5 | 9 | 2015-2016 |
3 | 9 | 0.333 | 5 | 2015-2016 |
2 | 8 | 0.25 | -4 | 2015-2016 |
6 | 9 | 0.667 | 1 | 2015-2016 |
2 | 4 | 0.5 | -7 | 2015-2016 |
5 | 11 | 0.455 | 18 | 2015-2016 |
2 | 7 | 0.286 | -12 | 2015-2016 |
5 | 12 | 0.417 | -4 | 2015-2016 |
2 | 5 | 0.4 | -22 | 2015-2016 |
0 | 6 | 0 | -3 | 2015-2016 |
3 | 9 | 0.333 | -4 | 2015-2016 |
5 | 12 | 0.417 | -2 | 2015-2016 |
3 | 11 | 0.273 | -26 | 2015-2016 |
1 | 2 | 0.5 | -25 | 2015-2016 |
1 | 4 | 0.25 | -8 | 2015-2016 |
1 | 6 | 0.167 | -12 | 2015-2016 |
0 | 4 | 0 | -11 | 2015-2016 |
2 | 5 | 0.4 | -8 | 2015-2016 |
0 | 3 | 0 | -2 | 2015-2016 |
1 | 5 | 0.2 | 16 | 2015-2016 |
2 | 3 | 0.667 | -9 | 2015-2016 |
2 | 4 | 0.5 | -1 | 2015-2016 |
0 | 1 | 0 | 1 | 2015-2016 |
0 | 1 | 0 | 0 | 2015-2016 |
2 | 5 | 0.4 | -13 | 2015-2016 |
2 | 3 | 0.667 | -6 | 2015-2016 |
1 | 5 | 0.2 | 2 | 2015-2016 |
6 | 10 | 0.6 | -10 | 2015-2016 |
4 | 10 | 0.4 | 17 | 2015-2016 |
3 | 7 | 0.429 | -8 | 2015-2016 |
2 | 7 | 0.286 | -2 | 2015-2016 |
6 | 13 | 0.462 | -13 | 2015-2016 |
5 | 12 | 0.417 | 10 | 2015-2016 |
1 | 3 | 0.333 | -18 | 2015-2016 |
2 | 7 | 0.286 | 8 | 2015-2016 |
0 | 4 | 0 | -15 | 2015-2016 |
0 | 4 | 0 | -19 | 2015-2016 |
2 | 6 | 0.333 | 9 | 2015-2016 |
0 | 6 | 0 | 1 | 2015-2016 |
7 | 11 | 0.636 | -5 | 2015-2016 |
3 | 7 | 0.429 | 12 | 2015-2016 |
4 | 7 | 0.571 | -10 | 2015-2016 |
2 | 5 | 0.4 | 3 | 2015-2016 |
2 | 9 | 0.222 | -16 | 2015-2016 |
3 | 7 | 0.429 | 3 | 2015-2016 |
1 | 5 | 0.2 | -4 | 2015-2016 |
1 | 7 | 0.143 | -17 | 2015-2016 |
4 | 8 | 0.5 | -13 | 2015-2016 |
4 | 11 | 0.364 | -10 | 2015-2016 |
1 | 4 | 0.25 | -23 | 2015-2016 |
3 | 8 | 0.375 | 0 | 2015-2016 |
4 | 6 | 0.667 | 11 | 2015-2016 |
3 | 10 | 0.3 | 4 | 2015-2016 |
1 | 9 | 0.111 | 6 | 2015-2016 |
4 | 7 | 0.571 | 5 | 2015-2016 |
6 | 14 | 0.429 | -6 | 2015-2016 |
2 | 7 | 0.286 | -1 | 2015-2016 |
3 | 10 | 0.3 | -10 | 2015-2016 |
1 | 10 | 0.1 | 4 | 2015-2016 |
7 | 13 | 0.538 | 9 | 2015-2016 |
5 | 17 | 0.294 | 2 | 2015-2016 |
6 | 10 | 0.6 | 6 | 2015-2016 |
6 | 13 | 0.462 | -18 | 2015-2016 |
2 | 4 | 0.5 | 4 | 2016-2017 |
0 | 5 | 0 | -18 | 2016-2017 |
0 | 4 | 0 | -6 | 2016-2017 |
2 | 8 | 0.25 | -17 | 2016-2017 |
1 | 9 | 0.111 | -5 | 2016-2017 |
3 | 4 | 0.75 | -13 | 2016-2017 |
5 | 9 | 0.556 | -3 | 2016-2017 |
2 | 9 | 0.222 | 12 | 2016-2017 |
3 | 7 | 0.429 | -10 | 2016-2017 |
1 | 6 | 0.167 | -16 | 2016-2017 |
0 | 5 | 0 | -4 | 2016-2017 |
0 | 7 | 0 | -17 | 2016-2017 |
1 | 4 | 0.25 | 4 | 2016-2017 |
1 | 5 | 0.2 | 8 | 2016-2017 |
4 | 7 | 0.571 | 0 | 2016-2017 |
0 | 3 | 0 | -28 | 2016-2017 |
2 | 6 | 0.333 | 12 | 2016-2017 |
6 | 9 | 0.667 | -25 | 2016-2017 |
4 | 11 | 0.364 | 11 | 2016-2017 |
3 | 7 | 0.429 | 0 | 2016-2017 |
1 | 5 | 0.2 | 12 | 2016-2017 |
2 | 4 | 0.5 | 13 | 2016-2017 |
3 | 7 | 0.429 | -7 | 2016-2017 |
0 | 6 | 0 | -13 | 2016-2017 |
1 | 4 | 0.25 | -10 | 2016-2017 |
2 | 6 | 0.333 | -10 | 2016-2017 |
1 | 6 | 0.167 | -4 | 2016-2017 |
0 | 2 | 0 | -6 | 2016-2017 |
0 | 7 | 0 | -18 | 2016-2017 |
3 | 6 | 0.5 | 1 | 2016-2017 |
1 | 9 | 0.111 | -2 | 2016-2017 |
2 | 3 | 0.667 | -6 | 2016-2017 |
1 | 5 | 0.2 | 14 | 2016-2017 |
3 | 6 | 0.5 | 11 | 2016-2017 |
3 | 6 | 0.5 | 7 | 2016-2017 |
1 | 4 | 0.25 | -10 | 2016-2017 |
2 | 6 | 0.333 | 15 | 2016-2017 |
1 | 2 | 0.5 | 2 | 2016-2017 |
5 | 12 | 0.417 | 6 | 2016-2017 |
3 | 6 | 0.5 | 3 | 2016-2017 |
2 | 8 | 0.25 | 13 | 2016-2017 |
1 | 5 | 0.2 | 16 | 2016-2017 |
0 | 2 | 0 | -4 | 2016-2017 |
4 | 7 | 0.571 | 9 | 2016-2017 |
2 | 5 | 0.4 | -20 | 2016-2017 |
2 | 7 | 0.286 | -7 | 2016-2017 |
0 | 3 | 0 | -13 | 2016-2017 |
5 | 6 | 0.833 | 11 | 2016-2017 |
3 | 10 | 0.3 | 5 | 2016-2017 |
4 | 6 | 0.667 | -1 | 2016-2017 |
5 | 9 | 0.556 | 5 | 2016-2017 |
2 | 11 | 0.182 | -4 | 2016-2017 |
2 | 7 | 0.286 | 2 | 2016-2017 |
3 | 7 | 0.429 | -19 | 2016-2017 |
2 | 5 | 0.4 | 3 | 2016-2017 |
2 | 5 | 0.4 | -25 | 2016-2017 |
3 | 7 | 0.429 | -17 | 2016-2017 |
4 | 8 | 0.5 | -10 | 2016-2017 |
2 | 5 | 0.4 | 0 | 2016-2017 |
1 | 4 | 0.25 | 3 | 2016-2017 |
3 | 6 | 0.5 | 31 | 2016-2017 |
2 | 7 | 0.286 | 16 | 2016-2017 |
3 | 9 | 0.333 | 1 | 2016-2017 |
1 | 3 | 0.333 | -15 | 2016-2017 |
0 | 4 | 0 | -2 | 2016-2017 |
3 | 9 | 0.333 | -24 | 2016-2017 |
1 | 6 | 0.167 | 4 | 2016-2017 |
7 | 11 | 0.636 | 10 | 2017-2018 |
2 | 6 | 0.333 | -11 | 2017-2018 |
1 | 1 | 1 | -11 | 2017-2018 |
3 | 5 | 0.6 | 8 | 2017-2018 |
4 | 12 | 0.333 | 9 | 2017-2018 |
3 | 6 | 0.5 | 14 | 2017-2018 |
1 | 4 | 0.25 | 5 | 2017-2018 |
6 | 11 | 0.545 | 12 | 2017-2018 |
5 | 9 | 0.556 | 21 | 2017-2018 |
3 | 5 | 0.6 | 13 | 2017-2018 |
6 | 12 | 0.5 | 5 | 2017-2018 |
2 | 6 | 0.333 | -27 | 2017-2018 |
5 | 8 | 0.625 | 10 | 2017-2018 |
2 | 5 | 0.4 | 16 | 2017-2018 |
5 | 12 | 0.417 | 5 | 2017-2018 |
2 | 7 | 0.286 | 16 | 2017-2018 |
2 | 8 | 0.25 | 28 | 2017-2018 |
2 | 4 | 0.5 | 3 | 2017-2018 |
0 | 9 | 0 | -7 | 2017-2018 |
1 | 5 | 0.2 | 8 | 2017-2018 |
3 | 10 | 0.3 | -8 | 2017-2018 |
6 | 13 | 0.462 | 8 | 2017-2018 |
2 | 10 | 0.2 | -10 | 2017-2018 |
4 | 6 | 0.667 | -2 | 2017-2018 |
5 | 7 | 0.714 | 10 | 2017-2018 |
3 | 15 | 0.2 | 3 | 2017-2018 |
2 | 4 | 0.5 | -10 | 2017-2018 |
2 | 13 | 0.154 | 2 | 2017-2018 |
5 | 12 | 0.417 | 4 | 2017-2018 |
1 | 4 | 0.25 | -4 | 2017-2018 |
1 | 4 | 0.25 | -4 | 2017-2018 |
1 | 1 | 1 | 18 | 2017-2018 |
1 | 6 | 0.167 | 5 | 2017-2018 |
2 | 5 | 0.4 | 11 | 2017-2018 |
1 | 4 | 0.25 | -2 | 2017-2018 |
3 | 5 | 0.6 | 22 | 2017-2018 |
1 | 5 | 0.2 | -25 | 2017-2018 |
1 | 5 | 0.2 | 14 | 2017-2018 |
1 | 4 | 0.25 | 9 | 2017-2018 |
3 | 6 | 0.5 | 28 | 2017-2018 |
4 | 11 | 0.364 | -6 | 2017-2018 |
4 | 7 | 0.571 | -1 | 2017-2018 |
1 | 5 | 0.2 | 13 | 2017-2018 |
3 | 9 | 0.333 | 0 | 2017-2018 |
1 | 8 | 0.125 | -9 | 2017-2018 |
1 | 1 | 1 | -10 | 2017-2018 |
3 | 7 | 0.429 | -1 | 2017-2018 |
3 | 7 | 0.429 | 7 | 2017-2018 |
2 | 7 | 0.286 | 18 | 2017-2018 |
0 | 4 | 0 | 24 | 2017-2018 |
1 | 6 | 0.167 | 11 | 2017-2018 |
2 | 4 | 0.5 | 8 | 2017-2018 |
1 | 8 | 0.125 | 1 | 2017-2018 |
2 | 7 | 0.286 | 13 | 2017-2018 |
4 | 9 | 0.444 | 4 | 2017-2018 |
1 | 5 | 0.2 | -13 | 2017-2018 |
1 | 4 | 0.25 | 1 | 2017-2018 |
2 | 8 | 0.25 | 9 | 2017-2018 |
2 | 4 | 0.5 | 19 | 2017-2018 |
3 | 7 | 0.429 | -10 | 2017-2018 |
5 | 9 | 0.556 | 2 | 2017-2018 |
0 | 5 | 0 | -18 | 2017-2018 |
3 | 7 | 0.429 | 13 | 2017-2018 |
2 | 7 | 0.286 | 15 | 2017-2018 |
5 | 7 | 0.714 | 11 | 2017-2018 |
2 | 3 | 0.667 | 3 | 2017-2018 |
4 | 9 | 0.444 | 28 | 2017-2018 |
3 | 4 | 0.75 | 20 | 2017-2018 |
3 | 7 | 0.429 | 21 | 2017-2018 |
3 | 9 | 0.333 | 23 | 2017-2018 |
4 | 8 | 0.5 | 23 | 2017-2018 |
2 | 8 | 0.25 | 10 | 2017-2018 |
2 | 9 | 0.222 | 19 | 2017-2018 |
1 | 9 | 0.111 | 9 | 2017-2018 |
4 | 8 | 0.5 | 22 | 2017-2018 |
2 | 7 | 0.286 | -5 | 2017-2018 |
2 | 6 | 0.333 | 4 | 2017-2018 |
3 | 6 | 0.5 | 23 | 2017-2018 |
1 | 5 | 0.2 | 8 | 2017-2018 |
1 | 4 | 0.25 | 17 | 2017-2018 |

What we have is just Robert Covington’s three point shooting box scores for each of his games played in 2015-2016, 2016-2017, and 2017-2018 - nothing crazy. Each row is one game with the season, three point makes, three point attempts, three point percentage, and plus/minus. Next, the stats are aggregated by season.

### Robert Covington Three Point Shooting Overall by Season

Season | Avg. 3PM | Avg. 3PA | 3P% (total 3PM / total 3PA) | Games |
---|---|---|---|---|
2015-2016 | 2.54 | 7.19 | 35.3% | 67 |
2016-2017 | 2.05 | 6.15 | 33.3% | 67 |
2017-2018 | 2.54 | 6.88 | 36.9% | 80 |
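One detail worth making explicit: the season percentages above match total makes divided by total attempts (203/550 = 36.9%), not the average of the per-game FG3_PCT column. As a small sketch of why those two numbers differ, here are just the first four 2017-2018 games from the box score table; the variable names are mine, purely for illustration.

```python
# First four 2017-2018 games from the box score table: (FG3M, FG3A).
games = [(7, 11), (2, 6), (1, 1), (3, 5)]

makes = sum(m for m, _ in games)
attempts = sum(a for _, a in games)

# Pooled percentage: every shot counts equally.
pooled_pct = makes / attempts

# Mean of per-game percentages: every game counts equally,
# so a 1-for-1 night weighs as much as a 7-for-11 night.
mean_game_pct = sum(m / a for m, a in games) / len(games)

print(f"pooled: {pooled_pct:.3f}, mean of game percentages: {mean_game_pct:.3f}")
```

The pooled number is the one that treats each attempt as a single trial, which is the framing the rest of this piece relies on.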

**Bad Way to Argue This Point**

“He improved by 3.6%, going from below average to above average so therefore he should never cut his hair. This is all the proof I need”.

**Better Way**

Let’s do a little math to figure out how many more threes he would have needed to make in 2016-2017 to match his 2017-2018 percentage.

x/412 = 0.369 -> x = 412 * 0.369 -> x = 152

152 (adjusted) - 137 (actual) = 15

15 extra threes made/67 games = 0.22 extra made threes per game
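The same back-of-the-envelope arithmetic as a few lines of Python, in case you want to swap in another player’s numbers. The variable names are just illustrative.

```python
makes_1617, attempts_1617, games_1617 = 137, 412, 67
target_pct = 0.369  # the 2017-2018 percentage

# Makes needed on the same 412 attempts to hit the target percentage.
needed_makes = round(attempts_1617 * target_pct)
extra_makes = needed_makes - makes_1617
extra_per_game = extra_makes / games_1617

print(needed_makes, extra_makes, round(extra_per_game, 2))  # 152 15 0.22
```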

So. The difference in percentage between the two seasons is 15 threes over the course of 67 games for a player who averaged 2.05 makes per game and 6.15 attempts per game. Is that something you are super confident making a statement about? Your views may vary.

**Even Better Way**

Shooting threes is a nice way to play around with this methodology because it’s just a series of binary trials. Shots either go in or they do not, resulting in a successes/total proportion (3PT%). Luckily, there is a handy statistical test for exactly this situation, the two-sample test for equality of proportions, and we can use it. Without delving too deep into a statistical primer, the results from the test indicate that we cannot reject the idea that the two proportions are the same. The wording there is very important: we have not shown that the two seasons are the same, only that this amount of data cannot tell them apart.

I did a little more noodling to see how many more shots **at the same percentage** we would need in each year before we could reject the idea that the two proportions are the same, and the answer is almost exactly 3x the amount of makes and attempts per season.
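For anyone who wants to see the mechanics, here is a minimal sketch of one common version of that test: a pooled two-sided z-test with no continuity correction. That choice is an assumption on my part for illustration; statistical packages often apply a correction by default, so their p-values will differ slightly. The function name is made up for this example.

```python
from math import erfc, sqrt

def two_proportion_z_test(makes_a, attempts_a, makes_b, attempts_b):
    """Pooled two-sided z-test for equality of two proportions."""
    p_a = makes_a / attempts_a
    p_b = makes_b / attempts_b
    # Under the "same shooter" idea, both seasons share one underlying percentage.
    pooled = (makes_a + makes_b) / (attempts_a + attempts_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / attempts_a + 1 / attempts_b))
    z = (p_b - p_a) / std_err
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the standard normal
    return z, p_value

# 2016-2017 (137/412) versus 2017-2018 (203/550)
z, p_value = two_proportion_z_test(137, 412, 203, 550)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

Under these assumptions the p-value comes out around 0.24, nowhere near the usual 0.05 cutoff, which is the "cannot reject" result in numerical form.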

Adjusted 2016-2017: 411/1236 for 33.3%

Adjusted 2017-2018: 609/1650 for 36.9%

In order to say with standard levels of statistical confidence that we can reject the premise that 2016-2017 and 2017-2018 are identical in terms of 3PT%, we would need three times as much data.
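A rough way to see where the 3x comes from: if the percentages stay fixed and every count is multiplied by k, the test statistic grows like the square root of k. The sketch below assumes a pooled z-test without continuity correction (my assumption, not necessarily the exact test used above) and simply multiplies the real counts by 1, 2, and 3.

```python
from math import erfc, sqrt

def p_value_at_scale(k, makes_a=137, attempts_a=412, makes_b=203, attempts_b=550):
    """Two-sided pooled z-test p-value after multiplying every count by k."""
    n_a, n_b = k * attempts_a, k * attempts_b
    x_a, x_b = k * makes_a, k * makes_b
    pooled = (x_a + x_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (x_b / n_b - x_a / n_a) / std_err
    return erfc(abs(z) / sqrt(2))

for k in (1, 2, 3):
    print(f"{k}x the data: p = {p_value_at_scale(k):.3f}")
```

With this version of the test, doubling the data still leaves the p-value above 0.05, and tripling it is the first whole multiple that drops below.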

Once more into the breach. What if we realized that we did the math wrong counting up the stats, and that the data actually looked like this?

Real 2016-2017: 137/412 for 33.3%

Fake 2017-2018: 258/550 for 46.9%

Using that same test, we are able to reject the idea that the two seasons are identical. Now, how many fewer shots **at the same percentage** would we need to be unable to reject? It turns out we would need about 75% less data before we got to the point where we could no longer reject.

Adjusted Real 2016-2017: 34/103 for 33%

Adjusted Fake 2017-2018: 65/138 for 47%

So, we see that our ability to make statistical differentiation (using this specific test and in general) depends on both the size of the data and the size of the difference we are trying to measure. This is also why you only need something like 5-10 games to add statistical evidence to the one-game eye test that Steph Curry is a better three point shooter than Joel Embiid, and only one measurement to figure out that Joel is taller than Steph.

**Fake Statistical Argument #2**

*Hypothesis: Dario Saric and Robert Covington are the same level of volume scorer*

Let’s skip the bad examples and just go to a quick look at how to figure this out. Figure 1 is a histogram of their game point totals.

Eh, they look about the same, and the descriptive statistics indicate that Dario’s mean is 13.7 (sd = 6.8) and Roco’s mean is 12.8 (sd = 6.45). Instead of manually playing around with the data, we can just calculate the sample size needed - and it ends up being about 550 games **per player** to be able to reject this hypothesis using this particular methodology.
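For the curious, here is a sketch of one standard way to do that kind of calculation: the normal-approximation sample size formula for comparing two means. I am not claiming this reproduces the 550 figure; the significance level and power behind that number are not stated here, so `alpha` and `power` below are assumed values you can change, and the function name is made up for this example.

```python
from math import ceil
from statistics import NormalDist

def games_per_player(mean_a, sd_a, mean_b, sd_b, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_power = NormalDist().inv_cdf(power)          # cutoff for the desired power
    pooled_var = (sd_a ** 2 + sd_b ** 2) / 2
    effect = mean_a - mean_b
    return ceil(2 * pooled_var * (z_alpha + z_power) ** 2 / effect ** 2)

# Dario: mean 13.7, sd 6.8. Roco: mean 12.8, sd 6.45.
n_60 = games_per_player(13.7, 6.8, 12.8, 6.45, power=0.60)
n_80 = games_per_player(13.7, 6.8, 12.8, 6.45, power=0.80)
print(n_60, n_80)
```

Under these assumptions the answer runs from roughly 530 games per player at 60% power to roughly 850 at 80% power. Either way it is several full seasons each, which is the point.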

**Why Does This Matter?**

While this section is sure to be a little hand-wavy when it comes to the nuts and bolts of statistics, the reason we care about the sample size and the size of the difference when working with these two examples comes down to two concepts - Type I and Type II errors.

We can use two more fake examples to illustrate this final concept.

*Hypothesis 1: Joel Embiid is more than six feet tall*

*Hypothesis 2: Landry Shamet weighs more than 300 pounds*

How well a basic statistical test can settle these mathematically is determined in part by what is known as the significance level, which is the probability of committing a Type I error: erroneously rejecting a true hypothesis. In our example, we set the significance level to limit the probability of our test on Hypothesis 1 returning “Actually, Joel is not more than six feet tall”. The other kind of error is a Type II error, which is failing to reject a false hypothesis. For Hypothesis 2, that would be the test returning “We cannot reject the possibility that Landry weighs more than 300 pounds”.

What we want for a rigorous statistical analysis where we test one idea against another (although not for all disciplines/settings/etc.) is a good balance between the probability that we do not reject a true hypothesis and the probability that we do reject a false hypothesis. The examples presented here are intentionally simplistic, and for complex scenarios, these calculations can be terrible, and people make entire careers out of working on problems like that.
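To make both error types concrete with the shooting example, here is a small simulation sketch. It invents a shooter whose true percentage we control, plays out two seasons of 412 and 550 attempts many times, and counts how often a pooled z-test (no continuity correction) rejects. The true percentages are assumptions for illustration: 35% in both seasons for the Type I case, and 33.3% versus 36.9% for the Type II case, treating the season numbers as if they were true skill levels.

```python
import random
from math import erfc, sqrt

def rejects(makes_a, n_a, makes_b, n_b, alpha=0.05):
    """True if a pooled two-sided z-test rejects 'same percentage' at level alpha."""
    pooled = (makes_a + makes_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (makes_b / n_b - makes_a / n_a) / std_err
    return erfc(abs(z) / sqrt(2)) < alpha

def simulate_season(true_pct, attempts, rng):
    """Number of makes when each attempt goes in with probability true_pct."""
    return sum(rng.random() < true_pct for _ in range(attempts))

def rejection_rate(pct_a, pct_b, n_a=412, n_b=550, trials=2000, seed=0):
    rng = random.Random(seed)
    hits = sum(
        rejects(simulate_season(pct_a, n_a, rng), n_a,
                simulate_season(pct_b, n_b, rng), n_b)
        for _ in range(trials)
    )
    return hits / trials

# Same true skill in both seasons: every rejection is a Type I error.
type_1_rate = rejection_rate(0.35, 0.35)
# Different true skill: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(0.333, 0.369)
print(f"Type I rate: {type_1_rate:.3f}, Type II rate: {type_2_rate:.3f}")
```

The Type I rate should land near the 5% significance level by construction. The Type II rate is the uncomfortable one: at these sample sizes, a real improvement of that size gets missed most of the time.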

Two final thoughts - 1) please do not start tweeting at good analytics writers about this - they already know, and 2) I have spent about 10 years seriously and heavily involved in collegiate athletics. I am far more “watch the game, nerds” than you might think, so I entirely understand the limitations of a purely numerical approach.