As described in the sampling distributions notebook, the larger your sample size, the less variability there is in your sampling distribution.
Thus, to make statistically sound assertions, we have to collect enough data to generalize results from our sample to our population.
Thankfully, there are some easy heuristics we can follow to ensure we’ve gathered enough data, and gathered it correctly.
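The sample needs to be selected randomly.
We don’t want our sample to account for too much of the overall population. With $N$ the population size and $n$ the sample size, the sample should be at most 10% of the population: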
$$N \geq 10n$$
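As a quick illustration, this condition can be checked in one line of Python. The function name `satisfies_ten_percent_condition` and the example numbers are assumptions made for this sketch, not something defined elsewhere in these notebooks.

```python
def satisfies_ten_percent_condition(population_size: int, sample_size: int) -> bool:
    """Return True when the sample is at most 10% of the population (N >= 10n)."""
    return population_size >= 10 * sample_size


# Hypothetical numbers: a population of 5,000 and two candidate sample sizes.
print(satisfies_ten_percent_condition(5000, 400))  # True: 5000 >= 4000
print(satisfies_ten_percent_condition(5000, 600))  # False: 5000 < 6000
```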
We want enough data that the sampling distribution looks approximately normal.
Let $p$ be the probability of success. We want at least 10 expected successes and at least 10 expected failures:
$$p * n \geq 10$$
$$(1-p) * n \geq 10$$
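The two inequalities above can be sketched as a small check. The function name `satisfies_success_failure_condition`, its `threshold` parameter, and the example values of $p$ and $n$ are hypothetical and chosen only for illustration.

```python
def satisfies_success_failure_condition(p: float, n: int, threshold: float = 10) -> bool:
    """Return True when p*n and (1-p)*n both reach the threshold."""
    expected_successes = p * n
    expected_failures = (1 - p) * n
    return expected_successes >= threshold and expected_failures >= threshold


# Hypothetical values: a fair coin with 40 flips, then a rare event with 100 trials.
print(satisfies_success_failure_condition(0.5, 40))    # True: 20 successes, 20 failures
print(satisfies_success_failure_condition(0.05, 100))  # False: only about 5 expected successes
```

Note that a rare event (small $p$) needs a much larger $n$ before both counts reach 10.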
As an easier benchmark, we can simply require
$$n \geq 30$$
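To see why sample size matters, here is a minimal simulation sketch using only the standard library. It assumes a skewed Exponential(1) population (so $\sigma = 1$), and the function name `sample_mean_spread`, the number of repetitions, and the seed are all choices made for this example. It measures how much sample means vary for a few sample sizes and compares that spread with the theoretical value $\sigma / \sqrt{n}$.

```python
import random
import statistics


def sample_mean_spread(sample_size: int, num_samples: int = 2000, seed: int = 0) -> float:
    """Standard deviation of sample means drawn from an Exponential(1) population."""
    rng = random.Random(seed)
    means = []
    for _ in range(num_samples):
        # Draw one sample of the requested size and record its mean.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return statistics.stdev(means)


for n in (5, 30, 100):
    # Theory predicts a spread of about sigma / sqrt(n), with sigma = 1 here.
    print(n, round(sample_mean_spread(n), 3), round(1 / n ** 0.5, 3))
```

The simulated spread should shrink as $n$ grows, which is the behavior the opening sentence of this section describes.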