# Confidence Intervals

### Confidence Interval vs Level

If you’ve got a bunch of data and you’re trying to approximate the average value, you’re going to have to bake in some wiggle room for your prediction.

Assuming that you’ve already cleared the usual sampling conditions, we essentially want to come up with an expression of

$ourEstimate \pm marginOfError$

How we calculate this “marginOfError” depends on whether we’re looking at a sample mean or a sample proportion (more below), but it’s basically the product of two parts:

- The **standard error**, an approximation of the standard deviation of the sampling distribution
- Our **test statistic**, which is a function of what distribution we’re using (mean vs proportion) and our **confidence level**
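The pieces above can be sketched generically. The function and variable names here are purely illustrative, not from any library:

```python
# Generic shape of a confidence interval:
#   estimate ± (test statistic) * (standard error)
def interval(estimate, critical_value, standard_error):
    margin_of_error = critical_value * standard_error
    return (estimate - margin_of_error, estimate + margin_of_error)

# e.g. an estimate of 0.5 with a critical value of 1.96 and SE of 0.0158
low, high = interval(0.5, 1.96, 0.0158)
```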

#### Confidence Level

The confidence level is expressed as a percentage and essentially says

> In an infinite number of trials, X% of these error bands would contain our population mean

#### Confidence Interval

By extension of that, these “error bands” are your confidence interval.

### Proportion

Calculating the confidence interval for a sample proportion follows the equation

$ \hat{p} \pm Z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Arriving at a `Z_star` value for a given confidence level has traditionally involved using a lookup table much more robust than the abridged one below

```
CL     Z*
==========
80%    1.28
90%    1.645
95%    1.96
98%    2.33
99%    2.58
```
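Rather than memorizing a table, you can also derive `Z_star` directly from the standard normal distribution: for a two-sided interval at confidence level CL, it’s the `(1 + CL) / 2` quantile. A quick sketch with `scipy`:

```python
from scipy import stats

# Z* is the (1 + CL) / 2 quantile of the standard normal distribution,
# since (1 - CL) of the probability mass is split across the two tails
def z_star(confidence_level):
    return stats.norm.ppf((1 + confidence_level) / 2)

print(round(z_star(0.95), 2))   # 1.96
print(round(z_star(0.90), 3))   # 1.645
```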

### Mean

Similarly, calculating the confidence interval for a sample mean looks like the following

$ \bar{X} \pm t^* \frac{s_x}{\sqrt{n}}$

The lookup table for `t_star` is a bit more complicated than its `Z` counterpart and also involves another dimension for *degrees of freedom*.
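As with `Z_star`, you can skip the table and ask `scipy` directly, passing the degrees of freedom (here `n - 1` for a sample of size `n`):

```python
from scipy import stats

# t* depends on both the confidence level and the degrees of freedom
def t_star(confidence_level, n):
    return stats.t.ppf((1 + confidence_level) / 2, df=n - 1)

# As n grows, t* converges toward its Z counterpart
print(round(t_star(0.95, 10), 3))     # 2.262
print(round(t_star(0.95, 1000), 2))   # 1.96
```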

### In Python

So how do we calculate these in Python?

```python
import numpy as np
from scipy import stats
```

#### Proportion

1000 values of either 0 or 1

`X = np.random.randint(0, 2, (1000))`

```python
p_hat = X.mean()
std_err = np.sqrt((p_hat * (1 - p_hat)) / len(X))
```

```python
print(p_hat - 1.96 * std_err,
      p_hat + 1.96 * std_err)
```

```
0.477013645945 0.538986354055
```

`scipy` has a **robust** `stats` API with a ton of functionality, including calculating confidence intervals

`stats.norm.interval(0.95, loc=np.mean(X), scale=stats.sem(X))`

```
(0.47699871080518186, 0.53900128919481816)
```

But I found that, for my purposes, I was seldom reaching for the customization it provides. So instead, I threw together `stats101` as a thin wrapper with more of the arguments handled automagically.
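Under the hood, such a wrapper only needs a few lines. Here’s a rough sketch of what the proportion helper might look like — my guess at the internals, not the actual `stats101` source:

```python
import numpy as np
from scipy import stats

def proportion(confidence_level, X):
    # Sample proportion interval: p_hat ± Z* * sqrt(p_hat(1 - p_hat) / n)
    p_hat = np.mean(X)
    std_err = np.sqrt(p_hat * (1 - p_hat) / len(X))
    z = stats.norm.ppf((1 + confidence_level) / 2)
    return p_hat - z * std_err, p_hat + z * std_err
```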

Now we just apply the confidence level and our data

```python
from stats101 import confidence_interval
confidence_interval.proportion(0.90, X)
```

```
(0.48198289796420185, 0.53401710203579811)
```

`confidence_interval.proportion(0.95, X)`

```
(0.47699871080518186, 0.53900128919481816)
```

`confidence_interval.proportion(0.99, X)`

```
(0.46725739973505387, 0.54874260026494615)
```

#### Mean

Works just as easily for a sample mean

`X = np.random.randint(0, 100, (1000))`

`confidence_interval.mean(0.8, X)`

```
(47.149856666859471, 49.460143333140529)
```

`confidence_interval.mean(0.95, X)`

```
(46.537387386009932, 50.072612613990067)
```

`confidence_interval.mean(0.99, X)`

```
(45.980336643055075, 50.629663356944924)
```
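As a sanity check, `scipy` can produce the same t-based mean interval directly — note the `len(X) - 1` degrees of freedom:

```python
import numpy as np
from scipy import stats

X = np.random.randint(0, 100, 1000)

# Same t-based interval, straight from scipy
lo, hi = stats.t.interval(0.95, len(X) - 1,
                          loc=np.mean(X), scale=stats.sem(X))
print(lo, hi)
```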