# R Squared Measure of Fit

## Components

Suppose we’ve got a scattering of points that looks like the following:

```python
from IPython.display import Image
Image('images/r2_points.PNG')
```

We’d say that the *Total Sum of Squares* represents the combined amount of variation in Y across all points. In this case, we measure and sum the distance of each point (maroon) from the average Y value (red).

We square each value when we sum so values above and below the average height don’t cancel out.

`Image('images/r2_tss.PNG')`
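A minimal sketch of that calculation, using made-up `y` values (the actual points behind the figures aren't given):

```python
import numpy as np

# hypothetical target values, standing in for the plotted points
y = np.array([1.2, 2.3, 2.9, 4.1, 5.2])

# Total Sum of Squares: squared distance of each point from the mean of y
tss = np.sum((y - y.mean()) ** 2)
print(tss)
```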

Then consider fitting a Linear Regression to these points and plotting the proposed form, $\hat{y}$, in blue.

We’d then say that the *Residual Sum of Squares* represents the combined distance from each point to the line of best fit, again squared and summed.

`Image('images/r2_rss.PNG')`
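Continuing the sketch with made-up `x` values, we can fit the line and sum the squared residuals:

```python
import numpy as np

# hypothetical data with a roughly linear relationship
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 5.2])

# fit a simple linear regression (degree-1 polynomial) by least squares
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

# Residual Sum of Squares: squared distance of each point from the fitted line
rss = np.sum((y - y_hat) ** 2)
print(rss)
```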

## $R^2$ Statistic

Simply put, the $R^2$ statistic of a Linear Regression measures *the proportion of variance of Y explained by variance in X*. In other words, “what percentage of all variation in the target can we model using our attributes?”

To arrive at this, we want to quantify the amount of variation *not* explained by X, which is simply the total residual error as a fraction of all variation:

$\text{Unexplained error} = \frac{RSS}{TSS}$

If “the proportion of all error” adds up to 100 percent, then finding the R-squared value simply means subtracting the above from 1.

$R^2 = 1 - \frac{RSS}{TSS}$

This gives us “the percentage of variation in Y that we can explain using X.”
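Putting the two pieces together on the same hypothetical data as above:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.3, 2.9, 4.1, 5.2])

slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

rss = np.sum((y - y_hat) ** 2)      # error left over after fitting the line
tss = np.sum((y - y.mean()) ** 2)   # all variation in y

# R-squared: the share of variation the line accounts for
r_squared = 1 - rss / tss
print(r_squared)
```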

### Extending

To solidify this intuition, if we can perfectly model our data points with a straight line, as below, then we’d say that *100% of the variation in Y can be explained by variation in X*, or that $R^2 = 1$

`Image('images/r2_perfect.PNG')`

Conversely, if there is NO relationship between X and Y, and our “line of best fit” winds up being the same as the constant average value for Y, then the two lines are identical, RSS = TSS, and R-squared is 0.

`Image('images/r2_imperfect.PNG')`
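We can verify that edge case numerically: if the “fit” is just a flat line at the mean of Y, the residuals are exactly the deviations from the mean, so RSS equals TSS and R-squared is 0.

```python
import numpy as np

y = np.array([1.2, 2.3, 2.9, 4.1, 5.2])

# the "line of best fit" is flat at the average value of y
y_hat = np.full_like(y, y.mean())

rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - y.mean()) ** 2)

r_squared = 1 - rss / tss  # 0, since rss == tss
print(r_squared)
```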

### Adjusted R-Squared

Finally, it’s worth examining the effect of adding variables to your model, with respect to R-Squared.

First consider that R-Squared is a measure of explained variance. If you were to add a variable to your model that was outright useless in prediction, training would simply drive its coefficient toward zero, so the residual error can never increase. Thus, *adding more variables to your model can only ever improve (or at worst leave unchanged) your R-squared*.

But this obviously isn’t a desirable trait of the score, otherwise we’d litter our models with a host of unimportant features.

The “Adjusted R-Squared Score” corrects for this by adjusting the calculation for the number of features.

$AdjR^2 = 1 - \frac{\frac{RSS}{n - d - 1}}{\frac{TSS}{n-1}}$

where `d` is the number of features. To paraphrase ISL:

> Once all of the correct variables have been added, adding noise variables will lead to only a very small decrease in RSS. However, incrementing `d` in the denominator of the numerator will outweigh this effect. Thus adjusted R-squared “pays a price” for the inclusion of unnecessary variables in the model.
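A sketch of the formula on synthetic data (the data, seed, and `adjusted_r2` helper are all made up for illustration): model 2 nests model 1, so its RSS can only be equal or smaller, but the adjusted score divides RSS by $n - d - 1$, charging it for the extra noise feature.

```python
import numpy as np

def adjusted_r2(y, y_hat, d):
    """Adjusted R-squared for a model with d features fit on n samples."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = 2 * x + rng.normal(scale=0.5, size=n)  # y truly depends on x only
noise = rng.normal(size=n)                 # a feature unrelated to y

# model 1: intercept + the one useful feature
X1 = np.column_stack([np.ones(n), x])
beta1, *_ = np.linalg.lstsq(X1, y, rcond=None)
rss1 = np.sum((y - X1 @ beta1) ** 2)

# model 2: same, plus the pure-noise feature
X2 = np.column_stack([X1, noise])
beta2, *_ = np.linalg.lstsq(X2, y, rcond=None)
rss2 = np.sum((y - X2 @ beta2) ** 2)

adj1 = adjusted_r2(y, X1 @ beta1, d=1)
adj2 = adjusted_r2(y, X2 @ beta2, d=2)

# rss2 <= rss1 always holds for nested least-squares fits, so plain
# R-squared can only rise; the adjusted score offsets that with the d penalty
print(adj1, adj2)
```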