# Root Mean Squared Error

## Overview

One of the most common measures of model accuracy when predicting numeric values is the *Root Mean Squared Error* (RMSE).

To compute it:

- Find the difference between each prediction and the actual value
- Square each difference
- Add the squared differences together
- Divide by the number of observations
- Take the square root of the result
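As a small worked example, the sketch below runs the calculation on four made-up observations (the `actual` and `predicted` values are invented purely for illustration), averaging the squared differences before taking the square root:

```python
import numpy as np

# made-up values, for illustration only
actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.0, 5.0, 9.0, 9.0])

differences = predicted - actual        # [-1, 0, 2, 0]
squared = differences ** 2              # [1, 0, 4, 0]
total = squared.sum()                   # 5.0
mean_squared = total / len(actual)      # 1.25
rmse = np.sqrt(mean_squared)            # about 1.118
print(rmse)
```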

Squaring makes every error non-negative, so predictions that are too high and predictions that are too low both count against the model instead of cancelling each other out.
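To see why the squaring matters, here is a minimal sketch with made-up errors (hypothetical numbers, not taken from any model in this post) where a plain average of the signed errors would suggest a perfect model:

```python
import numpy as np

# hypothetical errors: two predictions 10 too high, two 10 too low
errors = np.array([10.0, -10.0, 10.0, -10.0])

mean_error = errors.mean()              # signed errors cancel out: 0.0
rmse = np.sqrt((errors ** 2).mean())    # squared errors do not: 10.0
print(mean_error, rmse)
```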

Additionally, we take the square root (rather than stopping at the Mean Squared Error, or MSE) so that the error is expressed in the same units as the value we are predicting.
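In symbols, using notation that is assumed here rather than taken from the text ($y_i$ for the actual values, $\hat{y}_i$ for the predictions, and $n$ for the number of observations):

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
$$

So if $y$ were measured in dollars, MSE would be in squared dollars, while RMSE is back in dollars.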

## Fitting a Model

```
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
```

```
# dummy dataset
X, y = make_regression()
X.shape, y.shape
```

```
((100, 100), (100,))
```

We want to build a simple Linear Regression model with our dummy data.

But as we’ve discussed in other notebooks, we first need to split our data up into training and test sets, so let’s do that.

```
from sklearn.model_selection import train_test_split

train_X, test_X, train_y, test_y = train_test_split(X, y)
# check the shapes of the arrays we will actually use
[arr.shape for arr in (train_X, test_X, train_y, test_y)]
```

```
[(75, 100), (25, 100), (75,), (25,)]
```
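The 75/25 split comes from `train_test_split`'s default `test_size` of 0.25, and the rows are shuffled randomly on every call. A minimal sketch of making the split explicit and repeatable (the `random_state=0` seeds are arbitrary choices for illustration, not something the run shown here used):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# seeds are arbitrary; any fixed integer gives a repeatable result
X, y = make_regression(random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(train_X.shape, test_X.shape)
```

With fixed seeds the predictions and RMSE are the same on every run, though they will not match the numbers shown in this post.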

Now we can fit our model with our training data:

```
model = LinearRegression()
model.fit(train_X, train_y)
```

```
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
```

And use that model to make predictions on our test data:

`model.predict(test_X)`

```
array([ 157.19999884, 182.84477034, 198.61054437, 168.05031037,
102.80607835, -161.14849689, -45.37499645, 77.02471828,
174.79940023, 70.73630468, 96.67254953, -22.69534224,
-251.23474593, 191.22108821, -302.14564522, -149.39232913,
167.25523265, 212.15791823, 251.71364073, -90.09065502,
-16.90454986, -21.64715521, 94.58179599, -292.43390204,
131.28127778])
```

## Scoring Accuracy

If we want to see how close we were, we compare the predictions against `test_y` and follow the same steps as above.

```
import numpy as np
predictions = model.predict(test_X)
```

```
error = predictions - test_y
mse = np.sum(error * error) / len(error)
rmse = np.sqrt(mse)
rmse
```

```
59.635789214540765
```

Or we can just use the `sklearn` implementation of MSE and take the square root ourselves:

```
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(test_y, predictions)
rmse = np.sqrt(mse)
rmse
```

```
59.635789214540765
```
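Newer releases of scikit-learn (1.4 and later) also provide a `root_mean_squared_error` function in `sklearn.metrics`, which skips the manual square root. The sketch below uses made-up arrays rather than the model's predictions, and falls back to the approach above on older versions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

try:
    # available in newer scikit-learn releases
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    # fallback for older releases: square root of the MSE
    def root_mean_squared_error(y_true, y_pred):
        return np.sqrt(mean_squared_error(y_true, y_pred))

# made-up values, for illustration only
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 9.0, 9.0])

rmse = root_mean_squared_error(y_true, y_pred)
print(rmse)
```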