One of the more standard measures of model accuracy when predicting numeric values is the Root Mean Squared Error.
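Written as a formula, the measure looks like this. The symbols are notation chosen here for illustration: y_i is an actual value, \hat{y}_i is the matching prediction, and n is the number of observations.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2}
```

The quantity under the root is the Mean Squared Error (MSE).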
To compute it, you:
- Find the difference between each prediction and the actual result
- Square each difference
- Add the squared differences together
- Divide by the number of observations
- Take the square root of the result
Squaring makes every error positive, so over- and under-predictions both count against the model instead of cancelling each other out.
Additionally, we take the root (as opposed to stopping at MSE) in order to express the error in the same units as the value we are predicting, which makes it easier to interpret.
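As a rough sketch, the calculation can be written in plain Python by averaging the squared errors first and taking the square root last. The function name `rmse` and the sample numbers are made up for illustration; they are not part of the dataset used later.

```python
import math


def rmse(predictions, actuals):
    """Root Mean Squared Error of two equal-length sequences of numbers."""
    if len(predictions) != len(actuals) or len(predictions) == 0:
        raise ValueError("need two non-empty sequences of the same length")
    # Difference between each prediction and the actual result
    errors = [p - a for p, a in zip(predictions, actuals)]
    # Square each difference so over- and under-predictions both count
    squared = [e ** 2 for e in errors]
    # Sum the squares and divide by the number of observations (this is the MSE)
    mse = sum(squared) / len(squared)
    # The square root puts the error back in the units of the target
    return math.sqrt(mse)


# Hypothetical values: the errors are -1, 0 and -2, so the MSE is 5 / 3
example_predictions = [2.0, 4.0, 6.0]
example_actuals = [3.0, 4.0, 8.0]
print(rmse(example_predictions, example_actuals))  # about 1.291
```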
Fitting a Model
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# dummy dataset
X, y = make_regression()
X.shape, y.shape
((100, 100), (100,))
We want to build a simple Linear Regression model with our dummy data.
But as we’ve discussed in other notebooks, we first need to split our data up into training and test sets, so let’s do that.
from sklearn.model_selection import train_test_split

train_X, test_X, train_y, test_y = train_test_split(X, y)
# inspect the arrays we just created rather than splitting a second time
[arr.shape for arr in (train_X, test_X, train_y, test_y)]
[(75, 100), (25, 100), (75,), (25,)]
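Because both the dummy data and the split are random, the numbers change on every run. A minimal sketch of a reproducible version follows, assuming you want fixed results: the seed value 0 is an arbitrary choice made here, and test_size=0.25 only spells out the default that produces the 75/25 shapes above.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# random_state=0 is an arbitrary seed picked for illustration
X, y = make_regression(random_state=0)

# test_size=0.25 is the default, written out here to make the 75/25 split explicit
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.25, random_state=0
)

shapes = [arr.shape for arr in (train_X, test_X, train_y, test_y)]
print(shapes)  # [(75, 100), (25, 100), (75,), (25,)]
```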
Now we can fit our model with our training data
model = LinearRegression()
model.fit(train_X, train_y)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
And use that model to make a prediction on our test data
model.predict(test_X)
array([ 157.19999884, 182.84477034, 198.61054437, 168.05031037, 102.80607835, -161.14849689, -45.37499645, 77.02471828, 174.79940023, 70.73630468, 96.67254953, -22.69534224, -251.23474593, 191.22108821, -302.14564522, -149.39232913, 167.25523265, 212.15791823, 251.71364073, -90.09065502, -16.90454986, -21.64715521, 94.58179599, -292.43390204, 131.28127778])
If we want to see how close we were, we compare against test_y and follow the same steps as above.
import numpy as np

predictions = model.predict(test_X)

# squared differences, averaged, then rooted
error = predictions - test_y
mse = np.sum(error * error) / len(error)
rmse = np.sqrt(mse)
rmse
Or we just use the mean_squared_error function from sklearn.metrics
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(test_y, predictions)
rmse = np.sqrt(mse)
rmse
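To check that the two approaches agree, here is a self-contained sketch that repeats the whole workflow and compares the hand-rolled value with the scikit-learn one. The fixed random_state values and the names manual_rmse, sklearn_rmse, and agree are assumptions added here so the comparison can be rerun; they are not part of the cells above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Arbitrary seeds so the run is repeatable
X, y = make_regression(random_state=0)
train_X, test_X, train_y, test_y = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(train_X, train_y)
predictions = model.predict(test_X)

# By hand: mean of the squared errors, then the square root
error = predictions - test_y
manual_rmse = np.sqrt(np.mean(error ** 2))

# Via scikit-learn: MSE from the library, then the square root
sklearn_rmse = np.sqrt(mean_squared_error(test_y, predictions))

# Compare with a tolerance, since the two floats need not match bit for bit
agree = np.isclose(manual_rmse, sklearn_rmse)
print(manual_rmse, sklearn_rmse, agree)
```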