Grid Search
Once you’ve got the modeling basics down, you should have a reasonable grasp of which tool to use in which situation.
But after that step, the difference between a good model and a great model lies in the way you implement that solution. How many splits should your Decision Tree make? How do we normalize our Linear Regression (if at all!)?
To answer these types of questions, we might turn to the GridSearchCV object in sklearn.
Basic Model
Let’s use the Boston Housing dataset
from sklearn.datasets import load_boston
import numpy as np
data = load_boston()
X = data['data']
y = data['target']
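Quick caveat: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so on a newer install this import will fail. Everything below works the same on any regression dataset; here's a minimal sketch that swaps in the California Housing data instead (the exact error numbers you see will differ, of course).
from sklearn.datasets import fetch_california_housing
import numpy as np

# Drop-in replacement on newer scikit-learn versions where load_boston is gone
data = fetch_california_housing()
X = data['data']
y = data['target']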
And fit a simple Decision Tree to it.
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y)
model = DecisionTreeRegressor()
model.fit(train_X, train_y)
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
Scoring the model with Root Mean Squared Error
from sklearn.metrics import mean_squared_error
np.sqrt(mean_squared_error(test_y, model.predict(test_X)))
5.91187248005845
Pretty good!
But Could We Be Better?
How many different params could we have called DecisionTreeRegressor with?
Inspecting the class header yields a lot of optional parameters.
DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, presort=False)
By my count, there are 12 different parameters we could assign different values to.
I’m no mathematician, but if each of these had just 3 possible options, you’d have 3^12, over half a million, different possible combinations of inputs. We might naively work through these via some sort of for loop mayhem.
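For the curious, that “for loop mayhem” might look something like the sketch below (a hypothetical hand-rolled version, just to show the bookkeeping GridSearchCV spares us):
from itertools import product
from sklearn.model_selection import cross_val_score

# Hand-rolled search over two parameters: try every combination,
# cross-validate each one, and keep whichever scores best
best_rmse, best_params = float('inf'), None
for max_depth, max_features in product([3, 5, 10], [3, 4, 5]):
    candidate = DecisionTreeRegressor(max_depth=max_depth, max_features=max_features)
    scores = cross_val_score(candidate, X, y, cv=5, scoring='neg_mean_squared_error')
    rmse = np.sqrt(-scores.mean())
    if rmse < best_rmse:
        best_rmse, best_params = rmse, {'max_depth': max_depth, 'max_features': max_features}

print(best_rmse, best_params)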
Or we could use the GridSearchCV object.
from sklearn.model_selection import GridSearchCV
param_grid = [
    {'max_depth': [3, 5, 10],
     'max_features': [3, 4, 5]}
]
model = DecisionTreeRegressor()
grid_search = GridSearchCV(model, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)
GridSearchCV(cv=5, error_score='raise',
estimator=DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
fit_params=None, iid=True, n_jobs=1,
param_grid=[{'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]}],
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring='neg_mean_squared_error', verbose=0)
Running this will affix a cv_results_ attribute to our GridSearchCV object that stores the error/params combination for each of our runs.
results = grid_search.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(np.sqrt(-mean_score), params)
5.820944521035298 {'max_depth': 3, 'max_features': 3}
7.059099610542761 {'max_depth': 3, 'max_features': 4}
6.1065003798330375 {'max_depth': 3, 'max_features': 5}
6.638482778664762 {'max_depth': 5, 'max_features': 3}
7.1317520786530295 {'max_depth': 5, 'max_features': 4}
5.906653954960339 {'max_depth': 5, 'max_features': 5}
7.6379616368530305 {'max_depth': 10, 'max_features': 3}
6.244301426278941 {'max_depth': 10, 'max_features': 4}
5.874556989282094 {'max_depth': 10, 'max_features': 5}
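If you'd rather scan these results as a table, cv_results_ drops straight into a pandas DataFrame (a small sketch, assuming you have pandas handy):
import pandas as pd

# Same results as above, but sorted so the lowest RMSE floats to the top
results_df = pd.DataFrame(grid_search.cv_results_)
results_df['rmse'] = np.sqrt(-results_df['mean_test_score'])
results_df[['params', 'rmse']].sort_values('rmse')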
Alternatively, we can just look at what parameters worked best.
grid_search.best_params_
{'max_depth': 3, 'max_features': 3}
Or just return the estimator that was refit with the best params.
grid_search.best_estimator_
DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=3,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
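One thing to watch: we fit this grid search on all of X and y, so the test split we made earlier has already been seen. A more careful sketch would search over the training split only and then score the refit best estimator on the held-out data:
# Search on the training data only, then evaluate the winner on the test split
grid_search.fit(train_X, train_y)
best_model = grid_search.best_estimator_  # refit on train_X/train_y with the best params
np.sqrt(mean_squared_error(test_y, best_model.predict(test_X)))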
More Complicated Grid Searching
Notice how param_grid was actually a list of dictionaries.
We can pass multiple dicts, and as long as the keys are valid parameters for our model, GridSearchCV will work through all of the combinations in each dict all the same.
from sklearn.model_selection import GridSearchCV
param_grid = [
    {'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]},
    {'random_state': [0, 1, 2, 3, 4], 'min_samples_split': [2, 3, 4]}
]
model = DecisionTreeRegressor()
grid_search = GridSearchCV(model, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)
GridSearchCV(cv=5, error_score='raise',
estimator=DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
fit_params=None, iid=True, n_jobs=1,
param_grid=[{'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]}, {'random_state': [0, 1, 2, 3, 4], 'min_samples_split': [2, 3, 4]}],
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring='neg_mean_squared_error', verbose=0)
results = grid_search.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(np.sqrt(-mean_score), params)
6.88084255884404 {'max_depth': 3, 'max_features': 3}
7.728310368407599 {'max_depth': 3, 'max_features': 4}
5.631728384408785 {'max_depth': 3, 'max_features': 5}
7.077161055401675 {'max_depth': 5, 'max_features': 3}
6.434248762749995 {'max_depth': 5, 'max_features': 4}
5.50623176204322 {'max_depth': 5, 'max_features': 5}
9.7960528608141 {'max_depth': 10, 'max_features': 3}
6.433002196722848 {'max_depth': 10, 'max_features': 4}
6.867305774249494 {'max_depth': 10, 'max_features': 5}
6.443989447539467 {'min_samples_split': 2, 'random_state': 0}
6.175426246670204 {'min_samples_split': 2, 'random_state': 1}
6.221729739278025 {'min_samples_split': 2, 'random_state': 2}
6.584041672337973 {'min_samples_split': 2, 'random_state': 3}
6.26683188047851 {'min_samples_split': 2, 'random_state': 4}
6.424937519881992 {'min_samples_split': 3, 'random_state': 0}
6.10162414392069 {'min_samples_split': 3, 'random_state': 1}
6.117777664913318 {'min_samples_split': 3, 'random_state': 2}
6.349692656967317 {'min_samples_split': 3, 'random_state': 3}
6.458894807502856 {'min_samples_split': 3, 'random_state': 4}
6.3317342045359615 {'min_samples_split': 4, 'random_state': 0}
6.329991447195127 {'min_samples_split': 4, 'random_state': 1}
6.304369707153886 {'min_samples_split': 4, 'random_state': 2}
6.206741512775601 {'min_samples_split': 4, 'random_state': 3}
6.2498656112033935 {'min_samples_split': 4, 'random_state': 4}
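And as before, rather than eyeballing the printout (where {'max_depth': 5, 'max_features': 5} came in lowest), we can ask the fitted object directly; best_score_ holds the winning mean test score, so negating and square-rooting it recovers the RMSE:
# Best parameter combination across both dicts, plus its cross-validated RMSE
print(grid_search.best_params_)
print(np.sqrt(-grid_search.best_score_))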