Grid Search
Once you’ve got the modeling basics down, you should have a reasonable grasp of which tool to use in which situation.
But after that step, the difference between a good model and a great model lies in the way you implement that solution. How many splits should your Decision Tree make? How do we normalize our Linear Regression (if at all!)?
To answer these types of questions, we might turn to the GridSearchCV object in sklearn.
Basic Model
Let’s use the Boston Housing dataset
from sklearn.datasets import load_boston
import numpy as np
data = load_boston()
X = data['data']
y = data['target']
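Quick caveat: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so on a newer install this import will fail. Everything below works the same on any regression dataset; here's a minimal sketch that swaps in the California Housing data instead (the exact error numbers you see will differ, of course).
from sklearn.datasets import fetch_california_housing
import numpy as np

# Drop-in replacement on newer scikit-learn versions where load_boston is gone
data = fetch_california_housing()
X = data['data']
y = data['target']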
And fit a simple Decision Tree to it.
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
train_X, test_X, train_y, test_y = train_test_split(X, y)
model = DecisionTreeRegressor()
model.fit(train_X, train_y)
DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
Scoring the model with Root Mean Squared Error
from sklearn.metrics import mean_squared_error
np.sqrt(mean_squared_error(test_y, model.predict(test_X)))
5.91187248005845
Pretty good!
But Could We Be Better?
How many different params could we have called DecisionTreeRegressor with?
Inspecting the class header yields a lot of optional parameters.
DecisionTreeRegressor(criterion='mse', splitter='best', max_depth=None, min_samples_split=2, min_samples_leaf=1, min_weight_fraction_leaf=0.0, max_features=None, random_state=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_impurity_split=None, presort=False)
By my count, there are 12 different parameters we could assign different values to.
I’m no mathematician, but if each of these had just 3 possible options, you’d have 3^12, over half a million, different possible combinations of inputs. We might naively work through these via some sort of for loop mayhem.
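For the curious, that “for loop mayhem” might look something like the sketch below (a hypothetical hand-rolled version, just to show the bookkeeping GridSearchCV spares us):
from itertools import product
from sklearn.model_selection import cross_val_score

# Hand-rolled search over two parameters: try every combination,
# cross-validate each one, and keep whichever scores best
best_rmse, best_params = float('inf'), None
for max_depth, max_features in product([3, 5, 10], [3, 4, 5]):
    candidate = DecisionTreeRegressor(max_depth=max_depth, max_features=max_features)
    scores = cross_val_score(candidate, X, y, cv=5, scoring='neg_mean_squared_error')
    rmse = np.sqrt(-scores.mean())
    if rmse < best_rmse:
        best_rmse, best_params = rmse, {'max_depth': max_depth, 'max_features': max_features}

print(best_rmse, best_params)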
Or we could use the GridSearchCV object.
from sklearn.model_selection import GridSearchCV
param_grid = [
    {'max_depth': [3, 5, 10],
     'max_features': [3, 4, 5]}
]
model = DecisionTreeRegressor()
grid_search = GridSearchCV(model, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)
GridSearchCV(cv=5, error_score='raise',
estimator=DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
fit_params=None, iid=True, n_jobs=1,
param_grid=[{'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]}],
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring='neg_mean_squared_error', verbose=0)
Running this will affix a cv_results_ attribute to our GridSearchCV object that stores the error/params combination for each of our runs.
results = grid_search.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(np.sqrt(-mean_score), params)
5.820944521035298 {'max_depth': 3, 'max_features': 3}
7.059099610542761 {'max_depth': 3, 'max_features': 4}
6.1065003798330375 {'max_depth': 3, 'max_features': 5}
6.638482778664762 {'max_depth': 5, 'max_features': 3}
7.1317520786530295 {'max_depth': 5, 'max_features': 4}
5.906653954960339 {'max_depth': 5, 'max_features': 5}
7.6379616368530305 {'max_depth': 10, 'max_features': 3}
6.244301426278941 {'max_depth': 10, 'max_features': 4}
5.874556989282094 {'max_depth': 10, 'max_features': 5}
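If you'd rather scan these results as a table, cv_results_ drops straight into a pandas DataFrame (a small sketch, assuming you have pandas handy):
import pandas as pd

# Same results as above, but sorted so the lowest RMSE floats to the top
results_df = pd.DataFrame(grid_search.cv_results_)
results_df['rmse'] = np.sqrt(-results_df['mean_test_score'])
results_df[['params', 'rmse']].sort_values('rmse')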
Alternatively, we can just look at what parameters worked best.
grid_search.best_params_
{'max_depth': 3, 'max_features': 3}
Or just return the estimator that was refit with the best params.
grid_search.best_estimator_
DecisionTreeRegressor(criterion='mse', max_depth=3, max_features=3,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best')
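One thing to watch: we fit this grid search on all of X and y, so the test split we made earlier has already been seen. A more careful sketch would search over the training split only and then score the refit best estimator on the held-out data:
# Search on the training data only, then evaluate the winner on the test split
grid_search.fit(train_X, train_y)
best_model = grid_search.best_estimator_  # refit on train_X/train_y with the best params
np.sqrt(mean_squared_error(test_y, best_model.predict(test_X)))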
More Complicated Grid Searching
Notice how param_grid was actually a list of dictionaries.
We can pass multiple dicts, and as long as the keys are valid parameters for our model, GridSearchCV will work through all of the combinations in each dict all the same.
from sklearn.model_selection import GridSearchCV
param_grid = [
    {'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]},
    {'random_state': [0, 1, 2, 3, 4], 'min_samples_split': [2, 3, 4]}
]
model = DecisionTreeRegressor()
grid_search = GridSearchCV(model, param_grid, cv=5,
                           scoring='neg_mean_squared_error')
grid_search.fit(X, y)
GridSearchCV(cv=5, error_score='raise',
estimator=DecisionTreeRegressor(criterion='mse', max_depth=None, max_features=None,
max_leaf_nodes=None, min_impurity_decrease=0.0,
min_impurity_split=None, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=None, splitter='best'),
fit_params=None, iid=True, n_jobs=1,
param_grid=[{'max_depth': [3, 5, 10], 'max_features': [3, 4, 5]}, {'random_state': [0, 1, 2, 3, 4], 'min_samples_split': [2, 3, 4]}],
pre_dispatch='2*n_jobs', refit=True, return_train_score='warn',
scoring='neg_mean_squared_error', verbose=0)
results = grid_search.cv_results_
for mean_score, params in zip(results['mean_test_score'], results['params']):
    print(np.sqrt(-mean_score), params)
6.88084255884404 {'max_depth': 3, 'max_features': 3}
7.728310368407599 {'max_depth': 3, 'max_features': 4}
5.631728384408785 {'max_depth': 3, 'max_features': 5}
7.077161055401675 {'max_depth': 5, 'max_features': 3}
6.434248762749995 {'max_depth': 5, 'max_features': 4}
5.50623176204322 {'max_depth': 5, 'max_features': 5}
9.7960528608141 {'max_depth': 10, 'max_features': 3}
6.433002196722848 {'max_depth': 10, 'max_features': 4}
6.867305774249494 {'max_depth': 10, 'max_features': 5}
6.443989447539467 {'min_samples_split': 2, 'random_state': 0}
6.175426246670204 {'min_samples_split': 2, 'random_state': 1}
6.221729739278025 {'min_samples_split': 2, 'random_state': 2}
6.584041672337973 {'min_samples_split': 2, 'random_state': 3}
6.26683188047851 {'min_samples_split': 2, 'random_state': 4}
6.424937519881992 {'min_samples_split': 3, 'random_state': 0}
6.10162414392069 {'min_samples_split': 3, 'random_state': 1}
6.117777664913318 {'min_samples_split': 3, 'random_state': 2}
6.349692656967317 {'min_samples_split': 3, 'random_state': 3}
6.458894807502856 {'min_samples_split': 3, 'random_state': 4}
6.3317342045359615 {'min_samples_split': 4, 'random_state': 0}
6.329991447195127 {'min_samples_split': 4, 'random_state': 1}
6.304369707153886 {'min_samples_split': 4, 'random_state': 2}
6.206741512775601 {'min_samples_split': 4, 'random_state': 3}
6.2498656112033935 {'min_samples_split': 4, 'random_state': 4}
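And as before, rather than eyeballing the printout (where {'max_depth': 5, 'max_features': 5} came in lowest), we can ask the fitted object directly; best_score_ holds the winning mean test score, so negating and square-rooting it recovers the RMSE:
# Best parameter combination across both dicts, plus its cross-validated RMSE
print(grid_search.best_params_)
print(np.sqrt(-grid_search.best_score_))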