01 Aug

Randomized Grid Search

Manual hyperparameter searching? No way. Scikit-learn has an amazing randomized search class, RandomizedSearchCV, that can give us a hint at the best parameters: instantiate the class, set up a dictionary with the candidate parameter values, and let it fly. The example below uses a K-Nearest Neighbours model. After the randomized search is done, you can pull the best parameters for your model and also take a look at the history of the parameter combinations that were tried.

Import Preliminaries

In [6]:
# Import modules
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier


# Import iris dataset 
iris = load_iris()
X, y = iris.data, iris.target

# Assign classifier
classifier = KNeighborsClassifier(n_neighbors=5, weights='uniform', 
                                 metric ='minkowski', p=2)

# Initiate a grid dictionary
grid = {'n_neighbors': list(range(1, 11)), 'weights': ['uniform', 'distance'],
        'p': [1, 2]}

# Declare randomized search on model using our param grid
random_search = RandomizedSearchCV(estimator=classifier, 
                                   param_distributions = grid, 
                                   n_iter = 10, scoring = 'accuracy', 
                                   n_jobs=1, refit=True,
                                   cv = 10,
                                   return_train_score=True)

# Fit the randomized search model with our data
random_search.fit(X,y)
Out[6]:
RandomizedSearchCV(cv=10, error_score='raise',
          estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
          fit_params=None, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 'weights': ['uniform', 'distance'], 'p': [1, 2]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score=True, scoring='accuracy', verbose=0)
In [7]:
# Print the best parameters and its best accuracy score
print('Best parameters: %s'%random_search.best_params_)
print('CV Accuracy of best parameters: %.3f'%random_search.best_score_)
Best parameters: {'weights': 'distance', 'p': 2, 'n_neighbors': 10}
CV Accuracy of best parameters: 0.973
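  • This method is more computationally feasible than a full grid search: it samples only n_iter parameter combinations (10 here, out of 40 possible) instead of trying them all
  • The result will change each time the model is fitted, unless random_state is set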
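The two points above can be checked with a short sketch. It counts the combinations a full grid search would try, using ParameterGrid, and then fits the same randomized search twice with a fixed random_state. The helper name run_search and the seed value 0 are assumptions made for illustration, not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import ParameterGrid, RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

# A full grid search would fit every combination in the grid
n_full = len(ParameterGrid(grid))
print('Combinations in the full grid: %d' % n_full)


def run_search(seed):
    """Fit a randomized search with a fixed seed (hypothetical helper)."""
    search = RandomizedSearchCV(estimator=KNeighborsClassifier(),
                                param_distributions=grid,
                                n_iter=10, scoring='accuracy',
                                cv=10, random_state=seed)
    search.fit(X, y)
    return search


first = run_search(0)
second = run_search(0)

# With the same seed, both runs sample the same candidate parameter sets
print('Same candidates sampled: %s'
      % (first.cv_results_['params'] == second.cv_results_['params']))
```

Leaving random_state at its default of None, as in the cell above, is what makes the sampled candidates differ from run to run.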

Baseline Cross Validation Score

In [8]:
# Print our current accuracy score using our current parameters
print ('Baseline with default parameters: %.3f' %np.mean(
        cross_val_score(classifier, X, y, cv=10, scoring='accuracy', n_jobs=1)))
Baseline with default parameters: 0.967

Viewing Randomized Grid Scores

In [9]:
# The grid_scores_ attribute is now deprecated (it goes away in version 0.20),
# but I'll use it until it's completely gone
random_search.grid_scores_
/Users/kavi/anaconda3/envs/main/lib/python3.6/site-packages/sklearn/model_selection/_search.py:747: DeprecationWarning: The grid_scores_ attribute was deprecated in version 0.18 in favor of the more elaborate cv_results_ attribute. The grid_scores_ attribute will not be available from 0.20
  DeprecationWarning)
Out[9]:
[mean: 0.96000, std: 0.05333, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 2},
 mean: 0.97333, std: 0.03266, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 10},
 mean: 0.96000, std: 0.05333, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 1},
 mean: 0.96667, std: 0.04472, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 4},
 mean: 0.96667, std: 0.04472, params: {'weights': 'distance', 'p': 2, 'n_neighbors': 8},
 mean: 0.96667, std: 0.04472, params: {'weights': 'uniform', 'p': 2, 'n_neighbors': 7},
 mean: 0.95333, std: 0.06700, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 4},
 mean: 0.95333, std: 0.05207, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 7},
 mean: 0.94000, std: 0.06289, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 2},
 mean: 0.97333, std: 0.03266, params: {'weights': 'uniform', 'p': 1, 'n_neighbors': 9}]
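Since the warning says grid_scores_ will not be available from version 0.20, here is a minimal sketch of how a similar mean / std / params listing could be rebuilt from cv_results_. The search is refitted here so the block runs on its own; the random_state value and the variable name summary are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

random_search = RandomizedSearchCV(estimator=KNeighborsClassifier(),
                                   param_distributions=grid,
                                   n_iter=10, scoring='accuracy',
                                   cv=10, random_state=0)
random_search.fit(X, y)

# cv_results_ stores one entry per sampled candidate under each key
results = random_search.cv_results_
summary = list(zip(results['mean_test_score'],
                   results['std_test_score'],
                   results['params']))

# Print one line per candidate, in the style of the old grid_scores_ output
for mean, std, params in summary:
    print('mean: %.5f, std: %.5f, params: %s' % (mean, std, params))
```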
In [11]:
# The new cv_results_ attribute returns our results as a dictionary of arrays
# Throw it in a dataframe to make some sense of it
results_df = pd.DataFrame(random_search.cv_results_).head(3)
results_df
Out[11]:
mean_fit_time std_fit_time mean_score_time std_score_time param_weights param_p param_n_neighbors params split0_test_score split1_test_score ... split2_train_score split3_train_score split4_train_score split5_train_score split6_train_score split7_train_score split8_train_score split9_train_score mean_train_score std_train_score
0 0.000292 0.000037 0.000545 0.000114 distance 2 2 {'weights': 'distance', 'p': 2, 'n_neighbors': 2} 1.0 0.933333 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
1 0.000293 0.000102 0.000528 0.000087 distance 2 10 {'weights': 'distance', 'p': 2, 'n_neighbors':... 1.0 0.933333 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0
2 0.000265 0.000024 0.000493 0.000094 distance 2 1 {'weights': 'distance', 'p': 2, 'n_neighbors': 1} 1.0 0.933333 ... 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 0.0

3 rows × 33 columns

In [12]:
# Here is the raw dictionary output
random_search.cv_results_
Out[12]:
{'mean_fit_time': array([ 0.00029211,  0.00029318,  0.00026453,  0.00026796,  0.00027027,
         0.00027721,  0.00029645,  0.00026639,  0.00025463,  0.00026634]),
 'std_fit_time': array([  3.71983109e-05,   1.02307524e-04,   2.38624127e-05,
          2.79288718e-05,   3.62967054e-05,   4.16681812e-05,
          5.36330741e-05,   2.61113629e-05,   1.27668641e-05,
          4.79880930e-05]),
 'mean_score_time': array([ 0.00054541,  0.00052838,  0.00049326,  0.00048833,  0.00048378,
         0.00047646,  0.00046542,  0.00048676,  0.00046687,  0.00046294]),
 'std_score_time': array([  1.14184241e-04,   8.74830381e-05,   9.36912486e-05,
          6.95798226e-05,   4.37161709e-05,   6.20151611e-05,
          5.55320274e-05,   1.04900880e-04,   9.93801053e-05,
          4.81664600e-05]),
 'param_weights': masked_array(data = ['distance' 'distance' 'distance' 'distance' 'distance' 'uniform' 'uniform'
  'uniform' 'uniform' 'uniform'],
              mask = [False False False False False False False False False False],
        fill_value = ?),
 'param_p': masked_array(data = [2 2 2 2 2 2 1 1 1 1],
              mask = [False False False False False False False False False False],
        fill_value = ?),
 'param_n_neighbors': masked_array(data = [2 10 1 4 8 7 4 7 2 9],
              mask = [False False False False False False False False False False],
        fill_value = ?),
 'params': [{'weights': 'distance', 'p': 2, 'n_neighbors': 2},
  {'weights': 'distance', 'p': 2, 'n_neighbors': 10},
  {'weights': 'distance', 'p': 2, 'n_neighbors': 1},
  {'weights': 'distance', 'p': 2, 'n_neighbors': 4},
  {'weights': 'distance', 'p': 2, 'n_neighbors': 8},
  {'weights': 'uniform', 'p': 2, 'n_neighbors': 7},
  {'weights': 'uniform', 'p': 1, 'n_neighbors': 4},
  {'weights': 'uniform', 'p': 1, 'n_neighbors': 7},
  {'weights': 'uniform', 'p': 1, 'n_neighbors': 2},
  {'weights': 'uniform', 'p': 1, 'n_neighbors': 9}],
 'split0_test_score': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]),
 'split1_test_score': array([ 0.93333333,  0.93333333,  0.93333333,  0.93333333,  0.93333333,
         0.93333333,  0.93333333,  0.93333333,  0.93333333,  0.93333333]),
 'split2_test_score': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]),
 'split3_test_score': array([ 0.93333333,  1.        ,  0.93333333,  0.93333333,  1.        ,
         1.        ,  0.93333333,  1.        ,  0.93333333,  1.        ]),
 'split4_test_score': array([ 0.86666667,  0.93333333,  0.86666667,  0.86666667,  0.86666667,
         0.86666667,  0.86666667,  0.86666667,  0.86666667,  1.        ]),
 'split5_test_score': array([ 1.        ,  0.93333333,  1.        ,  1.        ,  0.93333333,
         0.93333333,  1.        ,  0.86666667,  1.        ,  0.93333333]),
 'split6_test_score': array([ 0.86666667,  0.93333333,  0.86666667,  0.93333333,  0.93333333,
         0.93333333,  0.8       ,  0.93333333,  0.8       ,  0.93333333]),
 'split7_test_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         1.        ,  1.        ,  0.93333333,  0.93333333,  0.93333333]),
 'split8_test_score': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]),
 'split9_test_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         1.        ,  1.        ,  1.        ,  0.93333333,  1.        ]),
 'mean_test_score': array([ 0.96      ,  0.97333333,  0.96      ,  0.96666667,  0.96666667,
         0.96666667,  0.95333333,  0.95333333,  0.94      ,  0.97333333]),
 'std_test_score': array([ 0.05333333,  0.03265986,  0.05333333,  0.04472136,  0.04472136,
         0.04472136,  0.06699917,  0.05206833,  0.06289321,  0.03265986]),
 'rank_test_score': array([ 6,  1,  6,  3,  3,  3,  8,  8, 10,  1], dtype=int32),
 'split0_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.96296296,  0.94814815,  0.96296296,  0.97037037,  0.96296296]),
 'split1_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97777778,  0.96296296,  0.97777778,  0.97777778,  0.97037037]),
 'split2_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97037037,  0.95555556,  0.96296296,  0.97037037,  0.97037037]),
 'split3_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97037037,  0.96296296,  0.97037037,  0.97777778,  0.97777778]),
 'split4_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.98518519,  0.97037037,  0.98518519,  0.97037037,  0.97037037]),
 'split5_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97037037,  0.95555556,  0.96296296,  0.96296296,  0.96296296]),
 'split6_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97777778,  0.97037037,  0.97037037,  0.98518519,  0.97777778]),
 'split7_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97037037,  0.95555556,  0.97037037,  0.97037037,  0.96296296]),
 'split8_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97037037,  0.95555556,  0.97037037,  0.97037037,  0.96296296]),
 'split9_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97777778,  0.95555556,  0.95555556,  0.97037037,  0.94814815]),
 'mean_train_score': array([ 1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
         0.97333333,  0.95925926,  0.96888889,  0.97259259,  0.96666667]),
 'std_train_score': array([ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.00592593,  0.00682929,  0.00797802,  0.00578537,  0.00828173])}
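The raw dictionary is hard to read at a glance. One possible way to get a quick overview is to keep only a handful of columns and sort the candidates by rank_test_score. This is a sketch rather than the notebook's original code: the column selection, the random_state value and the name ranked are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

grid = {'n_neighbors': list(range(1, 11)),
        'weights': ['uniform', 'distance'],
        'p': [1, 2]}

random_search = RandomizedSearchCV(estimator=KNeighborsClassifier(),
                                   param_distributions=grid,
                                   n_iter=10, scoring='accuracy',
                                   cv=10, random_state=0,
                                   return_train_score=True)
random_search.fit(X, y)

# Keep the summary columns and drop the per-split scores
columns = ['rank_test_score', 'mean_test_score', 'std_test_score',
           'mean_train_score', 'param_n_neighbors', 'param_weights',
           'param_p']

# Best-ranked candidates first
ranked = (pd.DataFrame(random_search.cv_results_)[columns]
          .sort_values('rank_test_score')
          .reset_index(drop=True))
print(ranked)
```

Putting mean_train_score next to mean_test_score makes the gap between training and validation accuracy easy to spot. In the output above, every candidate with weights='distance' has a mean train score of 1.0.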

Author: Kavi Sekhon