# Random Search and Appropriate Search-Space Scaling

Grid search isn’t always our best approach for figuring out our best hyperparameters.

In the example of Deep Learning and Adam Optimization, there are several different hyperparameters to consider. Some, like the `alpha`

constant, need tuning. On the other hand, constants like `epsilon`

are basically taken as granted and don’t affect the model.

```
from IPython.display import Image
Image('images/feature_grid.PNG')
```

By grid searching over any feature space that includes `epsilon`

, we’re putting **five times the computation** on our system for less-than-negligible performance gain.

Instead, Andrew Ng suggests doing a random search over the feature space to arrive at your coefficients. If instead, we did half the number of searches in a random fashion (below), we’d still get the learnings from tuning `alpha`

, without over-searching in the `epsilon`

space.

`Image('images/feature_grid_random.png')`

But this presents a new, interesting problem.

Whereas something like *Number of Layers* or *Number of Hidden* units might make sense to sample from on a linear scale, not all coefficients behave this way.

For instance, adjusting our `alpha`

constant between `0.05`

and `0.06`

likely has a bigger performance impact than adjusting between `0.5`

and `0.6`

, as it’s often a low-valued number.

And so by sampling randomly on a linear scale between `0.0001`

and and `1`

, we spend as much compute resources investigating the upper-range values as the incremental, lower ones where the real optimization occurs.

`Image('images/uniform_alpha.PNG')`

## Scaling

Thus, determining the correct scaling mechanism for your hyperparameters is crucial. You want to scale, based on the class of the coefficient, which may include the following:

### Linear

This one’s easy. We just want to do a uniform search between two values.

```
import numpy as np
min_val, max_val = 0, 1
np.random.uniform(min_val, max_val)
```

```
0.9358677626967968
```

### Discrete

For whole-numbered values between two values, we’ll use `randint()`

`np.random.randint(0, 10)`

```
2
```

### Log Scale

For coefficients, like `alpha`

above, where we want to select between a very small value and `1`

, it’s helpful to consider how to write it out as an exponent. For instance:

$0.0001 = 10^{-4}$

similarly

$1 = 10^{0}$

So this is actually just the same exercise as the Linear Scale, but between some negative number and `0`

, then piped as an exponent!

```
min_exp, max_exp = -4, 0
val = np.random.uniform(min_exp, max_exp)
10 ** val
```

```
0.00305918793992655
```

### Exponentially Weighted

This would likely be better-named as “Reverse Log Scale,” describing hyperparameters where your search space is most effective between, say, `0.9`

and `0.999`

, on a Log-ish scale.

Following the same approach as above, we just want to do a uniform search over the correct range of values, plus some other steps– in this case, establishing a log log scale for `0.9`

to `0.999`

involves establishing a log scale for `0.0001`

to `0.1`

and subtracting that from `1`

```
min_exp, max_exp = -3, -1
val = np.random.uniform(min_exp, max_exp)
1 - (10 ** val)
```

```
0.9908195535776579
```