# Gated Recurrent Units

The problem with regular Recurrent Neural Networks is that, due to the vanishing gradient problem, they struggle to remember specific information over long stretches of a sequence. For instance, the following sentences:

```
The cat ate ... and was full
The cats ate ... and were full
```

might be completely identical, save for the plurality of the subject, and by extension the agreement of “was” vs “were”. To pick the right verb form, the network has to remember the subject’s number across everything in between.

## Memory Cells

Gated Recurrent Units have this concept of a *memory cell*, `c`, that learns and carries information from one layer to the next. The reason we call it **Gated** is because at each step along the way, the cell can accept or ignore carry-over information.

We calculate a layer’s *candidate values* using information from the last memory cell as well as this layer’s input. We want the values to be between `-1` and `1`, so we use the `tanh()` function.

$\tilde{c}^{\langle t \rangle} = \tanh(W_{cc}\, c^{\langle t-1 \rangle} + W_{cx}\, x^{\langle t \rangle} + b_c)$
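As a minimal NumPy sketch (the function signature and the names `W_cc`, `W_cx`, `b_c` follow the equation above; they are illustrative, not any library’s API):

```
import numpy as np

def candidate(c_prev, x_t, W_cc, W_cx, b_c):
    # Candidate memory values, squashed into (-1, 1) by tanh
    return np.tanh(W_cc @ c_prev + W_cx @ x_t + b_c)
```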

Next, we define the *update gate*, which creates a sort of value-level mask: the sigmoid squashes each entry into the range $(0, 1)$.

$\Gamma_u = \sigma(W_{uc}\, c^{\langle t-1 \rangle} + W_{ux}\, x^{\langle t \rangle} + b_u)$
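Continuing the sketch above (again, the names are assumptions for illustration):

```
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def update_gate(c_prev, x_t, W_uc, W_ux, b_u):
    # Gate values in (0, 1): near 1 means "take the candidate",
    # near 0 means "keep the old cell value"
    return sigmoid(W_uc @ c_prev + W_ux @ x_t + b_u)
```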

Finally, we combine the two, element-wise. Intuitively, this can be interpreted as:

- “Calculate the *values* that we want to carry over, assuming we’re carrying everything over”
- “Decide *what to carry over* from one step to the next”

$c^{\langle t \rangle} = \Gamma_u * \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) * c^{\langle t-1 \rangle}$

This should make sense, as the Gamma and one-minus-Gamma values sum to one, making each element of the new cell a weighted blend. Where the gate is near `1`, we take the candidate value; where it is near `0`, we keep what was already in the cell.
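Putting the pieces together, one step of this simplified GRU might look like the following sketch (the weight shapes and the `p` dictionary layout are assumptions for illustration):

```
def gru_step(c_prev, x_t, p):
    c_tilde = candidate(c_prev, x_t, p["W_cc"], p["W_cx"], p["b_c"])
    gamma_u = update_gate(c_prev, x_t, p["W_uc"], p["W_ux"], p["b_u"])
    # Element-wise blend: the gate chooses, per value, between new and old
    return gamma_u * c_tilde + (1.0 - gamma_u) * c_prev

# Tiny usage example: hidden size 4, input size 3, random weights
rng = np.random.default_rng(0)
p = {
    "W_cc": rng.normal(size=(4, 4)), "W_cx": rng.normal(size=(4, 3)), "b_c": np.zeros(4),
    "W_uc": rng.normal(size=(4, 4)), "W_ux": rng.normal(size=(4, 3)), "b_u": np.zeros(4),
}
c = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # five time steps of input
    c = gru_step(c, x_t, p)
```

If the gate saturates near `0` for the elements tracking, say, the subject’s plurality, those values pass through many steps unchanged, which is exactly the long-range memory the cat/cats example calls for.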

## Visually

```
# Display the GRU cell diagram (run in a notebook)
from IPython.display import Image
Image('images/gru.png')
```

We can also have an output `y_t` at each layer, but this diagram is intended to highlight the memory construction that happens within each cell.