# Recurrent Neural Network Basics

Recurrent Neural Networks are designed to learn information from sequential data.

We start with datasets of x time steps in a row, for example:

• x words in a sentence
• x sequential stock ticks
• x days of weather in a row

Thus, we say that there are T_x elements in a given point of data.

In the most basic case, we have some handoff of information, a_i from layer i-1 to i. This gets run through a cell at each step and outputs:

• y_i the prediction at the ith step
• a_i the carry-forward information to the same cell

## Visually

Each cell calculates given information of both, this element as well as the output of the last element

from IPython.display import Image
Image('images/base_rnn.png') Specifically at the cell level, this happens through pairs of Weight matricies and Bias terms.

• W_aa, b_a: to this activation from the last activation
• W_ax, b_x: to this activation from the input data
• W_ya, b_y: (not pictured): to this output from this activation
Image('images/rnn_cell.PNG') The values for W_aa, W_ay, and W_ax are literally the same object, regardless which element, t, you’re looking at.

## Cost Function

The output at each layer has its own cost function that looks like our typical logistic/softmax cost

$\mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}) = -y^{\langle t \rangle} log \hat{y}^{\langle t \rangle} -(1-y^{\langle t \rangle}) log (1-\hat{y}^{\langle t \rangle})$

The key difference here, however, is that there’s an over-arching cost L that applies across each layer

$\mathcal{L}(\hat{y}, y) = \sum_{t=1}^{T_x} \mathcal{L}^{\langle t \rangle}(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle})$

Image('images/rnn_backprop.png') This allows us a great deal of flexibility in how we construct our Recurrent Networks

## Different Architectures

Image('images/rnn_types.png') • one-to-one: Vanilla MLP
• one-to-many: text generation
• many-to-one: gender prediction based on audio clip
• many-to-many(1): language translation
• many-to-many(2): video classification