Exponentially Weighted Moving Averages

Say we have a dataset like the following.

%pylab inline
from helpers import make_dataset, make_fig

X, y = make_dataset()
make_fig(X, y);
Populating the interactive namespace from numpy and matplotlib


If we drew a line following the shape of the data, there would be a clear dip in the middle.

We could achieve this by rolling through the data, taking the average of the 3 points we’re looking at.

import pandas as pd

rolling = pd.Series(y).rolling(3).mean()

However, this completely throws away the first couple of rows, as they don’t have enough history to fill the 3-point window.

0         NaN
1         NaN
2    0.583812
3    0.596528
4    0.635658
dtype: float64
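(Aside: if we’d rather not lose those first rows entirely, `rolling()` also accepts a `min_periods` argument that emits a mean as soon as at least that many observations are available, instead of NaN. A small sketch on dummy data:)

```python
import pandas as pd

# min_periods=1 means "give me a mean as soon as there's 1 point",
# so the early rows become partial-window averages instead of NaN
s = pd.Series([1.0, 2.0, 3.0, 4.0])
s.rolling(3, min_periods=1).mean()
# 0    1.0   (mean of [1])
# 1    1.5   (mean of [1, 2])
# 2    2.0   (mean of [1, 2, 3])
# 3    3.0   (mean of [2, 3, 4])
```

The trade-off is that those early values average over fewer points, so they’re noisier than the rest.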

But by the middle of the dataset, we’ve got a pretty good representation going

50    0.427411
51    0.480765
52    0.472817
53    0.465212
54    0.454143
55    0.479187
56    0.512444
57    0.521262
58    0.492924
59    0.481478
dtype: float64

Which allows us to make plots like this

rolling = pd.Series(y).rolling(3).mean()
ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');


And we can smooth this out by taking a larger value for the number of rolling rows to consider at once.

rolling = pd.Series(y).rolling(10).mean()
ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');


However, this lops off more and more data from the beginning as we smooth.

A Different Rolling Algorithm

Andrew Ng introduces an alternative approach in Week 2 of Improving Deep Neural Networks called Exponentially Weighted Averages. Consider a simple example where $x_t$ is the raw value at time $t$ and $v_t$ is the smoothed value the algorithm produces.

$v_0 = 0$

$v_1 = 0.9v_0 + 0.1x_1$

$v_2 = 0.9v_1 + 0.1x_2$

$v_3 = 0.9v_2 + 0.1x_3$


Expanding out for v3, we get

$v_3 = 0.9(0.9v_1 + 0.1x_2) + 0.1x_3$

$v_3 = 0.9(0.9(0.9v_0 + 0.1x_1) + 0.1x_2) + 0.1x_3$

$v_3 = 0.9(0.9(0.9(0) + 0.1x_1) + 0.1x_2) + 0.1x_3$

Reducing, it quickly becomes obvious where the “Exponential” part comes in

$v_3 = 0.9^2 \cdot 0.1x_1 + 0.9 \cdot 0.1x_2 + 0.1x_3$
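We can sanity-check this expansion numerically with some arbitrary values for the x’s:

```python
# arbitrary raw values, just for checking the algebra
x1, x2, x3 = 5.0, 7.0, 11.0

# the recursive definition
v0 = 0.0
v1 = 0.9 * v0 + 0.1 * x1
v2 = 0.9 * v1 + 0.1 * x2
v3 = 0.9 * v2 + 0.1 * x3

# the expanded closed form from above
expanded = 0.9**2 * 0.1 * x1 + 0.9 * 0.1 * x2 + 0.1 * x3

print(abs(v3 - expanded) < 1e-9)  # True
```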

Another coefficient

In this example we used the values 0.9 and 0.1. More generally, we pick a value $\beta$ between 0 and 1, weight the previous average by $\beta$, and weight the new observation by $1 - \beta$, so the two coefficients add up to one.

$v_t = \beta v_{t-1} + (1 - \beta)x_t$

And since $\beta$ is less than one, the further back in history an observation sits, the higher the power of $\beta$ attached to it, so its coefficient shrinks toward zero and historic values get less and less weight.
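Concretely, the expansion above shows that the observation sitting $k$ steps in the past gets weight $(1 - \beta)\beta^k$. Printing a few of these weights makes the geometric decay easy to see:

```python
beta = 0.9

# weight the average puts on the observation k steps in the past;
# each weight is exactly beta times the one before it
weights = [(1 - beta) * beta**k for k in range(5)]
for k, w in enumerate(weights):
    print(k, round(w, 4))
```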



Because every observation contributes with a smoothly decaying weight, rather than being dropped abruptly at the edge of a fixed window, the result is a much smoother curve. Compare our previous implementation of a naive rolling average

rolling = pd.Series(y).rolling(3).mean()

ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');


To the less-noisy EWM approach

new_rolling = pd.Series(y).ewm(3).mean()

ax = make_fig(X, y)
ax.plot(X, new_rolling, linewidth=3, color='r');
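One thing worth knowing about that call: the first positional argument to `ewm()` is `com` (center of mass), and pandas derives the smoothing from it as $\alpha = 1/(1 + \text{com})$, so `ewm(3)` corresponds to $1 - \beta = 0.25$, i.e. $\beta = 0.75$ in our notation. A quick check:

```python
import numpy as np
import pandas as pd

data = pd.Series([1.0, 2.0, 3.0, 4.0])

# ewm(3) passes 3 as `com`, which pandas converts to alpha = 1/(1+3) = 0.25
a = data.ewm(3).mean()
b = data.ewm(alpha=0.25).mean()
print(np.allclose(a, b))  # True
```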


Additionally, note that since we start with $v_0 = 0$, we get a value at every point, with no NaN rows at the start.

Sizing beta

Generally speaking, the higher your value for beta, the larger the exponent needs to be for the coefficient to approach zero

betas = [0.99, 0.9, 0.75, 0.3, 0.01]

xrange = np.linspace(0, 10)

for beta in betas:
    plt.plot(xrange, beta ** xrange, label=str(beta))

plt.legend();


Therefore, the higher your beta, the more points you effectively consider at once in your rolling window.

beta=0.9 is pretty standard.
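A rough rule of thumb (used in the course as well) is that an exponentially weighted average with parameter $\beta$ behaves like a plain rolling window over about $1/(1-\beta)$ points:

```python
# approximate "effective window size" for a few beta values
for beta in [0.5, 0.9, 0.98]:
    print(beta, round(1 / (1 - beta)))
# 0.5  -> ~2 points
# 0.9  -> ~10 points
# 0.98 -> ~50 points
```

So beta=0.9 is roughly comparable to the `rolling(10)` smoothing we did earlier, minus the NaN rows.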

Bias Correction

It’s worth mentioning that the pandas.ewm() method did us a solid behind the scenes when we called it.

If you look back at our first dummy example, our value for $v_0$ was simply 0. Thus when you plot out all of the v’s, you get the purple line: it starts off at 0, but eventually represents a smooth average of the values.

(figure: the purple, uncorrected curve plotted against the green, bias-corrected curve)

To get us plotting the green line, pandas actually used Bias Correction to correct for the fuzziness in early values. Essentially, this means doing one more step after calculating $v_t$: dividing it by $1 - \beta^t$, where $t$ is the step you’re on


$\frac{v_t}{1 - \beta^t}$

Plotting that out over time, you can see that the denominator quickly goes to 1, thus becoming a non-factor in our calculation, and syncing up with the green line above.

x = np.arange(1, 20, 1)
# use a new name here so we don't overwrite our dataset's y
correction = 1 / (1 - .9 ** x)

plt.plot(x, correction);
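We can also see the correction in action directly. The `ewa` helper below is our own sketch (not part of pandas): it runs the recursion from $v_0 = 0$ and optionally applies the $1/(1 - \beta^t)$ correction. The corrected version lines up with what pandas computes under its default `adjust=True`, which normalizes by the sum of the weights — the same fix in different clothes:

```python
import numpy as np
import pandas as pd

def ewa(data, beta=0.9, bias_correction=True):
    """Exponentially weighted average starting from v_0 = 0,
    optionally dividing each v_t by (1 - beta**t)."""
    v = 0.0
    out = []
    for t, x in enumerate(data, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta**t) if bias_correction else v)
    return np.array(out)

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# beta = 0.9 corresponds to alpha = 1 - beta = 0.1 in pandas terms
manual = ewa(data, beta=0.9)
pandas_version = pd.Series(data).ewm(alpha=0.1, adjust=True).mean().to_numpy()

print(np.allclose(manual, pandas_version))  # True
```

Without the correction (`bias_correction=False`), the first few values come out far too low, which is exactly the purple-line behavior we started with.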