Exponentially Weighted Moving Averages
Say we have a dataset like the following.
%pylab inline
from helpers import make_dataset, make_fig
X, y = make_dataset()
make_fig(X, y);
Populating the interactive namespace from numpy and matplotlib
If we drew a line following the shape of the data, there would be a clear dip in the middle.
We could achieve this by rolling through the data, taking the average of the 3 points we're looking at.
import pandas as pd
rolling = pd.Series(y).rolling(3).mean()
However, this completely throws away the first couple of rows, as they don't have enough historical data to average with.
rolling[:5]
0 NaN
1 NaN
2 0.583812
3 0.596528
4 0.635658
dtype: float64
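As an aside, rolling() also accepts a min_periods argument that averages however many points are available so far, if those leading NaNs are a problem. A quick sketch:
# min_periods=1 averages whatever points exist so far,
# so early rows get a (noisier) value instead of NaN
pd.Series(y).rolling(3, min_periods=1).mean()[:5]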
But by the middle of the dataset, we’ve got a pretty good representation going
rolling[50:60]
50 0.427411
51 0.480765
52 0.472817
53 0.465212
54 0.454143
55 0.479187
56 0.512444
57 0.521262
58 0.492924
59 0.481478
dtype: float64
This allows us to make plots like the following:
rolling = pd.Series(y).rolling(3).mean()
ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');
And we can smooth this out by taking a larger value for the number of rolling rows to consider at once.
rolling = pd.Series(y).rolling(10).mean()
ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');
However, this lops off more and more data from the beginning as we smooth.
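To see how much, we can count the leading NaNs for a few window sizes; a window of size w leaves w - 1 NaNs at the front (assuming y itself has no missing values):
# count the leading NaNs produced by each window size
for w in [3, 10, 25]:
    print(w, pd.Series(y).rolling(w).mean().isna().sum())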
A Different Rolling Algorithm
In Week 2 of Improving Deep Neural Networks, Andrew Ng introduces an alternative approach called Exponentially Weighted Averages. Consider a simple example where $x_t$ is the raw value at time $t$ and $v_t$ is the algorithm's value at time $t$.
$v_0 = 0$
$v_1 = 0.9v_0 + 0.1x_1$
$v_2 = 0.9v_1 + 0.1x_2$
$v_3 = 0.9v_2 + 0.1x_3$
$…$
Expanding out for $v_3$, we get
$v_3 = 0.9(0.9v_1 + 0.1x_2) + 0.1x_3$
$v_3 = 0.9(0.9(0.9v_0 + 0.1x_1) + 0.1x_2) + 0.1x_3$
$v_3 = 0.9(0.9(0.9(0) + 0.1x_1) + 0.1x_2) + 0.1x_3$
Reducing, it quickly becomes obvious where the "Exponential" part comes in:
$v_3 = 0.9^2 \cdot 0.1x_1 + 0.9 \cdot 0.1x_2 + 0.1x_3$
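A quick numeric check of that expansion, with some made-up values for $x_1$ through $x_3$:
# verify the recursive definition against the expanded form
xs = [0.5, 0.7, 0.2]             # made-up x_1, x_2, x_3
v = 0
for x in xs:                     # v_t = 0.9*v_{t-1} + 0.1*x_t
    v = 0.9 * v + 0.1 * x

expanded = 0.9**2 * 0.1*xs[0] + 0.9 * 0.1*xs[1] + 0.1*xs[2]
print(v, expanded)               # both print ~0.1235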
Another Coefficient
In this example we used the values $0.9$ and $0.1$. More generally, we pick values $\beta$ and $1 - \beta$ that add up to one.
$v_t = \beta v_{t-1} + (1 - \beta)x_t$
And since $\beta$ is less than one, as we move further and further into our $v$ values, the exponent attached to $\beta$ increases and the coefficient gets closer and closer to zero, thus giving less weight to historic values.
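Written as code, the whole recurrence is only a few lines. A minimal sketch (this ewa helper is my own illustration, not something from the helpers module):
def ewa(xs, beta=0.9):
    """Exponentially weighted average with v_0 = 0 and no bias correction."""
    v, out = 0.0, []
    for x in xs:
        v = beta * v + (1 - beta) * x   # v_t = beta*v_{t-1} + (1-beta)*x_t
        out.append(v)
    return out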
Properties
Smoothness
Because this weighting is multiplicative across all observations, it makes a much smoother curve. Compare our previous implementation of a naive rolling average
rolling = pd.Series(y).rolling(3).mean()
ax = make_fig(X, y)
ax.plot(X, rolling, linewidth=3, color='r');
To the less-noisy EWM approach
new_rolling = pd.Series(y).ewm(3).mean()
ax = make_fig(X, y)
ax.plot(X, new_rolling, linewidth=3, color='r');
Additionally, note that by starting with $v_0 = 0$, we've got data for each point.
Sizing beta
Generally speaking, the higher your value for $\beta$, the larger the exponent needs to be for the coefficient to approach zero:
betas = [0.99, 0.9, 0.75, 0.3, 0.01]
xrange = np.linspace(0, 10)
for beta in betas:
    plt.plot(xrange, beta ** xrange, label=str(beta))
plt.legend(loc='right');
Therefore, the higher your $\beta$, the more days you consider at once in your rolling window; roughly $\frac{1}{1 - \beta}$ of them, by Ng's rule of thumb. $\beta = 0.9$ is pretty standard.
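A quick look at that rule of thumb, using Ng's approximation that an EWA with parameter $\beta$ looks back over roughly $\frac{1}{1 - \beta}$ points:
# approximate effective window size for a few betas
for beta in [0.99, 0.9, 0.75, 0.5]:
    print(beta, round(1 / (1 - beta)))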
Bias Correction
It's worth mentioning that the pandas ewm() method did us a solid behind the scenes when we called it.
If you look back at our first dummy example, our value for $v_0$ was simply $0$. Thus when you plot out all of the $v$'s, you get the purple line: it starts off at $0$, but eventually represents a smooth average of values.
from IPython.display import Image
Image('images/bias_correction.png')
To get us plotting the green line, pandas actually used Bias Correction to correct for the fuzziness in early values. Essentially, what this means is doing another step after the calculation of $v_t$: dividing by one minus $\beta$ raised to the power of whatever step you're on.
Concretely:
$\frac{v_t}{1 - \beta^t}$
Plotting that correction factor out over time, you can see that it quickly goes to 1, thus becoming a non-factor in our calculation and syncing up with the green line above.
x = np.arange(1, 20)
correction = 1 / (1 - .9 ** x)   # the full correction factor for beta = 0.9
plt.plot(x, correction);
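As a sanity check, we can confirm that pandas' ewm() with its default adjust=True matches the bias-corrected recurrence exactly. A minimal sketch, assuming $\beta = 0.9$ and some random data:
# compare the hand-rolled, bias-corrected recurrence with pandas
beta = 0.9
xs = np.random.rand(20)

v, corrected = 0.0, []
for t, x_t in enumerate(xs, start=1):
    v = beta * v + (1 - beta) * x_t        # raw recurrence, v_0 = 0
    corrected.append(v / (1 - beta ** t))  # bias correction

# adjust=True (the default) bakes the same correction into ewm()
print(np.allclose(corrected, pd.Series(xs).ewm(alpha=1 - beta).mean()))  # True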