Singular Value Decomposition in Python

Example: Compression

Near the bottom of the writeup in the AMA, they demonstrate how useful SVD is a way of compressing/uncompressing data.

This will make a good working example as any.

They assemble a 25 X 15 matrix of zeros and ones that form an image of a 0. Per the example, there are really only different types of columns that go into constructing the bigger picture– and thus some obvious information redundancy that we can leverage.

%pylab inline
Populating the interactive namespace from numpy and matplotlib
a = np.ones((25, 1))
b = np.vstack([np.ones((5, 1)), 
               np.zeros((15, 1)),
               np.ones((5, 1))])
c = np.vstack([np.ones((5, 1)),
               np.zeros((3, 1)),
               np.ones((9, 1)),
               np.zeros((3, 1)),
               np.ones((5, 1))])
zero = np.c_[a, a, b, b, b, c, c, c, c, c, b, b, b, a, a]

zero.shape
(25, 15)

Altogether, our data looks like

plt.imshow(zero);

png

In Python

To perform SVD in Python, we can turn to the numpy implementation

import numpy.linalg as LA

The svd() function returns all three components of the righthand side of our SVD form A = U SIGMA V.T.

Note: The vh term is intentional as it is the conjugate transpose of our matrix V

u, s, vh = LA.svd(zero)

Inspecting, the shapes of u and vh are square and correspond to the column and row spaces, respectively.

u.shape, s.shape, vh.shape
((25, 25), (15,), (15, 15))

Looking at s, however, it’s obvious that we’re carrying more information than we need.

It only needed 3 singular values to reassemble the information stored in our eigen basis matricies– the rest was simply zero and will be ignored.

np.set_printoptions(precision=3, suppress=True)

s
array([14.724,  5.217,  3.314,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,
        0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ,  0.   ])

And so we can omit the rows/columns of our matricies accordingly.

U = u[:, :3]

S = s[:3]

V = vh[:3, :]
U.shape, S.shape, V.shape
((25, 3), (3,), (3, 15))

Finally, reassembling our original matrix using simply vectors and scalars, we have.

plt.imshow(np.dot(U * S, V));

png

It’s worth calling out that our original representation had a number of values equal to:

zero.shape[0] * zero.shape[1]
375

But after paring down to just the columns, singular values, and rows that matter, we have

(U.shape[0] * U.shape[1]) + S.shape[0] + (V.shape[0] * V.shape[1])
123

Less than a third.