Scatter plot tips

20 Aug 2018

Scatter plots are the bread and butter of anyone doing data exploration. They’re particularly useful when each point is styled according to a value associated with it. So let’s look at a simple example.

%pylab inline

from sklearn.datasets import make_classification

# Two informative features and two classes (labels 0 and 1)
X, y = make_classification(n_features=2, n_redundant=0)
x0, x1 = X[:, 0], X[:, 1]

plt.scatter(x0, x1)

Populating the interactive namespace from numpy and matplotlib

<matplotlib.collections.PathCollection at 0x1ee78026278>

png

Each point has a corresponding class label in y, either 0 or 1.

Scatter by Color

We can use the c= argument to set the color of each point based on the corresponding value in an array. It plots cleanly for distinct values:

plt.scatter(x0, x1, c=y, cmap='Spectral')

<matplotlib.collections.PathCollection at 0x1ee75deec50>

png

And on a spectrum for continuous values:

plt.scatter(x0, x1, c=x0+x1)

<matplotlib.collections.PathCollection at 0x1ee79475898>

png
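When color encodes a continuous value, a colorbar makes the scale readable. The following is a minimal sketch rather than part of the original notebook: the random data is a stand-in (an assumption) for x0 and x1 above, the 'viridis' colormap and the label text are arbitrary choices, and the Agg backend line is only there so the snippet also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): two random features in place of x0, x1 from above.
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=(2, 100))

fig, ax = plt.subplots()
# Color each point by the continuous quantity x0 + x1.
points = ax.scatter(x0, x1, c=x0 + x1, cmap="viridis")
# Attach a colorbar so the color scale can be read off the plot.
colorbar = fig.colorbar(points, ax=ax)
colorbar.set_label("x0 + x1")
```

Passing the object returned by scatter to fig.colorbar ties the bar to the same colormap and value range used for the points.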

Conditional Styling

Other arguments aren’t so friendly. For instance, marker= takes a single marker style for the whole call rather than one per point.

We can make good use of a dict of values and the built-in enumerate function to (tediously) plot each point separately, according to whatever rules you want.

markers = {0: '$x$', 1: '$o$'}
colors = {0: 'red', 1: 'blue'}

# One scatter call per point, styled by that point's label
for idx, label in enumerate(y):
    plt.scatter(x0[idx], x1[idx],
                marker=markers[label],
                c=colors[label])

png

I think unpacking a dict of keyword arguments for each label looks much neater.

kwargs = {0: {'color': 'red', 'marker': '$x$', 'alpha': .5},
          1: {'color': 'blue', 'marker': '$o$'}}

# Look up the style for each point's label and unpack it as keyword arguments
for idx, label in enumerate(y):
    plt.scatter(x0[idx], x1[idx], **kwargs[label])

png
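The loops above call scatter once per point, which gets slow for large arrays. A possible variation, sketched here under some assumptions, is to call scatter once per label using a boolean mask. The data is a random stand-in for x0, x1 and y, the styles dict and the "class N" legend text are hypothetical names, and plain 'x' and 'o' markers are used in place of the mathtext ones.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): random features with a 0/1 label per point.
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=(2, 100))
y = (x0 + x1 > 0).astype(int)

# One dict of scatter keyword arguments per label (hypothetical name).
styles = {0: {'color': 'red', 'marker': 'x', 'alpha': .5},
          1: {'color': 'blue', 'marker': 'o'}}

fig, ax = plt.subplots()
for label, style in styles.items():
    mask = y == label  # boolean array selecting the points with this label
    ax.scatter(x0[mask], x1[mask], label=f"class {label}", **style)
ax.legend()
```

Because each label is drawn in a single call, the label= argument also gives one legend entry per class for free.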