Scatter plot tips

Scatter plots are the bread and butter of data exploration. It’s particularly useful to style each plotted point based on a value associated with it, so let’s look at a simple example.

%matplotlib inline

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# 100 samples with two informative features and a binary class label each
X, y = make_classification(n_features=2, n_redundant=0)
x0, x1 = X[:, 0], X[:, 1]

plt.scatter(x0, x1)

[Figure: scatter plot of x0 against x1, every point in the default color]

Each point has a corresponding class label in y, either 0 or 1.

Scatter by Color

We can use the c= argument to color each point according to the corresponding value in an array. It plots cleanly for discrete values:

plt.scatter(x0, x1, c=y, cmap='Spectral')
[Figure: the same scatter plot, with points colored by class label using the Spectral colormap]
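A plot colored by class is easier to read with a legend. As a minimal sketch, assuming a reasonably recent matplotlib (3.1 or later), the object returned by scatter can build the legend entries itself. The random_state=0 argument and the variable names fig, ax, and points are additions for illustration, not part of the example above.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Same kind of data as above; random_state is an added assumption
# so that the sketch is reproducible.
X, y = make_classification(n_features=2, n_redundant=0, random_state=0)
x0, x1 = X[:, 0], X[:, 1]

fig, ax = plt.subplots()
points = ax.scatter(x0, x1, c=y, cmap='Spectral')

# legend_elements() returns one handle and one label
# for each distinct value passed to c=.
handles, labels = points.legend_elements()
ax.legend(handles, labels, title='y')
```

This only makes sense when c= holds a handful of distinct values; for continuous data a colorbar is the better fit.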

For continuous values, c= maps each point onto a color gradient:

plt.scatter(x0, x1, c=x0+x1)
[Figure: the same scatter plot, with points colored along a gradient by the value of x0 + x1]
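With continuous colors, a colorbar tells the reader what the gradient means. The sketch below is one way to add it, under the same assumptions as the rest of this post; random_state=0, the explicit 'viridis' colormap, and the names values, points, and cbar are chosen here for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Same kind of data as above; random_state is an added assumption
# so that the sketch is reproducible.
X, y = make_classification(n_features=2, n_redundant=0, random_state=0)
x0, x1 = X[:, 0], X[:, 1]

values = x0 + x1  # the continuous quantity each point is colored by

fig, ax = plt.subplots()
points = ax.scatter(x0, x1, c=values, cmap='viridis')

# The colorbar is driven by the scatter's own color mapping.
cbar = fig.colorbar(points, ax=ax, label='x0 + x1')
```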

Conditional Styling

Other arguments aren’t so friendly. For instance, marker= accepts only a single marker style per call, not one per point.

We can use a dict of values and the built-in enumerate function to (tediously) plot each point separately, according to whatever rules we want.

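# Map each class label to a marker and a color
markers = {0: '$x$', 1: '$o$'}
colors = {0: 'red', 1: 'blue'}

# One scatter call per point, styled by that point's label
for idx, _ in enumerate(x0):
    plt.scatter(x0[idx], x1[idx],
                marker=markers[y[idx]],
                c=colors[y[idx]])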

[Figure: class 0 drawn as red x markers, class 1 as blue o markers]

I think this dict-unpacking solution looks much neater.

kwargs = {0: {'color': 'red', 'marker': '$x$', 'alpha': .5},
          1: {'color': 'blue', 'marker': '$o$'}}

for idx, _ in enumerate(x0):
    plt.scatter(x0[idx], x1[idx], **kwargs[y[idx]])

[Figure: the same plot, with the red x markers of class 0 drawn semi-transparent]
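Plotting one point per call gets slow as the data grows. A possible alternative, sketched here rather than taken from the approach above, is to make one scatter call per class by selecting that class's points with a boolean mask. The kwargs dict is the same as before; random_state=0, the names mask and style, and the legend labels are assumptions added for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

# Same kind of data as above; random_state is an added assumption
# so that the sketch is reproducible.
X, y = make_classification(n_features=2, n_redundant=0, random_state=0)
x0, x1 = X[:, 0], X[:, 1]

kwargs = {0: {'color': 'red', 'marker': '$x$', 'alpha': .5},
          1: {'color': 'blue', 'marker': '$o$'}}

fig, ax = plt.subplots()
for label, style in kwargs.items():
    mask = (y == label)  # True for the points that belong to this class
    # One call draws every point of the class with a shared style.
    ax.scatter(x0[mask], x1[mask], label=f'y = {label}', **style)

ax.legend()
```

This results in two scatter calls instead of one per point, and because each call carries a label, the legend comes for free.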