Making Images Palette-able in Python
A while ago, I came across a really neat tool that allows a user to pass in an image and generate a representative color palette, derived from the colors of the image, essentially going from this...
from IPython import display

display.Image('Images/old_masters.jpg')
... to this
display.Image('Images/old_masters_colors.png')
And so after cramming a slew of .PNG files I had lying around through it, I got curious how it actually worked. Which, in turn, led to my first bit of rabbit-holing on working with image data in Python.
I never got around to learning how to recreate the site's algorithm one-to-one, but I did pick up a bunch of practical skills and raise some interesting questions I thought merited sharing :)
Thinking of Images as Vectors
To get things kicked off, I'm going to import Pillow (PIL), the batteries-included, bread-and-butter Image Processing library in Python
from PIL import Image, ImageDraw
and use it to load up an image from a cache of movie posters I downloaded for another side-project (that never went anywhere, haha)
img = Image.open('posters/Blade Runner 2.png')
img
Here, img represents an object of type JpegImageFile (PIL sniffs the actual file contents, so despite the .png extension this particular poster is really a JPEG), which allows us to do handy things like crop or do some light editing
type(img)
But more importantly for the purposes of this post, it also makes the image neatly consumable by Python's darling computation workhorse, numpy. Now we're going to stuff the image into an array, taking us from pixels to a bunch of numbers that represent those pixels.
import numpy as np
vec = np.array(img)
If we take a peek at vec, we get a big, incomprehensible printout of a bunch of numbers
vec
But a closer look at the shape of this object helps us interpret what we're looking at
vec.shape
It's no accident that, looking at the size of our original image, it has a width of 150 pixels and a height of 210.
img.size
But what of the 3 at the end of (210, 150, 3)?
Well, that represents the distinct Red, Green, and Blue values that define the color of each pixel. If this concept is foreign to you, poke around this site for a minute or two, as it's pretty much the crux of the rest of the post.
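For a concrete peek at what that means, indexing a single pixel pulls back its three channel values:

vec[0, 0]  # a length-3 array: the top-left pixel's R, G, B values (each 0-255)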
All told, each of the height x width pixels having its own RGB values means that our simple, compact image is actually represented by a lot of numbers
vec.size
So what should we do with these numbers?
TL;DR: K-Means Clustering
Clustering is one of the core areas of unsupervised learning, and essentially answers "I have a bunch of data, can you segment it into groups for me?"
Perhaps the easiest of these algorithms to understand is K-Means, which can be summarized as:
- Pick k random spots on the grid; call them "Targets"
- For each point of data you've got, figure out which Target is closest
- Move each Target to the average of the points that landed closest to it
Did the Targets stop moving?
- Yes? Done deal
- No? Go back to the second step and check again
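If you'd like to see those steps spelled out, here's a toy numpy version of the loop I just described (purely a sketch to mirror the bullets above-- later on we'll lean on sklearn's far more robust implementation):

import numpy as np

def toy_kmeans(points, k, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    # 1. pick k random points from the data as our starting Targets
    targets = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # 2. for every point, find the closest Target
        dists = np.linalg.norm(points[:, None, :] - targets[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. move each Target to the average of the points assigned to it
        #    (if a Target winds up with no points, just leave it where it was)
        new_targets = np.array([
            points[labels == i].mean(axis=0) if (labels == i).any() else targets[i]
            for i in range(k)
        ])
        # 4. if the Targets stopped moving, we're done
        if np.allclose(new_targets, targets):
            break
        targets = new_targets
    return targets, labels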
For a less hand-wavy explanation (featuring great graphics), I like this video
# Youtube
from IPython.display import HTML

HTML('<iframe width="560" height="315" '
     'src="https://www.youtube.com/embed/IuRb3y8qKX4" '
     'frameborder="0" gesture="media" allow="encrypted-media" '
     'allowfullscreen></iframe>')
But not to dwell too long on the topic, let's cobble together a quick example to demonstrate this visually.
I'm going to lean on the most vanilla dataset in all of Data Science, Iris, which is basically petal and sepal measurements of a bunch of flowers.
import seaborn as sns
iris = sns.load_dataset('iris')
iris.head()
For the sake of visualization, we're going to throw away all but two columns to make a cheap scatter plot
# copy so we can safely tack on a 'label' column later
trimmedData = iris.loc[:, ['sepal_length', 'sepal_width']].copy()

import matplotlib.pyplot as plt

plt.scatter(x=trimmedData['sepal_length'], y=trimmedData['sepal_width']);
Then, we're going to leverage the K-Means implementation in sklearn to try and separate these points into 3 different groups
from sklearn.cluster import KMeans
model = KMeans(n_clusters=3)
model.fit(trimmedData)
trimmedData['label'] = model.labels_
This runs almost instantly for such a small dataset, and now we can plot the same points, but this time assigning a color based on which group they wound up in.
plt.scatter(x=trimmedData['sepal_length'],
y=trimmedData['sepal_width'],
c=trimmedData['label']);
To hammer the point home, running K-Means over this data to get three Targets yields three groups:
- The Purple points are centered around, on average, (5.0, 4.0)
- The Teal points are centered around, on average, (5.6, 2.7)
- The Yellow points are centered around, on average, (7.3, 3.3)
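For what it's worth, those centers aren't eyeballed off the plot-- sklearn hands them to us directly (your exact numbers may wiggle a bit between runs, since the starting Targets are random):

model.cluster_centers_  # one (sepal_length, sepal_width) Target per group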
Manipulating
Our example above clustered points arranged in two-dimensional, X/Y space. However, this extends painlessly into three dimensions, where our (R, G, B) color definitions live. First, though, we need to take our original 210 x 150 image and basically unravel it into one long chain of RGB values.
numpy makes this a breeze with reshape
reshaped = vec.reshape(-1, 3)
reshaped.shape
The -1 in the function call might seem confusing at first glance, but basically we knew we wanted to package everything into chunks of 3, per RGB. The -1 is an indicator that numpy should just figure out how to make that happen. Thus, it takes all 94,500 points of data (as above) and realizes that you can cleanly group them into 3's if you make one long list of 31.5k elements.
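If you'd rather verify that than take my word for it, spelling the inferred dimension out by hand gives back the exact same array:

# -1 just asks numpy to infer the 31,500 for us
reshaped_by_hand = vec.reshape(210 * 150, 3)
(reshaped == reshaped_by_hand).all()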
At this point, we're wandering into the neighborhood of "our data is getting hard to interpret"
plt.imshow(reshaped);
but trust me, this is one long line of every pixel of our original image.
All Together
For convenience, I've packaged the rest of my spaghetti code into functions any interested reader can check out here.
from imagetools import (path_to_img_array,
pick_colors,
show_key_colors)
But basically these:
- Loads an image into a vector, from a given path
path = 'posters/Blade Runner 2.png'
vec = path_to_img_array(path)
vec.shape
- Unrolls our image and runs K-Means clustering to find "Target Points" all of the pixel values are grouped around (here, we choose 3)
colors = pick_colors(vec, 3)
colors
- Finally, one last function to take these Targets and plot out some simple boxes to show the colors it found
show_key_colors(colors)
Looks about right, yeah?
Image.open('posters/Blade Runner 2.png')
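If you'd rather not click through to the repo, a minimal version of those three helpers might look roughly like this (my own sketch, assuming sklearn's KMeans under the hood-- the real imagetools code may differ):

import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def path_to_img_array(path):
    """Load an image from disk into an (height, width, 3) RGB array."""
    return np.array(Image.open(path).convert('RGB'))

def pick_colors(vec, n_colors):
    """Unroll the image to (n_pixels, 3) and return the K-Means Targets as RGB rows."""
    pixels = vec.reshape(-1, 3)
    model = KMeans(n_clusters=n_colors).fit(pixels)
    return model.cluster_centers_.round().astype(int)

def show_key_colors(colors, box_size=50):
    """Paint each Target color as a block in a simple swatch image."""
    swatch = np.zeros((box_size, box_size * len(colors), 3), dtype=np.uint8)
    for i, color in enumerate(colors):
        swatch[:, i * box_size:(i + 1) * box_size] = color
    return Image.fromarray(swatch)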
Considering the Number of Means
As I played around with this, I quickly discovered that generating a meaningful color palette from an image was very sensitive to how many Targets you ask sklearn to sniff out.
Let's take a look at a more interesting poster (from a movie I'd never heard of...)
path = 'posters/3 Idiotas.png'
img = Image.open(path)
img
Running K-Means for a mere two colors gives the following
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 2)));
Interestingly, we captured the blue of the image (the majority color). On the other hand, we've determined that the "average color" of lime green, fuchsia, honey yellow, hot pink, orange, and lavender is... a gross beige, lol
But take a look at what happens when we allow for more and more means:
At 3, we separate light and dark tones and the blue pops
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 3)));
At 4, we extract brown tones from light/dark
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 4)));
At 5, we split brown to get purple and yellow
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 5)));
At 6, we split purple into a salmon and an olive green
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 6)));
At 7, we split our blue into two
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 7)));
Finally, 8 gives us about as diverse a palette as we'd like from this image
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 8)));
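If you'd rather not copy-paste that cell over and over, the same calls wrap neatly into a loop over the candidate number of means:

# same calls as above, just looped from 2 through 8 means
img = path_to_img_array(path)
fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(6, 10))
for ax, n in zip(axes, range(2, 9)):
    ax.imshow(show_key_colors(pick_colors(img, n)))
    ax.set_title(f'{n} means')
    ax.axis('off')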
So obviously allowing for more Targets could help us pick out more unique colors. Neat.
Considering Image Size
Until now, we've been looking at meager 150 x 210 images. What happens when we examine larger images?
To play with this idea, I took a still from The Grand Budapest Hotel, a delightfully-colorful movie that I love.
At a glance, it seems like a no-brainer what 5 colors we'd come up with
wes = Image.open('Images/wes.png')
wes
Or so I thought
path = 'Images/wes.png'
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 5)));
I'll spare you the suspense-- trying different values from 5 and up, we never really achieve the clean, "green, blue, brown, red, orange" sampling we'd expect.
path = 'Images/wes.png'
img = path_to_img_array(path)
plt.imshow(show_key_colors(pick_colors(img, 10)));
This is likely due to the fact that while their clothing makes up about 50% of the image, the other 50% is made up of skin tones, the white wall, the painting in the background, etc. This muddles up the points K-Means considers when trying to find those neat Target points. Indeed, if we were really hell-bent on getting those colors to pop, we might do some naive Photoshopping to erase all of the background noise, then rerun.
But that's not the focus of this section of the post. Instead, I want to focus on the actual computation that we're doing.
If you take a look at the underlying shape, we've got a 600 x 900 image
path = 'Images/wes.png'
img = path_to_img_array(path)
img.shape
Which tallies to-- including RGB values-- a ton of numbers
img.size
By contrast, recall the total size of our movie poster, coming in at
vec.size
Or as a percentage
vec.size / img.size
What you didn't see, reading this finished post, was how long each stab at this was taking to run.
We can use the %time command in our Jupyter Notebook to see how long it takes us to generate our target colors (look for Wall time:)
path = 'Images/wes.png'
img = path_to_img_array(path)
%time plt.imshow(show_key_colors(pick_colors(img, 8)));
23 seconds. Compare that to the runtime with a much smaller image.
path = 'posters/Blade Runner 2.png'
img = path_to_img_array(path)
%time plt.imshow(show_key_colors(pick_colors(img, 8)));
Calculating over an image 5% of the size is measured in milliseconds, but our 600 x 900 image takes almost half a minute.
More to the point, anyone familiar with image files probably knows that 600 x 900 really isn't that big at all. You can imagine what would happen if we tried running it for this image
path = 'Images/bigWes.png'
img = Image.open(path)
img
Which is actually of size
img.size
And considering this many numbers
np.array(img).size
Woof.
path = 'Images/bigWes.png'
img = path_to_img_array(path)
%time plt.imshow(show_key_colors(pick_colors(img, 5)));
I've got a pretty powerful workbench, as far as home PCs go. I literally got up, did laundry, and got some coffee while this ran. If we're going to keep playing around with this tool, we need to figure out how to tighten up the feedback loop by decreasing the run time.
Can We Do the Same with Less?
In other words, what are the potential drawbacks of just using smaller images?
I took the following snapshot from the movie Zootopia. Not only does it hold the distinction of "darkest deleted Pixar scene, this side of Up", but every frame of the movie is full of extremely vibrant color.
zoo = Image.open('Images/zoo.png')
zoo
This image in particular is 740 x 308 and yields the following colors, when fishing for 5 Targets
zoo.size
img = np.array(zoo)
%time plt.imshow(show_key_colors(pick_colors(img, 5)));
But what happens when we use the same image, but only 100 pixels wide?
zoo_small = Image.open('Images/zoo_small.png')
zoo_small
Basically the same result, but in almost 2% of the time!
img = np.array(zoo_small)
%time plt.imshow(show_key_colors(pick_colors(img, 5)));
This might be surprising at first, but don't let the size of the image throw you for a loop-- there's plenty of information in the 100 x 42 representation.
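For what it's worth, you don't even need to save a separate small file-- PIL can do the shrinking in memory (a quick sketch; the 100-pixel cap just mirrors the small file above):

# thumbnail() resizes in place and preserves the aspect ratio
zoo_small_in_memory = zoo.copy()
zoo_small_in_memory.thumbnail((100, 100))
plt.imshow(show_key_colors(pick_colors(np.array(zoo_small_in_memory), 5)));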
Indeed, if we use some photo-editing software (I like Paint.net) to blow the image back up to its original resolution, it uses the pixel values in the smaller image to interpolate up to the bigger, albeit blurry, image.
grainy = Image.open('Images/grainyZoo.png')
grainy
Lo and behold, running K-Means on grainy yields much the same
img = np.array(grainy)
plt.imshow(show_key_colors(pick_colors(img, 5)));
And so the similarity of the two results shouldn't surprise you. After all, K-Means arrives at the Target values by just taking a bunch of averages-- which is precisely what's happening when I scale the image down and then back up:
- We need to compress the image into less space, so we throw away pixels that are close enough to their neighbors that we barely lose any information
- When we scale back out, we sort of broadcast that "representative color" out to its surrounding area in order to fill the space
Because it's just averages, we arrive at more or less the same palette
One (whole mess of) Step(s) Further
So as we saw, we achieved comparable results, with dramatically faster runtimes, on much smaller images. Now the question becomes
How much can we scale things down before we're guaranteed to lose our information?
For the sake of example, imagine that I had instead shrunk the picture to 5 pixels wide, not 100. What then?
Image.open('Images/zoo_tiny.png')
^ That's an image, lol
I'll spare you the squinting and, like before, resize back to the original size using the smaller representation.
display.Image('Images/RIPzoo.png')
And perhaps unsurprisingly, if the image we're feeding in is 5 x 2, it basically just picks 5 of the points, from left to right
tiny = Image.open('Images/zoo_tiny.png')
tiny.size
img = np.array(tiny)
plt.imshow(show_key_colors(pick_colors(img, 5)));
But with noticeably less sharpness of color, relative to our original image.
original = Image.open('Images/zoo.png')
img = np.array(original)
plt.imshow(show_key_colors(pick_colors(img, 5)));
Conclusion
I started this whole line of tinkering by trying to figure out how these color palette sites work. I was able to get a good-enough solution that handled the task quickly for small images via the K-Means algorithm. However, the webapp I was initially playing with runs considerably quicker than my code against anything north of 400 pixels wide-- if I had to guess, they're probably doing some preliminary image shrinking like we saw in the Zootopia example.
But as I sat scratching my head, trying to figure out how to generalize this approach, I'm left with two open-ended questions:
- Is there a way we can automate the optimal number of means to search for in an image? For instance, 3 Idiotas.png wound up giving us a nice, representative palette at 8 means, while Blade Runner 2.png only needed 3 to get the gist.
- If, for performance's sake, we intend to shrink the image before running K-Means, is there some rule of thumb we can employ to ensure that we don't over-shrink and lose the sharpness/distinctness of our palette?
So on the off chance that someone reading this is well-versed in the niche corner of computer science that has these answers, feel free to hit me up with links and knowledge bombs!
Cheers,
-Nick
This post was actually derivative of a notebook that I wrote ages ago for a demo I was putting on. The link to the source code, as well as a bunch of unrelated movie posters and stabs at EDA are located, as always, on my GitHub.