Came across a fun little problem on the subreddit /r/theydidthemath that asked about the accuracy of a joke pie chart.
The top post by the time I showed up took a quick crack at checking the pixel diameter and, per the subreddit name, "doing the math."
But having done something very similar to this in my last post, I figured this is as good an excuse as any to recycle some old code and write for the first time in a couple months.
Using a similar idea as the last post, we'll group all of the colors of the image into like colors. Then we'll simply divide the blue, my birthday, by the area of the circle.
import numpy as np
from PIL import Image, ImageDraw
I went ahead and downloaded the image from the post and trimmed it down to just include the circle.
im = Image.open('circle.png')
im
And we'll stuff that into numpy to get its per-pixel, numerical representation
arr = np.array(im)
arr.shape
(430, 435, 3)
Bit of Color Finagling
The legend in our original image was two-tone (red/blue). But we've got a bit of a hiccup when we zoom in on locations where two colors meet. Our eyes don't notice it looking at the regular-sized image, but whatever produced this graphic did so with a bit of color fuzziness.
For instance, there are a ton of different purple-y shades at the boundary where blue meets red.
And pinks where the red meets the white.
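One plausible source of those in-between shades is anti-aliasing, where an edge pixel gets drawn as a blend of the two colors it straddles. That's an assumption about how the graphic was rendered, not something the image tells us. A minimal sketch, with `blend` as a made-up helper and idealized pure colors standing in for the real ones:

```python
def blend(c1, c2, t):
    """Linearly mix two RGB tuples; t=0 gives c1, t=1 gives c2."""
    return tuple(round((1 - t) * a + t * b) for a, b in zip(c1, c2))

RED, BLUE, WHITE = (255, 0, 0), (0, 0, 255), (255, 255, 255)

# Edge pixels covered half by one color and half by the other.
purple = blend(RED, BLUE, 0.5)   # a purple-y shade
pink = blend(RED, WHITE, 0.5)    # a pink shade

# Different coverage fractions give a whole family of shades.
purples = [blend(RED, BLUE, t / 10) for t in range(1, 10)]
print(purple, pink, len(set(purples)))
```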
So as mentioned up top, we'll employ the same cheeky KMeans application as before to find clusters of "like colors."
By my count, we should expect to see a:
- Red group
- Blue group
- Pink group
- White group
- Purple group
So let's load up a blank KMeans model that anticipates finding 5 color groupings
from sklearn.cluster import KMeans
model = KMeans(5)
And run it on our data
arr = arr.reshape(-1, 3)
model.fit(arr);
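The `reshape(-1, 3)` call is what turns the image grid into a plain list of pixels: one row per pixel, with its three color channels as columns, which is the samples-by-features layout that `fit` works on. A small sketch on a stand-in array with the same shape as ours (the zeros and the single painted pixel are placeholders, not the real image):

```python
import numpy as np

# Placeholder with the same shape as the image array: (rows, columns, channels).
stand_in = np.zeros((430, 435, 3), dtype=np.uint8)
stand_in[0, 1] = (26, 143, 230)  # paint one pixel so we can track it

flat = stand_in.reshape(-1, 3)   # -1 lets numpy infer rows * columns

print(flat.shape)  # one row per pixel, three columns
print(flat[1])     # the painted pixel: row 0, column 1 lands at index 0 * 435 + 1
```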
We can then inspect which colors it picked
np.set_printoptions(precision=3, suppress=True)
print(model.cluster_centers_[:, :3])
[[254.961  12.15    0.175]
 [254.998 254.85  254.844]
 [ 26.367 143.491 230.136]
 [254.88  157.388 152.969]
 [149.223  72.241 106.11 ]]
from helper import draw_rectangle
draw_rectangle(model.cluster_centers_)
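The `helper` module isn't reproduced here. For a rough idea of what a swatch-drawing function could look like, here's a hypothetical `draw_swatches` built on `ImageDraw`. The name, the swatch size, and the layout are all assumptions for illustration, not the actual `draw_rectangle`:

```python
from PIL import Image, ImageDraw

def draw_swatches(centers, size=50):
    """Draw one square per color, left to right, in the order given."""
    im = Image.new("RGB", (size * len(centers), size), "white")
    draw = ImageDraw.Draw(im)
    for i, center in enumerate(centers):
        rgb = tuple(int(round(v)) for v in center[:3])  # Pillow wants integer channels
        # Corner coordinates are inclusive, hence the "- 1".
        draw.rectangle([i * size, 0, (i + 1) * size - 1, size - 1], fill=rgb)
    return im

# The cluster centers printed above, in the same order.
centers = [
    (254.961, 12.15, 0.175),
    (254.998, 254.85, 254.844),
    (26.367, 143.491, 230.136),
    (254.88, 157.388, 152.969),
    (149.223, 72.241, 106.11),
]
swatches = draw_swatches(centers)
```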
Then we can identify each of our points by which "Mean Color" they're closest to. The index on the left corresponds to the order of the colors above.
import pandas as pd
res = pd.Series(model.predict(arr)).value_counts()
res.sort_index()
0    137612
1     47820
2       324
3      1012
4       282
dtype: int64
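Under the hood, assigning a pixel to a cluster comes down to picking the center with the smallest Euclidean distance. A pure-Python sketch of that rule, using the centers printed above (`nearest_center` is an illustrative name, not part of scikit-learn, and the sample pixels are made up):

```python
import math

# Cluster centers from the run above, in printed order.
CENTERS = [
    (254.961, 12.15, 0.175),     # 0: red
    (254.998, 254.85, 254.844),  # 1: white
    (26.367, 143.491, 230.136),  # 2: blue
    (254.88, 157.388, 152.969),  # 3: pink
    (149.223, 72.241, 106.11),   # 4: purple
]

def nearest_center(pixel, centers=CENTERS):
    """Return the index of the center closest to an (R, G, B) pixel."""
    return min(range(len(centers)), key=lambda i: math.dist(pixel, centers[i]))

print(nearest_center((250, 10, 5)))     # a strong red
print(nearest_center((30, 140, 225)))   # a blue
print(nearest_center((150, 70, 105)))   # a purple-y edge shade
```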
So if we wanted to describe "blue divided by everything not white", we'd have
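# In this run, index 2 is the blue cluster and index 1 is the white one; the order can differ between runs
image = res[2] / (res.sum() - res[1])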
Which works out to be about a quarter of a percent of the area of the circle
image * 100
Going back to the original question, the author wanted to know how this stacked up against the actual ratio of birthdays to not birthdays in a year.
birthday = 1 / 365
birthday * 100
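To make the comparison concrete, here's the same arithmetic written out with the pixel counts printed earlier, taking index 2 as blue and index 1 as white. Those indices are read off the cluster centers for this particular run, so treat them as an assumption; a different run could order the clusters differently.

```python
# Pixel counts per cluster from the run above.
counts = {0: 137612, 1: 47820, 2: 324, 3: 1012, 4: 282}
BLUE, WHITE = 2, 1  # assumption: matches the center listing for this run

not_white = sum(counts.values()) - counts[WHITE]
blue_share = counts[BLUE] / not_white   # blue over everything not white
birthday_share = 1 / 365                # one day out of the year

print(f"blue:     {blue_share * 100:.3f}%")
print(f"birthday: {birthday_share * 100:.3f}%")
```

That puts the blue sliver at roughly 0.23% of the circle, against about 0.27% for one day in 365.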
Not bad, yeah? The result that you get when running KMeans is somewhat random: it depends on which starting centers the algorithm happened to pick on that run.
Had a few runs that were nearly identical. A good number that weren't. All told, though, I'd say that this image is pretty accurate.
(birthday - image) / birthday
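On the run-to-run wobble: scikit-learn's `KMeans` accepts a `random_state` argument, and fixing it makes the starting centers, and therefore the result, repeatable. Since the order of the clusters is arbitrary, it also helps to find the blue cluster by its color instead of by its position. A sketch on made-up pixel blobs (the data, the seed, and `fit_centers` are all illustrative, not the real image):

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the flattened image: three tight color blobs.
rng = np.random.default_rng(0)
bases = np.array([[255, 12, 0], [255, 255, 255], [26, 143, 230]], dtype=float)
pixels = np.vstack([base + rng.normal(0, 2, size=(200, 3)) for base in bases])

def fit_centers(seed):
    """Fit a 3-cluster KMeans with a fixed seed and return its centers."""
    model = KMeans(n_clusters=3, n_init=10, random_state=seed).fit(pixels)
    return model.cluster_centers_

first = fit_centers(0)
second = fit_centers(0)
same = bool(np.allclose(first, second))  # same seed, same data: same centers

# Locate the blue cluster by distance to a reference blue, not by index.
blue_ref = np.array([0, 0, 255])
blue_idx = int(np.argmin(np.linalg.norm(first - blue_ref, axis=1)))
print(same, blue_idx, first[blue_idx].round(1))
```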
Though someone more patient than me might consider averaging the "pct blue in the circle" over many, many images to say for certain. But I think I've rabbit-holed on this plenty long enough already :)
I hope reading my solution was at least half as amusing as coming up with it was.