July 27, 2024

Ever wanted to build those beautiful Tableau-style packed bubble charts? Follow along for a tutorial on the Matplotlib solution.

Bubble chart illustrating women’s participation share at the 2020 Summer Olympics

What are packed bubble charts?

Packed bubble charts display data as a cluster of circles. Each entry is drawn as an individual circle, and two main variables can be encoded: the size of the bubbles and their colour.

Here are some of my favourite packed bubble charts:

Ukraine grain exports: Colour used to display country income group and bubble size as tonnes of grain.

https://medium.com/media/c0df726e467e0adebd40b56fb3e394d1/href

Essential and Frontline Workers: Colour used to define group of workers, bubble size as number of workers and clusters for industry sectors

https://medium.com/media/8920d7a84ab2298477d49d53151c8055/href

These examples weren’t built with Python, so would it be possible to build a simpler packed bubble chart using only Matplotlib? Let’s see how:

Python code for Bubble charts

Last week, I stumbled on an example in the miscellaneous section of Matplotlib’s documentation.

Is this a hidden gem in Matplotlib’s misc. documentation section?

Here is the example that came with the code:

Bubble chart tutorial example

How does it work?

It is surprisingly frictionless to use. Here is how to create your first bubble chart:

import matplotlib.pyplot as plt

# STEP 1: add your data here
browser_market_share = {
    'browsers': ['firefox', 'chrome', 'safari', 'edge', 'ie', 'opera'],
    'market_share': [8.61, 69.55, 8.36, 4.12, 2.76, 2.43],
    'color': ['#5A69AF', '#579E65', '#F9C784', '#FC944A', '#F24C00', '#00B825']
}

# STEP 2: paste the BubbleChart class from the Matplotlib example here

# STEP 3: create the bubble layout from the size variable
bubble_chart = BubbleChart(area=browser_market_share['market_share'],
                           bubble_spacing=0.1)

# STEP 4: pack the bubbles together
bubble_chart.collapse()

# STEP 5: draw the chart
fig, ax = plt.subplots(subplot_kw=dict(aspect="equal"))
bubble_chart.plot(
    ax, browser_market_share['browsers'], browser_market_share['color'])
ax.axis("off")
ax.relim()
ax.autoscale_view()
ax.set_title('Browser market share')

plt.show()

1- Add your own data or use the data provided in the example. You need one variable for the bubble size, one for the labels and one for the colours.

2- Copy, paste and run the BubbleChart class from the code provided in the Matplotlib example.

3- Create your bubble layout by calling BubbleChart with the bubble size variable as area.

4- Collapse the bubbles so that they are tangent to each other without overlapping.

5- Create the chart and add colours, labels and a title.
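A note on step 3: BubbleChart treats each value as the bubble's area, not its radius (its __init__ computes r = sqrt(area / pi)). Here is a minimal sketch of that conversion using the browser data above. The variable names are my own and this is only an illustration of the formula, not part of the Matplotlib example.

```python
import numpy as np

# Market-share values from the example above, used as bubble areas
market_share = np.array([8.61, 69.55, 8.36, 4.12, 2.76, 2.43])

# Same conversion as in BubbleChart.__init__: area = pi * r**2
radii = np.sqrt(market_share / np.pi)

# Chrome has about 8x Firefox's share, but its radius is only about 2.8x larger
area_ratio = market_share[1] / market_share[0]
radius_ratio = radii[1] / radii[0]
print(f"area ratio: {area_ratio:.2f}, radius ratio: {radius_ratio:.2f}")
```

This is why a bubble with eight times the value does not look eight times as wide: the eye is meant to compare areas.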

Important:

Aspect has to be kept as equal, or your bubbles won’t be perfect circles.

relim() and autoscale_view() have to be kept as well, since you can’t choose where in the grid your bubbles will appear.

I agree that it doesn’t look the best, especially after the beautiful examples we saw earlier. So I’ve spent a couple of days turning it into something better, and here is how:

Chart customisation:

For my chart, I am using an Olympic Historical Dataset from Olympedia.org, which Joseph Cheng shared on Kaggle under a public domain license.

Screenshot of dataset

It contains event-level to athlete-level Olympic Games results from Athens 1896 to Beijing 2022. After an EDA (Exploratory Data Analysis), I transformed it into a dataset that details the number of female athletes in each sport/event per year. My idea for the bubble chart is to show which sports have a 50/50 female-to-male athlete ratio and how this has evolved over time.

My plotting data is composed of two datasets, one for each year: 2020 and 1996. For each dataset, I’ve computed the total number of athletes that participated in each event (athlete_sum) and how much that sum represents compared to the total number of athletes (male + female) (difference). See a screenshot of the data below:

Screenshot of plotting dataset
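A transformation like the one described above could look roughly like this in pandas. This is a sketch under assumptions: the column names year, sport, sex and athlete_id and the toy rows are made up for illustration and may not match the Kaggle dataset. The female_share column is a stand-in for a per-sport share and is not necessarily how the difference column was computed.

```python
import pandas as pd

# Hypothetical athlete-level rows; real column names and values will differ
rows = [
    (2020, "Hockey", "Female", 1),
    (2020, "Hockey", "Female", 1),   # same athlete entered in a second event
    (2020, "Hockey", "Male", 2),
    (2020, "Boxing", "Female", 3),
    (2020, "Boxing", "Male", 4),
    (2020, "Boxing", "Male", 5),
    (2020, "Baseball", "Male", 6),
    (1996, "Boxing", "Male", 7),
]
results = pd.DataFrame(rows, columns=["year", "sport", "sex", "athlete_id"])

# Keep one row per athlete and sport for the chosen Games
athletes = (
    results[results["year"] == 2020]
    .drop_duplicates(["sport", "athlete_id"])
    .assign(is_female=lambda d: d["sex"].eq("Female"))
)

# One row per sport: number of athletes and share of them who are women
summary = (
    athletes.groupby("sport")
    .agg(athlete_sum=("athlete_id", "nunique"), female_share=("is_female", "mean"))
    .reset_index()
)
print(summary)
```

Running the same steps with year 1996 would give the second plotting dataset.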

This is my approach to visualise it:

Size proportion. Using the size of the bubbles to compare the number of athletes per sport (BubbleChart takes each value as the bubble’s area). Bigger bubbles represent sports with more athletes, such as Athletics.

Multi-variable interpretation. Using colours to represent female representation. Light green bubbles represent events with a 50/50 split, such as Hockey.
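One way to turn a female-participation percentage into a bubble colour is to bin it. The sketch below is only an illustration: the bin boundaries, the hex colours and the helper name colour_for_share are assumptions of mine, not the exact mapping used for the final chart.

```python
from bisect import bisect_right

# Hypothetical bins in % of female athletes:
# [0, 10), [10, 25), [25, 40), [40, 60), [60, 100]
BIN_EDGES = [10, 25, 40, 60]
# Hypothetical colours, one per bin (dark green, orange, blue, light green, pink)
BIN_COLOURS = ["#1b7837", "#fc8d62", "#8da0cb", "#a6d854", "#e78ac3"]


def colour_for_share(female_pct):
    """Return the colour of the bin that contains female_pct (0-100)."""
    return BIN_COLOURS[bisect_right(BIN_EDGES, female_pct)]


# Illustrative shares only
shares = {"Baseball": 0, "Boxing": 30, "Hockey": 50, "Women-only sport": 100}
colours = {sport: colour_for_share(pct) for sport, pct in shares.items()}
print(colours)
```

The resulting list of colours can be passed to bubble_chart.plot() in the same order as the labels.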

Here is my starting point (using the code and approach from above):

First result

Some easy fixes: increasing the figure size and setting a label to empty if the bubble size isn’t over 250, to avoid having words outside the bubbles.

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))

# Labels edited directly in the dataset

Second result
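The labels were edited directly in the dataset, but the same rule can be applied in code. A minimal sketch, assuming the sizes are in a list alongside the sport names. The example values and the constant name MIN_SIZE_FOR_LABEL are made up for illustration, and only the threshold of 250 comes from the text above.

```python
# Illustrative values only
sports = ["Athletics", "Swimming", "Surfing", "Skateboarding"]
athlete_sum = [2000, 900, 40, 80]

MIN_SIZE_FOR_LABEL = 250

# Keep the name only when the bubble is big enough to contain it
labels = [
    name if size > MIN_SIZE_FOR_LABEL else ""
    for name, size in zip(sports, athlete_sum)
]
print(labels)  # ['Athletics', 'Swimming', '', '']
```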

Well, now at least it’s readable. But why is Athletics pink and Boxing blue? Let’s add a legend to illustrate the relationship between colours and female representation.

Because this isn’t a regular bar plot, plt.legend() doesn’t do the trick here.

Using Matplotlib’s AnnotationBbox, we can create rectangles (or circles) to show the meaning behind each colour. We can also do the same thing to show a bubble scale.

import matplotlib.pyplot as plt
from matplotlib.offsetbox import (AnnotationBbox, DrawingArea,
                                  TextArea, HPacker)
from matplotlib.patches import Circle, Rectangle

# This is an example for one section of the legend
# (ax is the Axes that already holds the bubble chart)

# Define where the annotation (legend) will be
xy = [50, 128]

# Create your colored rectangle or circle
da = DrawingArea(20, 20, 0, 0)
p = Rectangle((10, 10), 10, 10, color="#fc8d62ff")
da.add_artist(p)

# Add text
text = TextArea("20%", textprops=dict(color="#fc8d62ff", size=14, fontweight='bold'))

# Combine rectangle and text side by side
hbox = HPacker(children=[da, text], align="top", pad=0, sep=3)

# Annotate both in a box (change alpha if you want to see the box)
ab = AnnotationBbox(hbox, xy,
                    xybox=(1.005, xy[1]),
                    xycoords='data',
                    boxcoords=("axes fraction", "data"),
                    box_alignment=(0.2, 0.5),
                    bboxprops=dict(alpha=0)
                    )

# Add to your bubble chart
ax.add_artist(ab)
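To build the whole legend rather than one section, the same pattern can be repeated in a loop. This is a sketch under assumptions: the entries in legend_entries, the 15-unit vertical step and the axis limits are placeholders I chose so the example runs on an empty Axes.

```python
import matplotlib.pyplot as plt
from matplotlib.offsetbox import AnnotationBbox, DrawingArea, HPacker, TextArea
from matplotlib.patches import Rectangle

# Hypothetical legend entries (label, colour); replace with your own bins
legend_entries = [("0%", "#1b7837"), ("20%", "#fc8d62"), ("50%", "#a6d854")]

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.set_xlim(0, 200)  # placeholder limits; a real chart gets them from the bubbles
ax.set_ylim(0, 200)

legend_boxes = []
for i, (label, colour) in enumerate(legend_entries):
    y = 128 - 15 * i  # stack the entries downwards, in data coordinates

    swatch = DrawingArea(20, 20, 0, 0)
    swatch.add_artist(Rectangle((10, 10), 10, 10, color=colour))
    text = TextArea(label, textprops=dict(color=colour, size=14, fontweight="bold"))
    row = HPacker(children=[swatch, text], align="top", pad=0, sep=3)

    ab = AnnotationBbox(
        row, (50, y),
        xybox=(1.005, y),                    # just right of the axes, at height y
        xycoords="data",
        boxcoords=("axes fraction", "data"),
        box_alignment=(0.2, 0.5),
        bboxprops=dict(alpha=0),             # invisible frame
    )
    ax.add_artist(ab)
    legend_boxes.append(ab)
```

On a real chart, drop the set_xlim/set_ylim lines and pick y values that fall inside the data range produced by relim() and autoscale_view().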

I’ve also added a subtitle and a text description under the chart, simply by using plt.text().
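A minimal sketch of that kind of annotation, assuming positions given in axes-fraction coordinates. The wording, coordinates and font sizes are placeholders, and I use ax.text() here, which is the Axes-level equivalent of plt.text().

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12, 8), subplot_kw=dict(aspect="equal"))
ax.axis("off")

# Placeholder wording; pad leaves room for the subtitle under the title
ax.set_title("Women's participation at the 2020 Summer Olympics", fontsize=18, pad=30)

# transform=ax.transAxes means (0, 0) is bottom-left and (1, 1) is top-right of the axes
subtitle = ax.text(
    0.5, 1.02, "Bubble size: number of athletes. Colour: share of female athletes.",
    transform=ax.transAxes, ha="center", va="bottom", fontsize=12,
)
description = ax.text(
    0.0, -0.05, "Data: Olympedia.org via Kaggle",
    transform=ax.transAxes, ha="left", va="top", fontsize=10,
)
```

Using axes fractions keeps the text in place regardless of where the packed bubbles end up in data coordinates.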

And ta-da:

Final visualisation

The graph lends itself to straightforward, user-friendly interpretations:

The majority of bubbles are light green → light green means 50% female athletes → the majority of Olympic competitions have an even 50/50 female-to-male split (yay🙌).

Only one sport (Baseball), in dark green, has no female participation.

3 sports have only female participation, but their number of athletes is fairly low.

The biggest sports in terms of number of athletes (Swimming, Athletics and Gymnastics) are very close to a 50/50 split.

Bonus visualisation.

Comparison between 2020 and 1996 female participation share

Here I am using packed bubble charts to illustrate an additional variable: time. The chart on the left represents participation at the 2020 Olympic Games and the one on the right the Games in 1996. Putting them side by side brings out some interesting insights:

Way more bubbles on the left than on the right → more sports at the 2020 Olympics than at the 1996 Olympics.

Barely any light green bubbles at the 1996 Olympics → female participation in 1996 was much lower than in 2020 and very far from a 50/50 split.

Boxing had 0% female participation in 1996 (dark green) and 30% female participation in 2020 (blue).
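The side-by-side layout can be sketched as follows. To keep the example self-contained, it draws plain Circle patches from made-up (x, y, radius, colour) rows instead of two collapsed BubbleChart objects. The data, colours and figure size are all assumptions.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

# Made-up (x, y, radius, colour) rows standing in for two collapsed bubble charts
bubbles_by_year = {
    2020: [(0, 0, 5, "#a6d854"), (8, 0, 3, "#8da0cb"), (3, 7, 2.5, "#a6d854")],
    1996: [(0, 0, 4, "#fc8d62"), (6, 0, 2, "#1b7837")],
}

# One row, two panels, both with equal aspect so the circles stay round
fig, axes = plt.subplots(1, 2, figsize=(14, 7), subplot_kw=dict(aspect="equal"))

for ax, (year, bubbles) in zip(axes, bubbles_by_year.items()):
    for x, y, r, colour in bubbles:
        ax.add_patch(Circle((x, y), r, color=colour))
    ax.axis("off")
    ax.relim()            # recompute the data limits from the patches
    ax.autoscale_view()
    ax.set_title(str(year))
```

With the real class, the inner loop would be replaced by a call to bubble_chart.plot(ax, labels, colours) for each year. Note that each panel autoscales on its own, so bubble sizes are only comparable across panels if both axes are given the same limits.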

Visualising and comparing two datasets can be complex, especially when there are three variables that have to be compared at the same time. However, packed bubble charts will captivate your audience, not only with their visual appeal but also with their ease of use and intuitive understanding.

I’ve also collected a few other customisation options that I haven’t used; check them out below.

Other customisations

Increase bubble_spacing for a more relaxed view (2 vs 0.1)

bubble_chart = BubbleChart(area=browser_market_share['market_share'],
                           bubble_spacing=2)

Effect of bubble spacing

It’s also possible to obtain a horizontal view just by updating __init__. The modification computes the new x coordinate of each bubble from its own radius, the radii of the preceding bubbles and the bubble spacing. Y coordinates are set to 0.

def __init__(self, area, bubble_spacing=0):
    area = np.asarray(area)
    r = np.sqrt(area / np.pi)

    self.bubble_spacing = bubble_spacing
    self.bubbles = np.ones((len(area), 4))
    self.bubbles[:, 2] = r
    self.bubbles[:, 3] = area

    # UPDATE: Position the bubbles in a horizontal row, touching each other
    # (or separated by bubble_spacing when it is non-zero)
    self.bubbles[:, 0] = np.cumsum(r * 2 + self.bubble_spacing) - r - self.bubble_spacing / 2
    self.bubbles[:, 1] = 0

Horizontal customisation
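As a quick sanity check of that x-coordinate expression, the sketch below recomputes it outside the class and measures the gap between neighbouring bubbles. It reuses the browser data, and the variable names x and gaps are my own.

```python
import numpy as np

area = np.array([8.61, 69.55, 8.36, 4.12, 2.76, 2.43])
bubble_spacing = 0.1

r = np.sqrt(area / np.pi)

# Same expression as in the modified __init__ above
x = np.cumsum(r * 2 + bubble_spacing) - r - bubble_spacing / 2

# Gap between neighbouring edges = distance between centres minus both radii
gaps = np.diff(x) - (r[:-1] + r[1:])
print(np.round(gaps, 6))
```

Every gap comes out equal to bubble_spacing, so with bubble_spacing=0 the bubbles touch exactly.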

Potential improvements✨

I’d love to see something similar done in Plotly since this graph is screaming for some interactivity (especially for the bubbles that are too small to have a label) and maybe a slider to change years and update the graph automatically.

Final points

Matplotlib’s little gem of a solution for packed bubble charts will save you hours of circle drawing and has great potential to become a powerful alternative to Tableau. In this article, we have explored how to create them, customise them and add legends. By the end, we obtained a final visualisation that is visually appealing, easy to read and lets you tell a story and captivate your audience.

I’ve also given you a little flavour of gender equality at the Olympic games and how fast it has improved in the past 20 years.

FYI, Paris 2024 will be the first time in history that we see an equal number of female and male participants.

Happy coding and Olympic watching!

All images in this article are by the author

I found a hidden gem in Matplotlib’s library: Packed Bubble Charts in Python was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
