Scatter Plots with Bokeh

In this notebook I will show how to make a scatter plot using Bokeh with Python. I will also be using the decathlon dataset, which can be found in the FactoMineR R package.

Let's first import the data using the pandas library.

In [1]:
import pandas as pd
In [2]:
decathlon = pd.read_csv("decathlon.csv")
In [3]:
decathlon.shape
Out[3]:
(41, 14)
In [4]:
decathlon.head()
Out[4]:
Athlets 100m Long.jump Shot.put High.jump 400m 110m.hurdle Discus Pole.vault Javeline 1500m Rank Points Competition
0 SEBRLE 11.04 7.58 14.83 2.07 49.81 14.69 43.75 5.02 63.19 291.7 1 8217 Decastar
1 CLAY 10.76 7.40 14.26 1.86 49.37 14.05 50.72 4.92 60.15 301.5 2 8122 Decastar
2 KARPOV 11.02 7.30 14.77 2.04 48.37 14.09 48.95 4.92 50.31 300.2 3 8099 Decastar
3 BERNARD 11.02 7.23 14.25 1.92 48.93 14.99 40.87 5.32 62.77 280.1 4 8067 Decastar
4 YURKOV 11.34 7.09 15.19 2.10 50.42 15.31 46.26 4.72 63.44 276.4 5 8036 Decastar

We take a quick look at the two columns we will plot, Javeline and Discus, and then import the necessary libraries.

In [5]:
decathlon.Javeline.head()
Out[5]:
0    63.19
1    60.15
2    50.31
3    62.77
4    63.44
Name: Javeline, dtype: float64
In [6]:
decathlon.Discus.head()
Out[6]:
0    43.75
1    50.72
2    48.95
3    40.87
4    46.26
Name: Discus, dtype: float64
In [7]:
from bokeh.io import push_notebook, show, output_notebook
from bokeh.layouts import row 
from bokeh.plotting import figure

output_notebook()
Loading BokehJS ...
In [8]:
p = figure(title = "Decathlon: Discus x Javeline")
p.circle('Discus','Javeline',source=decathlon,fill_alpha=0.2, size=10)
Out[8]:
GlyphRenderer(
id = '1042', …)
In [9]:
show(p)

I will now customize the scatter plot by coloring the bubbles according to the categorical variable Competition. To do so we need to import factor_cmap, which maps a color to each level of Competition.

In [10]:
from bokeh.transform import factor_cmap
In [11]:
decathlon.Competition.unique()
Out[11]:
array(['Decastar', 'OlympicG'], dtype=object)
In [12]:
index_cmap = factor_cmap('Competition', palette=['red', 'blue'], 
                         factors=sorted(decathlon.Competition.unique()))
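
To make clear what factor_cmap sets up, here is a minimal plain-Python sketch of the idea: each factor is paired with the palette color at the same position. It does not use Bokeh at all, the helper name color_for is my own and not part of the library, and the gray fallback for unknown values is only an assumption of this sketch.

```python
# Plain-Python illustration of a categorical color mapping (not Bokeh code).
factors = sorted(['Decastar', 'OlympicG'])   # same order as in the cell above
palette = ['red', 'blue']

# Pair each factor with the color found at the same position in the palette.
color_of = dict(zip(factors, palette))

def color_for(value, default='gray'):
    """Hypothetical helper: return the color assigned to one Competition value."""
    return color_of.get(value, default)

print(color_for('Decastar'))
print(color_for('OlympicG'))
```

Because the factors are sorted alphabetically, Decastar is paired with red and OlympicG with blue.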
In [13]:
p = figure(plot_width=600, plot_height=450, title = "Decathlon: Discus x Javeline")
p.scatter('Discus','Javeline',source=decathlon,fill_alpha=0.6, fill_color=index_cmap,size=10,legend='Competition')
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
p.legend.location = "top_left"
In [14]:
show(p)

I will now add a hover tool to the bubbles. It means that a pop-up will appear when the mouse hovers over a bubble. I will make the pop-up show the name of the athlete together with the Discus and Javeline values.

In [15]:
decathlon['Athlets'].head()
Out[15]:
0     SEBRLE
1       CLAY
2     KARPOV
3    BERNARD
4     YURKOV
Name: Athlets, dtype: object
In [16]:
p = figure(plot_width=600, plot_height=450, title = "Decathlon: Discus x Javeline",toolbar_location=None,
          tools="hover", tooltips="@Athlets: (@Discus,@Javeline)")
p.scatter('Discus','Javeline',source=decathlon,fill_alpha=0.6, fill_color=index_cmap,size=10,legend='Competition')
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
p.legend.location = "top_left"
show(p)
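
The tooltips string is a template in which every @name is replaced by the value of that column for the hovered point. Bokeh does this substitution in the browser; the sketch below only imitates the idea in plain Python, with a hypothetical helper fill_tooltip and the first row of the dataset typed in by hand. Number formatting in the real tooltip may differ.

```python
import re

def fill_tooltip(template, row):
    """Hypothetical helper: replace each @name with the row's value for that column."""
    return re.sub(r'@(\w+)', lambda m: str(row[m.group(1)]), template)

# First row of the decathlon data, copied from the table above.
row = {'Athlets': 'SEBRLE', 'Discus': 43.75, 'Javeline': 63.19}

print(fill_tooltip("@Athlets: (@Discus,@Javeline)", row))
```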

We will now add the names of the athletes to the graph. We first need to import some functions and convert the decathlon data to a ColumnDataSource object.

In [17]:
from bokeh.models import  ColumnDataSource,Range1d, LabelSet, Label

decath=ColumnDataSource(data=decathlon)

We then draw the scatter plot

In [18]:
p = figure(plot_width=700, plot_height=450, title = "Decathlon: Discus x Javeline")
p.scatter('Discus','Javeline',source=decath,fill_alpha=0.6, fill_color=index_cmap,size=10,legend='Competition')
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
p.legend.location = "top_left"

We then add the labels

In [19]:
labels = LabelSet(x='Discus', y='Javeline', text='Athlets', level='glyph',text_font_size='9pt',
              text_color=index_cmap,x_offset=5, y_offset=5, source=decath, render_mode='canvas')

p.add_layout(labels)
In [20]:
show(p)

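We will now add box annotations. A box can span the plot horizontally (bounded with bottom and top) or vertically (bounded with left and right). Here we shade three horizontal bands of Javeline values: below 55, between 55 and 65, and above 65.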

In [21]:
from bokeh.models import BoxAnnotation

p = figure(plot_width=700, plot_height=450, title = "Decathlon: Discus x Javeline")
p.scatter('Discus','Javeline',source=decath,fill_alpha=0.6, fill_color=index_cmap,size=10,legend='Competition')
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
p.legend.location = "top_left"
In [22]:
low_box = BoxAnnotation(top=55, fill_alpha=0.1, fill_color='red')
mid_box = BoxAnnotation(bottom=55, top=65, fill_alpha=0.1, fill_color='green')
high_box = BoxAnnotation(bottom=65, fill_alpha=0.1, fill_color='red')
In [23]:
p.add_layout(low_box)
p.add_layout(mid_box)
p.add_layout(high_box)
In [24]:
p.xgrid[0].grid_line_color=None
p.ygrid[0].grid_line_alpha=0.5
show(p)
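
As a quick check of which athletes land in which band, the thresholds used for the boxes can be applied to the values directly. This is a small plain-Python sketch; the function name javeline_band and the band names are mine, and the values are the five Javeline results shown earlier.

```python
def javeline_band(value, low=55, high=65):
    """Hypothetical helper: name the shaded band a Javeline value falls into."""
    if value < low:
        return 'low'
    if value <= high:
        return 'mid'
    return 'high'

# First five Javeline values from decathlon.Javeline.head()
javeline = [63.19, 60.15, 50.31, 62.77, 63.44]
bands = [javeline_band(v) for v in javeline]
print(bands)
```

Of these five athletes, only KARPOV (50.31) falls in the lower red band; the other four are in the green band.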

We can also make the legend interactive, so that clicking on an entry hides the corresponding points

In [25]:
p = figure(plot_width=600, plot_height=450, title = "Decathlon: Discus x Javeline")
p.title.text = 'Click on legend entries to hide the corresponding points'
In [26]:
decathlon.loc[(decathlon.Competition=='OlympicG')].head()
Out[26]:
Athlets 100m Long.jump Shot.put High.jump 400m 110m.hurdle Discus Pole.vault Javeline 1500m Rank Points Competition
13 Sebrle 10.85 7.84 16.36 2.12 48.36 14.05 48.72 5.0 70.52 280.01 1 8893 OlympicG
14 Clay 10.44 7.96 15.23 2.06 49.19 14.13 50.11 4.9 69.71 282.00 2 8820 OlympicG
15 Karpov 10.50 7.81 15.93 2.09 46.81 13.97 51.65 4.6 55.54 278.11 3 8725 OlympicG
16 Macey 10.89 7.47 15.73 2.15 48.97 14.56 48.34 4.4 58.46 265.42 4 8414 OlympicG
17 Warners 10.62 7.74 14.48 1.97 47.97 14.01 43.73 4.9 55.39 278.05 5 8343 OlympicG
In [27]:
x=['OlympicG','Decastar']
In [28]:
x
Out[28]:
['OlympicG', 'Decastar']
In [29]:
for i in x:
    df=decathlon.loc[(decathlon.Competition==i)]
    p.scatter('Discus','Javeline',source=df,fill_alpha=0.6, fill_color=index_cmap,size=10,legend='Competition')

p.legend.location = "top_left"
p.legend.click_policy="hide"
show(p)

    

We will now color the circles according to a continuous variable. Let's say we will use the Points variable

In [30]:
from bokeh.models import ColumnDataSource, ColorBar
from bokeh.palettes import Spectral6
from bokeh.transform import linear_cmap
In [31]:
Spectral6
Out[31]:
['#3288bd', '#99d594', '#e6f598', '#fee08b', '#fc8d59', '#d53e4f']
In [32]:
mapper = linear_cmap(field_name='Points', palette=Spectral6 ,low=min(decathlon['Points']) ,high=max(decathlon['Points']))
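
To give an idea of what a linear color mapping does, here is a rough plain-Python sketch: the interval from low to high is cut into as many equal bins as there are colors, and each value takes the color of its bin. The helper linear_color is hypothetical, the low and high values (7000 and 9000) are made up for the illustration rather than taken from the data, and values outside the interval are simply clamped to the end colors in this sketch.

```python
import math

# The six colors of Spectral6, as printed above.
palette = ['#3288bd', '#99d594', '#e6f598', '#fee08b', '#fc8d59', '#d53e4f']

def linear_color(value, low, high, palette):
    """Hypothetical helper: pick the palette color of the bin containing value."""
    n = len(palette)
    index = math.floor((value - low) / (high - low) * n)
    index = max(0, min(n - 1, index))   # clamp to the first/last color
    return palette[index]

# Illustrative bounds only; the notebook uses the real min and max of Points.
print(linear_color(8217, 7000, 9000, palette))
print(linear_color(8893, 7000, 9000, palette))
```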
In [33]:
p = figure(plot_width=700, plot_height=450, title = "Decathlon: Discus x Javeline")
p.scatter('Discus','Javeline',source=decath,fill_alpha=0.6, line_color=mapper,color=mapper,size=10)
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'

We then add a color bar, which serves as the legend for Points

In [34]:
color_bar = ColorBar(color_mapper=mapper['transform'], width=8,  location=(0,0),title="Points")

p.add_layout(color_bar, 'right')
In [35]:
show(p)
In [36]:
p = figure(plot_width=700, plot_height=450, title = "Decathlon: Discus x Javeline")
p.scatter('Discus','Javeline',source=decath,fill_alpha=0.6, line_color=mapper,color=mapper,size='Points')
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
In [37]:
show(p)

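Passing Points directly as the size, as in the last plot, makes the bubbles far too large, because the raw values (around 8000) are used as the marker sizes. To control the size of the bubbles in the scatter plot, we map Points to a sensible range of sizes with a LinearInterpolator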

In [38]:
from bokeh.models import LinearInterpolator
In [39]:
decath
Out[39]:
ColumnDataSource(
id = '1319', …)
In [40]:
size_mapper=LinearInterpolator(
    x=[decathlon.Points.min(),decathlon.Points.max()],
    y=[5,50]
)
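
The idea behind this size mapping is plain linear interpolation between two points: the smallest Points value gets size 5, the largest gets size 50, and everything in between is scaled proportionally. Here is a minimal sketch in plain Python; the helper interpolate_size is hypothetical, the bounds 7000 and 9000 are made up for the illustration, and only values inside the bounds are considered.

```python
def interpolate_size(value, x_min, x_max, y_min=5, y_max=50):
    """Hypothetical helper: scale value from [x_min, x_max] to [y_min, y_max]."""
    fraction = (value - x_min) / (x_max - x_min)
    return y_min + fraction * (y_max - y_min)

# Illustrative bounds only; the notebook uses decathlon.Points.min() and .max().
print(interpolate_size(7000, 7000, 9000))   # smallest value -> smallest bubble
print(interpolate_size(8000, 7000, 9000))   # halfway between the bounds
print(interpolate_size(9000, 7000, 9000))   # largest value -> largest bubble
```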
In [41]:
p = figure(plot_width=700, plot_height=450, title = "Decathlon: Discus x Javeline",
          toolbar_location=None,
          tools="hover", tooltips="@Athlets: @Points")
p.scatter('Discus','Javeline',
          source=decathlon,
          fill_alpha=0.6, 
          fill_color=index_cmap,
          size={'field':'Points','transform': size_mapper},
          legend='Competition'
         )
p.xaxis.axis_label = 'Discus'
p.yaxis.axis_label = 'Javeline'
p.legend.location = "top_left"
In [42]:
show(p)