Moving from R to python - 4/7 - plotly
- 1 of 7: IDE
- 2 of 7: pandas
- 3 of 7: matplotlib and seaborn
- 4 of 7: plotly
- 5 of 7: scikitlearn
- 6 of 7: advanced scikitlearn
- 7 of 7: automated machine learning
plotly
plotly can either render interactive graphs inside a Jupyter notebook or save plots as HTML files and open them in a browser. We import two plot functions from plotly.offline: plot, which writes the graph to a standalone HTML file, and iplot, which renders the graph inline in the notebook. Both work offline.
Imports
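# plotly
# note: plotly.plotly is the online module of plotly 3.x; from plotly 4 on it is provided by the separate chart-studio package as chart_studio.plotly
import plotly.plotly as py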
import plotly.graph_objs as go
import plotly.tools as tls
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot, iplot_mpl
# matplotlib, seaborn
import seaborn as sns
from matplotlib import cm
from matplotlib import pyplot as plt
Minimal Example
init_notebook_mode(connected = True)
trace = { 'x':[1,2], 'y':[1,2] }
data = [trace]
fig = go.Figure( data = data, layout = {} )
iplot(fig)
The code above is sufficient to show an interactive JavaScript graph in a Jupyter notebook. However, the JavaScript gets lost when notebooks are converted first to .md and then to .html, so we use an iframe to embed the graph instead.
from IPython.display import HTML
HTML('<iframe width="900" height="800" frameborder="0" scrolling="no" src="//plot.ly/~datistics/1.embed"></iframe>')
Scatterplot
As in matplotlib, we have to add each group of elements separately. This time we use a for loop to iterate over the unique species and store the traces in the data list. For the colors we take the Brewer palettes from matplotlib by importing cm (color maps). plotly accepts either named colors ('red', 'green', etc.), hex strings ('#FF0000'), or rgb/rgba strings such as 'rgba(255,0,0,1)' as colors. Note that in Python 3 the old % format operator for strings still works but is discouraged in favor of str.format.
df = sns.load_dataset('iris')
species = list( df.species.unique() )
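# cm.Dark2 returns RGBA floats in [0, 1], plotly expects the r, g, b channels in [0, 255]
colors_rgba = list( cm.Dark2( range(0,len(species),1) ) )
colors_str = [ 'rgba({},{},{},{})'.format( int(round(r*255)), int(round(g*255)), int(round(b*255)), a ) for r,g,b,a in colors_rgba ]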
data = []
for i, spec in enumerate( species, 0 ) :
    df_spec = df.loc[ df['species'] == spec, : ]
    trace = go.Scatter( x = df_spec.petal_length
                      , y = df_spec.petal_width
                      , mode = 'markers'
                      , name = spec
                      , marker = dict( color = colors_str[i] )
                      )
    data.append(trace)
fig = go.Figure( data = data, layout = {} )
iplot(fig)
HTML('<iframe width="900" height="800" frameborder="0" scrolling="no" src="//plot.ly/~datistics/3.embed"></iframe>')
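matplotlib color maps return RGBA channels as floats between 0 and 1, whereas the r, g, b channels of a plotly color string run from 0 to 255. A minimal sketch of a conversion helper follows; the function name to_plotly_rgba is hypothetical and not part of either library.

```python
from matplotlib import colors as mcolors


def to_plotly_rgba(rgba):
    """Convert a matplotlib RGBA tuple (floats in [0, 1]) to a plotly rgba string."""
    r, g, b, a = rgba
    # scale the color channels to 0-255, keep alpha as a float between 0 and 1
    return 'rgba({},{},{},{})'.format(
        int(round(r * 255)), int(round(g * 255)), int(round(b * 255)), float(a)
    )


red = (1.0, 0.0, 0.0, 1.0)
rgba_str = to_plotly_rgba(red)   # 'rgba(255,0,0,1.0)'
hex_str = mcolors.to_hex(red)    # '#ff0000', a hex string is accepted by plotly as well
```

The helper could be applied to every row returned by a color map, for example to cm.Dark2( range( len(species) ) ).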
Convert matplotlib object to plotly object
In R we can convert most ggplot2 plots easily to plotly plots. We can do something similar in Python. There are a few tweaks to consider, not all of which we can cover here, but check out this [tutorial](http://nbviewer.jupyter.org/github/plotly/python-user-guide/blob/master/s6_matplotlylib/s6_matplotlylib.ipynb).
Minimal Scatterplot
fig = plt.figure()
plt.plot(list( df.petal_length ), list(df.petal_width), 'o' )
init_notebook_mode()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig)
HTML('<iframe width="400" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/5.embed"></iframe>')
Scatterplot iteratively constructed
fig = plt.figure()
for i, spec in enumerate( species, 0 ) :
    df_spec = df.loc[ df['species'] == spec, : ]
    plt.plot( list( df_spec.petal_length ), list(df_spec.petal_width), 'o' )
init_notebook_mode()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig)
HTML('<iframe width="400" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/7.embed"></iframe>')
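The iterative construction can also be written with groupby, which yields one sub-frame per species and avoids the manual filtering. This is only a sketch: df_demo is a small stand-in for the iris data with illustrative values, and the Agg backend is chosen so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from matplotlib import pyplot as plt

# small stand-in for the iris data (illustrative values only)
df_demo = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "petal_length": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
})

fig, ax = plt.subplots()
# groupby yields one (name, sub-frame) pair per species, sorted by name
for spec, df_spec in df_demo.groupby("species"):
    ax.plot(df_spec["petal_length"], df_spec["petal_width"], "o", label=spec)
ax.legend()

labels = [line.get_label() for line in ax.lines]
```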
Scatterplot via pandas
Here we also add a legend, but the legend does not translate well to the plotly object, which we would have to edit manually to correct.
# old school
fig, ax = plt.subplots()
df.loc[ df['species'] == 'setosa', : ].plot.scatter('petal_length', 'petal_width', label = 'setosa', color = 'blue', ax = ax)
# functional indexing
df.query('species == "versicolor"') \
    .plot.scatter( 'petal_length', 'petal_width'
                 , label = 'versicolor'
                 , color = 'orange'
                 , ax = ax )
df.query('species == "virginica"') \
    .plot.scatter( 'petal_length', 'petal_width'
                 , label = 'virginica'
                 , color = 'green'
                 , ax = ax )
init_notebook_mode()
py_fig = tls.mpl_to_plotly(fig)
# remove matplotlib default styling
iplot_mpl(fig, strip_style= True)
# keep matplotlib default styling
iplot_mpl(fig, strip_style= False)
C:\anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:445: UserWarning:
Dang! That path collection is out of this world. I totally don't know what to do with it yet! Plotly can only import path collections linked to 'data' coordinates
C:\anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:481: UserWarning:
I found a path object that I don't think is part of a bar chart. Ignoring.
HTML('<iframe width="400" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/9.embed"></iframe>')
HTML('<iframe width="400" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/9.embed"></iframe>')
Scatterplot via seaborn
Nice, but we lose the legend.
sns.lmplot(x = 'petal_length', y = 'petal_width', data = df
, hue = 'species'
, fit_reg = False)
init_notebook_mode()
# lmplot returns a FacetGrid rather than a Figure, so we fetch the current figure
fig = plt.gcf()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig, strip_style= False)
HTML('<iframe width="400" height="400" frameborder="0" scrolling="no" src="//plot.ly/~datistics/13.embed"></iframe>')
Boxplot via seaborn from wide format
We lose the boxes.
sns.boxplot(data=df)
init_notebook_mode()
# boxplot returns an Axes rather than a Figure, so we fetch the current figure
fig = plt.gcf()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig, strip_style= False)
HTML('<iframe width="400" height="800" frameborder="0" scrolling="no" src="//plot.ly/~datistics/15.embed"></iframe>')
Boxplot via seaborn from long format
We still lose the boxes.
df_melt = df.melt(value_vars=['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
, id_vars = 'species')
sns.boxplot(x = 'variable', y = 'value', data = df_melt, hue = 'species')
init_notebook_mode()
# boxplot returns an Axes rather than a Figure, so we fetch the current figure
fig = plt.gcf()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig, strip_style= False)
C:\anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:481: UserWarning:
I found a path object that I don't think is part of a bar chart. Ignoring.
HTML('<iframe width="400" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/17.embed"></iframe>')
Violin plots via seaborn
The conversion does not work for violin plots.
# sns.violinplot('variable', 'value', data = df_melt
# , hue = 'species'
# , inner = None ## removes inner boxes
# , zorder = 1
# )
#
# init_notebook_mode()
#
# # we need to get the figure like this for some reason
# fig = plt.gcf()
#
# py_fig = tls.mpl_to_plotly(fig)
#
# iplot_mpl(fig, strip_style= False)
Factor Plot via seaborn
# factorplot was renamed to catplot in seaborn 0.9; it returns a FacetGrid
g = sns.catplot( x = 'variable', y = 'value', data = df_melt
               , hue = 'species'
               , col = 'species'
               , kind = 'box' )
g.set_xticklabels(rotation = -45)
init_notebook_mode()
# the plot function returns a FacetGrid rather than a Figure, so we fetch the current figure
fig = plt.gcf()
py_fig = tls.mpl_to_plotly(fig)
iplot_mpl(fig, strip_style= False)
C:\anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:516: UserWarning:
Looks like the annotation(s) you are trying
to draw lies/lay outside the given figure size.
Therefore, the resulting Plotly figure may not be
large enough to view the full text. To adjust
the size of the figure, use the 'width' and
'height' keys in the Layout object. Alternatively,
use the Margin object to adjust the figure's margins.
C:\anaconda3\lib\site-packages\plotly\matplotlylib\renderer.py:481: UserWarning:
I found a path object that I don't think is part of a bar chart. Ignoring.
HTML('<iframe width="900" height="300" frameborder="0" scrolling="no" src="//plot.ly/~datistics/19.embed"></iframe>')
Summary
plotly plots look great if we use the original syntax. Converting matplotlib objects to plotly format is not worth it. The plotly and matplotlib syntaxes have in common that they are quite cumbersome and that we need loops or very long, repetitive code to populate the graphs. seaborn tackles this by reducing looping and providing excellent default settings. However, we will occasionally encounter glitches that we need to fix by iteratively reconfiguring attributes of plot elements in matplotlib.