July 25, 2016

Often, the first step in exploratory data analysis is to build a pairwise scatterplot. Plotting each feature against all the others is a useful way to get a general picture of the data and to identify initial directions for analysis. Here is an example of such a plot for the well-known Iris dataset:

In general, such plots become useless when there are a lot of dimensions: either the plot has to be very large, or the tiles become too small to show anything useful.
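To make the scaling problem concrete, here is a minimal sketch that counts how many distinct feature pairs a pairwise scatterplot has to show. The helper name `pairwise_panels` and the feature names are illustrative, not part of any library.

```python
from itertools import combinations


def pairwise_panels(features):
    """Return every unordered pair of distinct features (one scatter tile each)."""
    return list(combinations(features, 2))


# Illustrative names for the four Iris measurements.
iris_features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
print(len(pairwise_panels(iris_features)))  # 6 distinct pairs for 4 features

# d features give d * (d - 1) / 2 distinct pairs, so the count grows quadratically.
for d in (4, 10, 30):
    print(d, "features ->", d * (d - 1) // 2, "distinct pairs")
```

With 30 features there are already 435 distinct pairs, which is far more than fits legibly on one screen.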

Here is an alternative approach that achieves a similar effect. It relies on Bokeh, a Python library for interactive visualizations. Among the basics it brings is the ability to pan and zoom plots. Instead of presenting the plot as a static image, it gives you an HTML document that you can open in a web browser or embed in a Jupyter Notebook. There is also a gallery that shows what can be done with Bokeh.
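As a rough illustration of the HTML output, the sketch below builds one scatterplot with pan and zoom tools and renders it to an HTML string. It assumes a reasonably recent Bokeh release; the six hardcoded rows, the variable names, and the output file name are only there to keep the example small.

```python
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN

# A handful of Iris rows, hardcoded for illustration (values in cm).
sepal_length = [5.1, 4.9, 7.0, 6.4, 6.3, 5.8]
petal_length = [1.4, 1.4, 4.7, 4.5, 6.0, 5.1]

# The tools string enables the basic navigation: panning and zooming.
plot = figure(
    title="Iris: sepal length vs. petal length",
    x_axis_label="sepal length (cm)",
    y_axis_label="petal length (cm)",
    tools="pan,wheel_zoom,box_zoom,reset",
)
plot.scatter(sepal_length, petal_length, size=8)

# Render a standalone HTML page that loads BokehJS from the CDN.
html = file_html(plot, CDN, "Iris scatter")

# Write it out so it can be opened in a browser (file name is arbitrary).
with open("iris_scatter.html", "w", encoding="utf-8") as f:
    f.write(html)
```

Opening `iris_scatter.html` in a browser should show the plot with its toolbar.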

In particular, check out this example (view: link, code on GitHub: link). By selecting different dimensions, you can explore the Auto MPG Data Set. Only slight modifications are required to run this code on your own dataset, and nothing stops you from adding extra features.

Demo for Iris Dataset:


Isn’t the result similar to a pairwise scatterplot? We don’t see every possible combination of dimensions at once, but changing them is a matter of clicks rather than code. It can also show additional dimensions through the color and size of the data points, which helps you pick the best representations for your report.

A downside is that a standalone HTML document is not enough here, because such visualizations require client-server interaction. To run one locally, you need to start a Bokeh server, which is described in the user guide and is pretty straightforward.
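The following is a minimal sketch of such a server application, not the code of the linked example. It assumes a recent Bokeh release and a hypothetical file name `app.py`. Two dropdowns choose which columns go on the axes, and a Python callback swaps the plotted data. The six hardcoded Iris rows stand in for a real dataset.

```python
from bokeh.io import curdoc
from bokeh.layouts import column, row
from bokeh.models import ColumnDataSource, Select
from bokeh.plotting import figure

# A handful of Iris rows, hardcoded for illustration (values in cm).
data = {
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
}
columns = sorted(data)

# The plot always reads the generic "x" and "y" columns of this source.
source = ColumnDataSource(data=dict(x=data[columns[0]], y=data[columns[1]]))

x_select = Select(title="X axis", value=columns[0], options=columns)
y_select = Select(title="Y axis", value=columns[1], options=columns)

plot = figure(tools="pan,wheel_zoom,box_zoom,reset")
plot.scatter("x", "y", source=source, size=8)


def update(attr, old, new):
    """Copy the selected columns into the source and relabel the axes."""
    source.data = dict(x=data[x_select.value], y=data[y_select.value])
    plot.xaxis.axis_label = x_select.value
    plot.yaxis.axis_label = y_select.value


# These callbacks run in Python, which is why a running server is needed.
x_select.on_change("value", update)
y_select.on_change("value", update)
update(None, None, None)  # set the initial axis labels

curdoc().add_root(row(column(x_select, y_select), plot))
```

Saved as `app.py`, this can be started with `bokeh serve --show app.py`. Updating the data source in place is a simplification chosen for brevity; adding dropdowns for color and size would follow the same pattern.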

R users can do pretty much the same with Shiny.

