1.7 Data visualization

Anscombe’s quartet

All four data sets share the same summary statistics.
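As a rough check of this claim, the sketch below computes the mean and sample variance of x and y, plus the correlation between them, for each of the four data sets. It uses only Python's standard library. The numbers are the commonly published quartet values, typed in by hand here as an assumption, and the names `quartet`, `correlation`, and `summary` are illustrative.

```python
from statistics import mean, variance

# Anscombe's quartet as commonly published (typed in by hand; assumed correct).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]   # shared by sets I, II, III
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  (x4,   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# For each set: (mean x, variance x, mean y, variance y, correlation)
summary = {}
for name, (xs, ys) in quartet.items():
    summary[name] = (mean(xs), variance(xs), mean(ys), variance(ys), correlation(xs, ys))
    print(name, [round(v, 2) for v in summary[name]])
```

Every row should come out close to mean x = 9, variance x = 11, mean y ≈ 7.5, variance y ≈ 4.1, and correlation ≈ 0.82, even though the four scatter plots look nothing alike.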

Heights (in feet) of 216 volcanoes


\[19882,\ 19728,\ 19335,\ 19287,\ \ldots,\ 617,\ 555,\ 529,\ 242\]

Data file: volcano-heights.csv

Histogram

  • Count: The number of data points in a bin

\[\text{relative frequency}=\frac{\text{frequency}}{\text{total number of data points}},\qquad \text{percent}=\text{relative frequency}\times 100\%\]
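To make count and percent concrete, here is a minimal sketch that bins a handful of values and reports both quantities per bin. It uses only the eight volcano heights visible above (the first four and last four of the 216), not the full data set. The bin width of 5000 feet is an arbitrary choice for illustration.

```python
# Only the eight heights visible in the slide; the other 208 are elided there.
heights = [19882, 19728, 19335, 19287, 617, 555, 529, 242]

bin_width = 5000   # assumed bin width in feet
n_bins = 4         # bins cover [0, 20000)

# Count: number of data points falling in each bin [k*w, (k+1)*w).
counts = [0] * n_bins
for h in heights:
    counts[h // bin_width] += 1

# Percent: count divided by the total number of data points, times 100.
percents = [100 * c / len(heights) for c in counts]

for k, (c, p) in enumerate(zip(counts, percents)):
    lo, hi = k * bin_width, (k + 1) * bin_width
    print(f"[{lo}, {hi}): count={c}, percent={p:.1f}")
```

The percents sum to 100 regardless of how many data points there are, which is why a percent scale is convenient when comparing histograms of data sets with different sizes.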

Plotting with Python

Go to Google Colab and sign in. Open a new notebook.

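import numpy
import pandas

# Load the bank data set from the course site into a DataFrame
url = "https://imse317.github.io/lecture-slides/ch01/data/bank.csv"
bank = pandas.read_csv(url)

# Show the first five rows to check that the data loaded as expected
bank.head()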


import seaborn.objects as so

# Histogram of age: bins of width 5 from 0 to 100, bar height = percent of rows in the bin
(
    so.Plot(bank, x='age')
    .add(so.Bars(), so.Hist(binrange=(0, 100), binwidth=5, stat='percent'))
)