Playing around with dataviz: Showing correlations

In this post, we look into some ways of displaying the association between two quantitative variables, aka correlation. Our goal is to present a rich representation of the correlation.

Let’s take the flights dataset from the nycflights13 package as an example.

data(flights, package = "nycflights13")
library(tidyverse)
library(viridis)
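flights %>% 
  # keep moderate delays only; filter() also drops rows with missing delays
  filter(arr_delay < 100, dep_delay < 100) %>% 
  ggplot(aes(x = dep_delay, y = arr_delay, color = origin)) +
  geom_point(alpha = .01) +                    # strong transparency against overplotting
  geom_smooth(se = FALSE, color = "grey20") +  # smoothed trend line per panel
  geom_rug() +                                 # marginal distributions along the axes
  facet_wrap(~origin) +
  scale_color_viridis_d()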
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
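The plot shows the association visually; to put a number next to each panel, one could also compute a correlation coefficient per origin airport. The sketch below is one way to do that, assuming dplyr and nycflights13 are installed: it reuses the same filter as the plot and calls base R's cor(). Reporting both Pearson (linear) and Spearman (rank-based) coefficients is our own choice here, and delay_cor is just a hypothetical object name.

```r
library(dplyr)
data(flights, package = "nycflights13")

# Same subset as in the plot: moderate delays, rows with missing delays dropped by filter()
delay_cor <- flights %>%
  filter(arr_delay < 100, dep_delay < 100) %>%
  group_by(origin) %>%
  summarise(
    n = n(),
    # linear association
    pearson = cor(dep_delay, arr_delay),
    # rank-based association, less sensitive to extreme values
    spearman = cor(dep_delay, arr_delay, method = "spearman")
  )

delay_cor
```

A large gap between the two coefficients would hint that the relationship is monotone but not linear, or that a few extreme delays drive the Pearson value.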

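Points are not the only geom that makes sense here. Let’s try some others, e.g., geom_hex(), which bins the points into hexagons and maps the number of points per bin to the fill color (it requires the hexbin package).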

flights %>% 
  filter(arr_delay < 100, dep_delay < 100) %>% 
  ggplot(aes(x = dep_delay, y = arr_delay)) +
  geom_hex() +                                 # fill encodes the number of flights per hexagon
  geom_smooth(se = FALSE, color = "grey20") +
  geom_rug() +
  facet_wrap(~origin) +
  scale_fill_viridis_c()                       # hex counts are continuous and mapped to fill, not color
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'