Plot for mean comparison

Load packages

library(tidyverse)
library(reshape2)  # provides the tips data set
library(mosaic)
library(sjmisc)
library(skimr)

Data setup

data(tips)

Aggregate data per group

tips_aggr <- tips %>% 
  group_by(smoker) %>% 
  summarise(tip_avg = mean(tip),
            tip_md = median(tip),
            tip_sd = sd(tip),
            tip_iqr = IQR(tip))

tips_aggr
#> # A tibble: 2 x 5
#>   smoker tip_avg tip_md tip_sd tip_iqr
#>   <fct>    <dbl>  <dbl>  <dbl>   <dbl>
#> 1 No        2.99   2.74   1.38    1.50
#> 2 Yes       3.01   3      1.40    1.68
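To make explicit what `group_by()` + `summarise()` computes, here is a minimal base-R sketch of the same split-apply-combine step. It runs on a small hypothetical toy data frame (`toy`, with the same column names as `tips`) so that it needs no packages. The names `toy`, `per_group` and `toy_aggr` are illustrative only.

```r
# Hypothetical toy data with the same columns as tips (smoker, tip); not the real data set
toy <- data.frame(
  smoker = factor(c("No", "No", "No", "No", "Yes", "Yes")),
  tip    = c(1, 2, 3, 4, 2, 4)
)

# split() the tips by group, then compute the four statistics for each group
per_group <- sapply(
  split(toy$tip, toy$smoker),
  function(x) c(tip_avg = mean(x), tip_md = median(x),
                tip_sd = sd(x), tip_iqr = IQR(x))
)

# sapply() puts the groups in columns; transpose to get one row per group
toy_aggr <- t(per_group)
toy_aggr
```

Each row of `toy_aggr` corresponds to one row of `tips_aggr`: one group, one value per statistic.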

Similar summaries (plus a few more), more concisely, using `descr()` from sjmisc:

tips_descr <- tips %>% 
  group_by(smoker) %>% 
  descr(tip)

tips_descr
#> 
#> ## Basic descriptive statistics
#> 
#> 
#> Grouped by: No
#> 
#>  var    type label   n NA.prc mean   sd   se   md trimmed   range   iqr skew
#>  tip numeric   tip 151      0 2.99 1.38 0.11 2.74    2.83 8 (1-9) 1.505 1.32
#> 
#> 
#> Grouped by: Yes
#> 
#>  var    type label  n NA.prc mean  sd   se md trimmed    range  iqr skew
#>  tip numeric   tip 93      0 3.01 1.4 0.15  3    2.86 9 (1-10) 1.68 1.72

`descr()` hands back a list (one data frame per group), which may not be practical for further processing.

`skim()` from skimr provides another alternative:

tips_skim <- tips %>% 
  group_by(smoker) %>% 
  skim(tip)

tips_skim
#> Table 1: Data summary
#> Name                     Piped data
#> Number of rows           244
#> Number of columns        7
#> _______________________
#> Column type frequency:
#>   numeric                1
#> ________________________
#> Group variables          smoker
#> 
#> Variable type: numeric
#> 
#> skim_variable smoker n_missing complete_rate mean   sd p0 p25  p50  p75 p100 hist
#> tip           No             0             1 2.99 1.38  1   2 2.74 3.50    9 ▇▆▂▁▁
#> tip           Yes            0             1 3.01 1.40  1   2 3.00 3.68   10 ▇▇▁▁▁

`skim()` gives back a tidy data frame, so its statistics can be used directly, for instance in a plot. Nice.

Another alternative, using `favstats()` from mosaic:

tips_fav <- tips %>% 
  favstats(tip ~ smoker, data = .)
tips_fav
#>   smoker min Q1 median    Q3 max     mean       sd   n missing
#> 1     No   1  2   2.74 3.505   9 2.991854 1.377190 151       0
#> 2    Yes   1  2   3.00 3.680  10 3.008710 1.401468  93       0
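As a quick consistency check between the packages, the interquartile range is simply Q3 minus Q1. The sketch below copies the quartiles from the `favstats()` output and the `iqr` values from the `descr()` output above; the variable names are illustrative only.

```r
# Quartiles per group, copied from the favstats() output above
q1 <- c(No = 2,     Yes = 2)
q3 <- c(No = 3.505, Yes = 3.680)

# IQR per group, copied from the descr() output above
iqr_descr <- c(No = 1.505, Yes = 1.68)

# The interquartile range is Q3 - Q1
iqr_fav <- q3 - q1
iqr_fav

# TRUE when both packages agree (up to floating-point tolerance)
all.equal(iqr_fav, iqr_descr)
```

The same values appear, rounded, as `tip_iqr` in `tips_aggr`.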

Plot 1

ggplot(tips_skim) +
  # skim() stores numeric statistics in columns prefixed with "numeric."
  aes(x = smoker, y = numeric.mean) +
  geom_line(aes(group = 1)) +  # connect the group means with a single line
  geom_pointrange(aes(ymin = numeric.mean - numeric.sd,
                      ymax = numeric.mean + numeric.sd),
                  color = "grey40") +  # bars span mean +/- one SD
  geom_point(size = 5) +
  ylim(0, 5) +
  labs(caption = "Error bars represent standard deviation",
       y = "average tip")
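The error bars span mean ± one standard deviation. The sketch below works out those endpoints from the group means and SDs printed by `favstats()` above. It also checks that they fall within `ylim(0, 5)`: the scale limits set by `ylim()` cause out-of-range values to be treated as missing, so a bar outside them would not be drawn. The data frame `bars` is illustrative only.

```r
# Group means and SDs copied from the favstats() output above
bars <- data.frame(
  smoker = c("No", "Yes"),
  mean   = c(2.991854, 3.008710),
  sd     = c(1.377190, 1.401468)
)

# Error-bar ends, matching the ymin/ymax mapping in the plot code: mean -/+ one SD
bars$ymin <- bars$mean - bars$sd
bars$ymax <- bars$mean + bars$sd
bars

# TRUE if every bar lies within the y-limits set by ylim(0, 5)
inside_limits <- all(bars$ymin >= 0 & bars$ymax <= 5)
inside_limits
```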
  

List to data frame

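`tips_descr` gives us a list, but more often than not, we would like to keep working with a tibble. That's what `enframe()` is for: it turns the list into a tibble with two columns, `name` and `value`. Here `name` is just 1 and 2, since the list is unnamed, and `value` is a list-column holding one data frame per group. Subsequently, `unnest()` expands the list-column `value` into ordinary columns.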

tips_descr %>% 
  enframe() %>% 
  unnest(value)  # that's the name of the list-column to be unnested
#> # A tibble: 2 x 14
#>    name var   type  label     n NA.prc  mean    sd    se    md trimmed range
#>   <int> <chr> <chr> <chr> <int>  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>
#> 1     1 tip   nume… tip     151      0  2.99  1.38 0.112  2.74    2.83 8 (1…
#> 2     2 tip   nume… tip      93      0  3.01  1.40 0.145  3       2.86 9 (1…
#> # … with 2 more variables: iqr <dbl>, skew <dbl>
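For readers without tibble/tidyr at hand, here is a rough base-R analogue of `enframe()` followed by `unnest()`. It uses a hypothetical stand-in for `tips_descr` (a short list of one-row data frames, with a few values copied from the `descr()` output above). It pairs each element with its position as `name`, then binds the rows together. The names `per_group` and `flat` are illustrative only.

```r
# Hypothetical stand-in for tips_descr: an unnamed list with one data frame per group
per_group <- list(
  data.frame(var = "tip", n = 151L, mean = 2.99, sd = 1.38),
  data.frame(var = "tip", n = 93L,  mean = 3.01, sd = 1.40)
)

# enframe()-like step: pair each list element with its position as `name`
framed <- Map(function(name, value) cbind(name = name, value),
              seq_along(per_group), per_group)

# unnest()-like step: stack the per-group data frames into one flat data frame
flat <- do.call(rbind, framed)
flat
```

The tidyverse version is usually preferable in this workflow, because it returns a tibble and handles differing columns more gracefully than `rbind()`.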