Logistic regression using z-standardized values

1 Load packages

library(tidyverse)  # data wrangling
library(easystats)  # parameters(), standardize()

2 Data

data(mtcars)

3 Motivation

In this post, we investigate what happens when we z-standardize the predictor variables, and additionally the outcome variable, in a simple logistic regression.

Do some coefficients change as a result of standardizing the values?

4 EDA

mtcars |> 
  group_by(am) |> 
  summarise(mpg_Avg = mean(mpg))
#> # A tibble: 2 × 2
#>      am mpg_Avg
#>   <dbl>   <dbl>
#> 1     0    17.1
#> 2     1    24.4

As we can see, cars with am = 1, i.e., manual transmission (gear shifting), have a higher average mpg.

5 Model with raw values

mod_raw <- glm(am ~ mpg, data = mtcars, family = "binomial")
parameters(mod_raw, exponentiate = TRUE)
#> Parameter   | Odds Ratio |       SE |       95% CI |     z |     p
#> ------------------------------------------------------------------
#> (Intercept) |   1.36e-03 | 3.19e-03 | [0.00, 0.06] | -2.81 | 0.005
#> mpg         |       1.36 |     0.16 | [1.13, 1.80] |  2.67 | 0.008

The odds ratio of 1.36 means that for every one-unit increase in mpg, the odds of a car having a manual transmission increase by 36%.
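To make the link between the model coefficients and the odds ratio explicit, here is a minimal sketch using base R only. It refits the same model and exponentiates the coefficients, which are on the log-odds scale. The object names (b, odds_ratio, pct_change) are introduced here for illustration and are not part of the original analysis.

```r
# Refit the raw-scale model using base R only
data(mtcars)
mod_raw <- glm(am ~ mpg, data = mtcars, family = "binomial")

# Coefficients are reported on the log-odds (logit) scale
b <- coef(mod_raw)

# Exponentiating turns them into odds ratios
odds_ratio <- exp(b)

# Percentage change in the odds per one-unit increase in mpg
pct_change <- (odds_ratio[["mpg"]] - 1) * 100

round(odds_ratio[["mpg"]], 2)  # 1.36, as in the table above
round(pct_change)              # 36
```

An odds ratio above 1 means the odds of am = 1 go up as mpg goes up; an odds ratio below 1 would mean they go down.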

Note that logistic regression in R (glm() with a binomial family) models the probability of the second level of a factor outcome. For a numeric 0/1 outcome, it models the probability of 1.

6 Model with am as a factor variable

mtcars <-
  mtcars |> 
  mutate(am_f = factor(am))

levels(mtcars$am_f)
#> [1] "0" "1"
mod_raw_f <- glm(am_f ~ mpg, data = mtcars, family = "binomial")  # factor outcome
parameters(mod_raw_f, exponentiate = TRUE)
#> Parameter   | Odds Ratio |       SE |       95% CI |     z |     p
#> ------------------------------------------------------------------
#> (Intercept) |   1.36e-03 | 3.19e-03 | [0.00, 0.06] | -2.81 | 0.005
#> mpg         |       1.36 |     0.16 | [1.13, 1.80] |  2.67 | 0.008

Identical!
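Rather than comparing the two tables by eye, the coefficients can be compared programmatically. The following sketch (base R only, with the illustrative names fit_num, fit_fct, and fit_flip) also shows what happens when the reference level of the factor is switched: the model then describes the other category, so the signs of the coefficients flip.

```r
data(mtcars)
mtcars$am_f <- factor(mtcars$am)

# Numeric 0/1 outcome versus factor outcome with levels "0", "1"
fit_num <- glm(am   ~ mpg, data = mtcars, family = "binomial")
fit_fct <- glm(am_f ~ mpg, data = mtcars, family = "binomial")

# Same coefficients: the second level ("1") is treated as the event
same_coefs <- isTRUE(all.equal(coef(fit_num), coef(fit_fct)))
same_coefs

# Make "1" the first level, so that "0" becomes the modeled event
mtcars$am_flip <- relevel(mtcars$am_f, ref = "1")
fit_flip <- glm(am_flip ~ mpg, data = mtcars, family = "binomial")

# The coefficients are the negatives of the original ones
flipped <- isTRUE(all.equal(unname(coef(fit_flip)),
                            -unname(coef(fit_num)),
                            tolerance = 1e-6))
flipped
```

So the coding of the outcome does not matter as long as the level of interest comes second.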

7 Visualizing

pred_df <-
  tibble(
    mpg = seq(min(mtcars$mpg), max(mtcars$mpg), by = .1),
    am_pred = predict(mod_raw, type = "response", newdata = tibble(mpg))
  )

ggplot(mtcars) +
  aes(x = mpg, y = am) +
  geom_point() +
  geom_line(data = pred_df, aes(x = mpg, y = am_pred), color = "blue") +
  labs(title = "Predicting manual gear shifting",
       subtitle = "Logistic model")
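The blue curve is the inverse logit of the linear predictor. As a small sketch of what predict(..., type = "response") does under the hood, the probability for a single car can be computed by hand. The value mpg_new = 20 is a hypothetical example value, not taken from the analysis above.

```r
data(mtcars)
mod_raw <- glm(am ~ mpg, data = mtcars, family = "binomial")
b <- coef(mod_raw)

mpg_new <- 20  # hypothetical mpg value for illustration

# Linear predictor on the log-odds scale
eta <- b[["(Intercept)"]] + b[["mpg"]] * mpg_new

# Inverse logit: plogis(eta) is 1 / (1 + exp(-eta))
p_hand <- plogis(eta)

# Same quantity via predict()
p_pred <- predict(mod_raw,
                  newdata = data.frame(mpg = mpg_new),
                  type = "response")

all.equal(p_hand, unname(p_pred))
```

Because the inverse logit maps any real number into the interval (0, 1), the curve is S-shaped and never leaves the range of valid probabilities.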

8 Standardizing predictors

# z-standardize every column except the outcome am
mtcars_z <- 
  mtcars |> 
  mutate(across(c(everything(), -am), ~ standardize(.x)))
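For readers unfamiliar with z-standardization, here is a minimal base-R sketch of the transformation: subtract the mean, then divide by the standard deviation. It assumes that standardize() is used with its defaults (centering on the mean, scaling by the SD); the names z_manual and z_scale are illustrative.

```r
data(mtcars)

# z-score by hand: center on the mean, scale by the standard deviation
z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

# After standardizing, the mean is (numerically) 0 and the SD is 1
mean(z_manual)
sd(z_manual)

# Base R's scale() performs the same transformation (it returns a matrix)
z_scale <- as.numeric(scale(mtcars$mpg))
all.equal(z_manual, z_scale)
```

A value of 1 on the z-scale therefore means "one standard deviation above the mean".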

9 Model with z-scaled predictors

mod_z <- glm(am ~ mpg, data = mtcars_z, family = "binomial")
parameters(mod_z, exponentiate = TRUE)
#> Parameter   | Odds Ratio |   SE |        95% CI |     z |     p
#> ---------------------------------------------------------------
#> (Intercept) |       0.65 | 0.29 | [0.25,  1.58] | -0.96 | 0.338
#> mpg         |       6.36 | 4.40 | [2.09, 34.49] |  2.67 | 0.008
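The standardized coefficients are a simple rescaling of the raw ones. On the log-odds scale, the slope is multiplied by the standard deviation of mpg, and the intercept becomes the log-odds at the mean of mpg. The sketch below checks this with base R; the z-scores are computed by hand here instead of with standardize(), and all object names are illustrative.

```r
data(mtcars)
mtcars$mpg_z <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)

fit_raw <- glm(am ~ mpg,   data = mtcars, family = "binomial")
fit_z   <- glm(am ~ mpg_z, data = mtcars, family = "binomial")

b_raw <- coef(fit_raw)
b_z   <- coef(fit_z)

# Slope: one z-unit corresponds to sd(mpg) raw units
slope_expected <- b_raw[["mpg"]] * sd(mtcars$mpg)

# Intercept: log-odds of am = 1 for a car with average mpg
intercept_expected <- b_raw[["(Intercept)"]] + b_raw[["mpg"]] * mean(mtcars$mpg)

all.equal(b_z[["mpg_z"]], slope_expected, tolerance = 1e-6)
all.equal(b_z[["(Intercept)"]], intercept_expected, tolerance = 1e-6)

# Odds ratio per standard deviation of mpg
round(exp(b_z[["mpg_z"]]), 2)  # 6.36, as in the table above
```

Equivalently, the standardized odds ratio is the raw odds ratio raised to the power of sd(mpg). The fitted probabilities of the two models are the same.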

10 Model with all variables z-scaled

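Note that it makes no sense to z-scale the outcome variable of a logistic regression: the outcome must stay binary (0/1), and z-scaled values fall outside that range, so glm() with a binomial family rejects them.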
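A quick sketch of what happens if one tries anyway (base R only; am_z and res are illustrative names). The z-scaled outcome contains negative values, which are not valid for a binomial model, so the call to glm() is wrapped in tryCatch() to capture the error instead of stopping the script.

```r
data(mtcars)

# z-scale the binary outcome (not recommended, shown for illustration only)
mtcars$am_z <- (mtcars$am - mean(mtcars$am)) / sd(mtcars$am)
range(mtcars$am_z)  # the values now fall outside [0, 1]

# Try to fit the model and capture the error object
res <- tryCatch(
  glm(am_z ~ mpg, data = mtcars, family = "binomial"),
  error = function(e) e
)

inherits(res, "error")  # TRUE: the model cannot be fitted
conditionMessage(res)   # the error message reported by glm()
```

If a standardized effect is wanted, it is the predictors that should be standardized, while the outcome keeps its 0/1 coding.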

11 Conclusion

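As can be seen, the odds ratio for mpg grows from 1.36 to 6.36 after standardization. The reason is that one unit of the z-scaled predictor corresponds to one standard deviation of mpg (about 6 mpg) rather than to 1 mpg. The z and p values of the slope are unchanged, so the model itself is the same; only the unit of the predictor has changed.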

12 Reproducibility

#> ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Big Sur ... 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Berlin
#>  date     2023-12-20
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
#>  package      * version date (UTC) lib source
#>  bayestestR   * 0.13.1  2023-04-07 [1] CRAN (R 4.2.0)
#>  blogdown       1.18    2023-06-19 [1] CRAN (R 4.2.0)
#>  bookdown       0.36    2023-10-16 [1] CRAN (R 4.2.0)
#>  bslib          0.5.1   2023-08-11 [1] CRAN (R 4.2.0)
#>  cachem         1.0.8   2023-05-01 [1] CRAN (R 4.2.0)
#>  callr          3.7.3   2022-11-02 [1] CRAN (R 4.2.0)
#>  cli            3.6.1   2023-03-23 [1] CRAN (R 4.2.0)
#>  coda           0.19-4  2020-09-30 [1] CRAN (R 4.2.0)
#>  codetools      0.2-19  2023-02-01 [1] CRAN (R 4.2.0)
#>  colorout     * 1.3-0   2023-11-08 [1] Github (jalvesaq/colorout@8384882)
#>  colorspace     2.1-0   2023-01-23 [1] CRAN (R 4.2.0)
#>  correlation  * 0.8.4   2023-04-06 [1] CRAN (R 4.2.1)
#>  crayon         1.5.2   2022-09-29 [1] CRAN (R 4.2.1)
#>  datawizard   * 0.9.0   2023-09-15 [1] CRAN (R 4.2.0)
#>  devtools       2.4.5   2022-10-11 [1] CRAN (R 4.2.1)
#>  digest         0.6.33  2023-07-07 [1] CRAN (R 4.2.0)
#>  dplyr        * 1.1.3   2023-09-03 [1] CRAN (R 4.2.0)
#>  easystats    * 0.7.0   2023-11-05 [1] CRAN (R 4.2.1)
#>  effectsize   * 0.8.6   2023-09-14 [1] CRAN (R 4.2.0)
#>  ellipsis       0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
#>  emmeans        1.8.9   2023-10-17 [1] CRAN (R 4.2.0)
#>  estimability   1.4.1   2022-08-05 [1] CRAN (R 4.2.0)
#>  evaluate       0.21    2023-05-05 [1] CRAN (R 4.2.0)
#>  fansi          1.0.5   2023-10-08 [1] CRAN (R 4.2.0)
#>  fastmap        1.1.1   2023-02-24 [1] CRAN (R 4.2.0)
#>  forcats      * 1.0.0   2023-01-29 [1] CRAN (R 4.2.0)
#>  fs             1.6.3   2023-07-20 [1] CRAN (R 4.2.0)
#>  generics       0.1.3   2022-07-05 [1] CRAN (R 4.2.0)
#>  ggplot2      * 3.4.4   2023-10-12 [1] CRAN (R 4.2.0)
#>  glue           1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
#>  gtable         0.3.4   2023-08-21 [1] CRAN (R 4.2.0)
#>  hms            1.1.3   2023-03-21 [1] CRAN (R 4.2.0)
#>  htmltools      0.5.6.1 2023-10-06 [1] CRAN (R 4.2.0)
#>  htmlwidgets    1.6.2   2023-03-17 [1] CRAN (R 4.2.0)
#>  httpuv         1.6.11  2023-05-11 [1] CRAN (R 4.2.0)
#>  insight      * 0.19.7  2023-11-26 [1] CRAN (R 4.2.1)
#>  jquerylib      0.1.4   2021-04-26 [1] CRAN (R 4.2.0)
#>  jsonlite       1.8.7   2023-06-29 [1] CRAN (R 4.2.0)
#>  knitr          1.45    2023-10-30 [1] CRAN (R 4.2.1)
#>  later          1.3.1   2023-05-02 [1] CRAN (R 4.2.0)
#>  lattice        0.21-8  2023-04-05 [1] CRAN (R 4.2.0)
#>  lifecycle      1.0.4   2023-11-07 [1] CRAN (R 4.2.1)
#>  lubridate    * 1.9.3   2023-09-27 [1] CRAN (R 4.2.0)
#>  magrittr       2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
#>  MASS           7.3-60  2023-05-04 [1] CRAN (R 4.2.0)
#>  Matrix         1.5-4.1 2023-05-18 [1] CRAN (R 4.2.0)
#>  memoise        2.0.1   2021-11-26 [1] CRAN (R 4.2.0)
#>  mime           0.12    2021-09-28 [1] CRAN (R 4.2.0)
#>  miniUI         0.1.1.1 2018-05-18 [1] CRAN (R 4.2.0)
#>  modelbased   * 0.8.6   2023-01-13 [1] CRAN (R 4.2.1)
#>  multcomp       1.4-25  2023-06-20 [1] CRAN (R 4.2.0)
#>  munsell        0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
#>  mvtnorm        1.2-2   2023-06-08 [1] CRAN (R 4.2.0)
#>  parameters   * 0.21.3  2023-11-02 [1] CRAN (R 4.2.1)
#>  performance  * 0.10.8  2023-10-30 [1] CRAN (R 4.2.1)
#>  pillar         1.9.0   2023-03-22 [1] CRAN (R 4.2.0)
#>  pkgbuild       1.4.0   2022-11-27 [1] CRAN (R 4.2.0)
#>  pkgconfig      2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
#>  pkgload        1.3.2.1 2023-07-08 [1] CRAN (R 4.2.0)
#>  prettyunits    1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
#>  processx       3.8.2   2023-06-30 [1] CRAN (R 4.2.0)
#>  profvis        0.3.8   2023-05-02 [1] CRAN (R 4.2.0)
#>  promises       1.2.1   2023-08-10 [1] CRAN (R 4.2.0)
#>  ps             1.7.5   2023-04-18 [1] CRAN (R 4.2.0)
#>  purrr        * 1.0.2   2023-08-10 [1] CRAN (R 4.2.0)
#>  R6             2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
#>  Rcpp           1.0.11  2023-07-06 [1] CRAN (R 4.2.0)
#>  readr        * 2.1.4   2023-02-10 [1] CRAN (R 4.2.0)
#>  remotes        2.4.2.1 2023-07-18 [1] CRAN (R 4.2.0)
#>  report       * 0.5.8   2023-12-07 [1] CRAN (R 4.2.1)
#>  rlang          1.1.1   2023-04-28 [1] CRAN (R 4.2.0)
#>  rmarkdown      2.25    2023-09-18 [1] CRAN (R 4.2.0)
#>  rstudioapi     0.15.0  2023-07-07 [1] CRAN (R 4.2.0)
#>  sandwich       3.0-2   2022-06-15 [1] CRAN (R 4.2.0)
#>  sass           0.4.7   2023-07-15 [1] CRAN (R 4.2.0)
#>  scales         1.2.1   2022-08-20 [1] CRAN (R 4.2.0)
#>  see          * 0.8.1   2023-11-03 [1] CRAN (R 4.2.1)
#>  sessioninfo    1.2.2   2021-12-06 [1] CRAN (R 4.2.0)
#>  shiny          1.8.0   2023-11-17 [1] CRAN (R 4.2.1)
#>  stringi        1.7.12  2023-01-11 [1] CRAN (R 4.2.0)
#>  stringr      * 1.5.1   2023-11-14 [1] CRAN (R 4.2.1)
#>  survival       3.5-5   2023-03-12 [1] CRAN (R 4.2.0)
#>  TH.data        1.1-2   2023-04-17 [1] CRAN (R 4.2.0)
#>  tibble       * 3.2.1   2023-03-20 [1] CRAN (R 4.2.0)
#>  tidyr        * 1.3.0   2023-01-24 [1] CRAN (R 4.2.0)
#>  tidyselect     1.2.0   2022-10-10 [1] CRAN (R 4.2.0)
#>  tidyverse    * 2.0.0   2023-02-22 [1] CRAN (R 4.2.0)
#>  timechange     0.2.0   2023-01-11 [1] CRAN (R 4.2.0)
#>  tzdb           0.4.0   2023-05-12 [1] CRAN (R 4.2.0)
#>  urlchecker     1.0.1   2021-11-30 [1] CRAN (R 4.2.0)
#>  usethis        2.2.2   2023-07-06 [1] CRAN (R 4.2.0)
#>  utf8           1.2.3   2023-01-31 [1] CRAN (R 4.2.0)
#>  vctrs          0.6.4   2023-10-12 [1] CRAN (R 4.2.0)
#>  withr          2.5.2   2023-10-30 [1] CRAN (R 4.2.1)
#>  xfun           0.40    2023-08-09 [1] CRAN (R 4.2.0)
#>  xtable         1.8-4   2019-04-21 [1] CRAN (R 4.2.0)
#>  yaml           2.3.7   2023-01-23 [1] CRAN (R 4.2.0)
#>  zoo            1.8-12  2023-04-13 [1] CRAN (R 4.2.0)
#> 
#>  [1] /Users/sebastiansaueruser/Rlibs
#>  [2] /Library/Frameworks/R.framework/Versions/4.2/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────