Colleagues Don’t Let Colleagues Make Bad Graphs

Benedikt Schnur

Hannover Medical School (MHH), Department of Human Genetics

Presentation based on

Friends Don’t Let Friends Make Bad Graphs [1]

Source code available at GitHub

Colleagues Don’t Let Colleagues Make Pie Charts [2][5]

Code
data1 <- data.frame(name = letters[1:5], value = c(17, 18, 20, 22, 24))
data2 <- data.frame(name = letters[1:5], value = c(20, 18, 21, 20, 20))
data3 <- data.frame(name = letters[1:5], value = c(24, 23, 21, 19, 18))

plot_pie <- function(data) {
  ggplot(data, aes(x = "name", y = value, fill = name)) +
    geom_bar(width = 1, stat = "identity") +
    coord_polar("y", start = 0, direction = -1) +
    bene_palette_shifted +
    geom_text(aes(label = name), position = position_stack(vjust = 0.5), color = "white") +
    xlab("") +
    ylab("")
}

a <- plot_pie(data1)
b <- plot_pie(data2)
c <- plot_pie(data3)
a + b + c + patchwork_defauts

Figure 1: Pie Charts

Colleagues Don’t Let Colleagues Make Pie Charts

Code
# reuses plots of fig-pies

# add source column to each data frame
data1 <- data1 %>%
  add_column(type = "I")
data2 <- data2 %>%
  add_column(type = "II")
data3 <- data3 %>%
  add_column(type = "III")

# combine data frames into a single data frame
data_combined <- rbind(data1, data2, data3)

g <- ggplot(data_combined, aes(x = rev(type), y = value, fill = name)) +
  geom_bar(position = position_fill(reverse = TRUE), stat = "identity", width = 0.25, color = "black") +
  bene_palette_shifted +
  theme(
    legend.position = "none",
    panel.grid = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  scale_x_discrete(expand = c(0, 0.4)) +
  geom_text(aes(label = name), position = position_fill(vjust = 0.5, reverse = TRUE), color = "white") +
  xlab("") +
  ylab("") +
  coord_flip()

layout <- "
AD
BD
CD
"

a + b + c + g + plot_layout(design = layout) + patchwork_defauts

Figure 2: Pie Charts vs. Bar Charts (stacked)

Colleagues Don’t Let Colleagues Make Pie Charts

Code
# reuses plots of fig-pies

plot_bar <- function(data) {
  ggplot(data, aes(x = name, y = value, fill = name)) +
    geom_bar(stat = "identity", color = "black") +
    bene_palette_shifted +
    geom_text(aes(label = name), position = position_stack(reverse = TRUE), vjust = 1.5, color = "white") +
    ylim(0, 25) +
    xlab("") +
    ylab("")
}

d <- plot_bar(data1)
e <- plot_bar(data2)
f <- plot_bar(data3)

layout <- "
ABC
ABC
DEF
DEF
DEF
DEF
DEF
"

a + b + c + d + e + f + plot_layout(design = layout) + patchwork_defauts

Figure 3: Pie Charts vs. Bar Charts

Colleagues Don’t Let Colleagues Make Bar Plots for Means Separation [6]

Code
bene_palette_shifted <- scale_fill_manual(values = bene_colors[-1], breaks = waiver())

# group1 is sampled from a normal distribution with mean = 1 and sd = 1.
# group2 is sampled from a lognormal distribution with mean = 1 and sd = 1.
set.seed(2363)
group1 <- rnorm(n = 250, mean = 1, sd = 1)
group2 <- rlnorm(
  n = 250,
  meanlog = log(1^2 / sqrt(1^2 + 1^2)),
  sdlog = sqrt(log(1 + (1^2 / 1^2)))
)

groups_long <- cbind(
  group1,
  group2
) %>%
  as.data.frame() %>%
  gather("group", "response", 1:2)

bar <- groups_long %>%
  ggplot(aes(x = group, y = response)) +
  geom_bar(stat = "summary", fun = mean, aes(fill = group), width = 0.5, colour = "black") +
  stat_summary(
    geom = "errorbar", fun.data = "mean_se",
    width = 0.1, size = 1
  ) +
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.1))) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none",
    plot.title = element_text(size = 10),
    plot.caption = element_text(hjust = 0),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    axis.title.x = element_blank()
  ) +
  labs(
    x = "Group",
    y = "Response"
  ) +
  ggtitle(
    paste0(
      "group1: mean = ", signif(mean(group1), 2),
      "; sd = ", signif(sd(group1), 2), "\n",
      "group2: mean = ", signif(mean(group2), 2),
      "; sd = ", signif(sd(group2), 2)
    )
  )

bar

Figure 4: Means separation bar plot

Colleagues Don’t Let Colleagues Make Bar Plots for Means Separation

Code
box <- groups_long %>%
  ggplot(aes(x = group, y = response)) +
  geom_boxplot(aes(fill = group), width = 0.5) +
  labs(
    y = "Response"
  ) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none",
    plot.title = element_text(size = 10),
    plot.caption = element_text(hjust = 0),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    axis.title.x = element_blank()
  ) +
  ggtitle(
    paste0(
      "group1: median = ", signif(median(group1), 2),
      "; IQR = ", signif(IQR(group1), 2), "\n",
      "group2: median = ", signif(median(group2), 2),
      "; IQR = ", signif(IQR(group2), 2)
    )
  )

beeswarm <- groups_long %>%
  ggplot(aes(x = group, y = response)) +
  geom_beeswarm(alpha = 0.8, size = 2, shape = 21, color = "black", aes(fill = group), cex = 1.7, method = "compactswarm") +
  bene_palette_shifted +
  labs(
    x = "Group",
    y = "Response"
  ) +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    legend.position = "none",
    plot.title = element_text(size = 10),
    plot.caption = element_text(hjust = 0),
    axis.line.x = element_line(color = "black"),
    axis.line.y = element_line(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    axis.title.x = element_blank()
  ) +
  ggtitle(
    paste0(
      "group1: median = ", signif(median(group1), 2),
      "; IQR = ", signif(IQR(group1), 2), "\n",
      "group2: median = ", signif(median(group2), 2),
      "; IQR = ", signif(IQR(group2), 2)
    )
  )

bar + box + beeswarm + patchwork_defauts

Figure 5: Means separation alternatives: box plot and beeswarm plot

Colleagues Don’t Let Colleagues Make Violin Plots for Small Sample Sizes

Code
# Generate three sets of data from a normal distribution with mean = 1 and sd = 1.

set.seed(666)

data_set <- replicate(rnorm(5, mean = 1, sd = 1), n = 3, simplify = T) %>%
  as.data.frame() %>%
  gather("sample", "Response", 1:3) %>%
  mutate(Group = str_replace(sample, "V", "group"))

violin_eg <- data_set %>%
  ggplot(aes(x = Group, y = Response)) +
  geom_violin(aes(fill = Group), trim = FALSE) +
  stat_summary(geom = "point", fun = median) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    legend.position = "none",
    axis.title.x = element_blank()
  )

violin_eg

Figure 6: Violin plots for small sample sizes (dot at median)

Colleagues Don’t Let Colleagues Make Violin Plots for Small Sample Sizes

Code
box_eg <- data_set %>%
  ggplot(aes(x = Group, y = Response)) +
  geom_boxplot(aes(fill = Group)) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    legend.position = "none",
    axis.title.x = element_blank()
  )

violin_eg + box_eg

Figure 7: Violin plots and box plots for small sample sizes

Colleagues Don’t Let Colleagues Make Violin Plots for Small Sample Sizes

Code
jitter_eg <- data_set %>%
  ggplot(aes(x = Group, y = Response)) +
  geom_point(aes(fill = Group),
    shape = 21, size = 3, color = "black",
    position = position_jitter(seed = 1, width = 0.2)
  ) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    panel.grid.major.y = element_line(color = "grey"),
    legend.position = "none",
    axis.title.x = element_blank()
  )


violin_eg + box_eg + jitter_eg

Figure 8: Violin plots, box plots and jitter plots for small sample sizes (n = 5)

Colleagues Don’t Let Colleagues Make Bar Plot Meadows

Code
sheesleyData <- read.csv("sheesley.csv") %>%
  mutate(
    Weld = as.factor(Weld),
    Shift = as.factor(Shift),
    Machine = as.factor(Machine),
    Plant = as.factor(Plant),
    Replication = as.factor(Replication)
  )
sheesleyData$WeldShiftMachinePlant <- as.factor(paste(sheesleyData$Weld, sheesleyData$Shift, sheesleyData$Machine, sheesleyData$Plant, sep = "-"))
sheesleyData$ShiftPlant <- as.factor(paste(sheesleyData$Shift, sheesleyData$Plant, sep = "-"))

barmeadows <- sheesleyData %>%
  ggplot(aes(x = WeldShiftMachinePlant, y = Y)) +
  geom_bar(stat = "summary", fun = mean, aes(fill = Shift), width = 0.5, colour = "black") +
  stat_summary(
    geom = "errorbar", fun.data = "mean_se",
    width = 0.1, size = 1
  ) +
  labs(
    y = "Average number of welded lead wires missed per hour",
    x = "Weld-Shift-Machine-Plant"
  ) +
  scale_y_continuous(expand = expand_scale(mult = c(0, 0.1))) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)
  )

barmeadows

Figure 9: Bar plot for multifactorial data [7]

Colleagues Don’t Let Colleagues Make Bar Plot Meadows

Code
Machine.labs <- c("Machine 1", "Machine 2")
names(Machine.labs) <- c("1", "2")
Plant.labs <- c("Plant 1", "Plant 2")
names(Plant.labs) <- c("1", "2")

dotsformultifactor <- sheesleyData %>%
  ggplot(aes(x = Shift, y = Y)) +
  facet_wrap(Machine ~ Plant, scales = "free_x", strip.position = "bottom", labeller = labeller(Machine = Machine.labs, Plant = Plant.labs)) +
  geom_boxplot(aes(fill = Shift)) +
  bene_palette_shifted +
  labs(
    y = "Average number of welded lead wires missed per hour",
  ) +
  theme_classic() +
  theme(
    text = element_text(color = "black"),
    axis.text = element_text(color = "black"),
    strip.text.x = element_text(color = "black"),
    strip.background = element_blank(),
    strip.placement = "outside",
  )

dotsformultifactor

Figure 10: Multifactorial data as boxplot facet

Colleagues Don’t Let Colleagues MakePlots with Red/Green or Rainbow color scales

Code
abc_1 <- expand.grid(
  a = c(1:10),
  b = c(1:10)
) %>%
  mutate(c = a + b)

to_grey_scale <- function(hex) {
  temp <- col2rgb(hex) %>%
    t() %>%
    as.data.frame() %>%
    mutate(gs = 0.299 * red + 0.587 * green + 0.114 * blue)

  rgb(temp$gs, temp$gs, temp$gs, maxColorValue = 256)
}

grid_theme <- theme(
  text = element_text(color = "black"),
  # legend.position = "none",
  legend.title = element_blank(),
  legend.text = element_blank(),
  axis.title = element_blank(),
  axis.text = element_text(color = "black"),
)

# Red/green

Red_green <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradient2(low = "red", mid = "white", high = "green", midpoint = 11) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## Grey scale

Red_green_gs <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradient2(
    low = to_grey_scale("red"),
    mid = "white",
    high = to_grey_scale("green"), midpoint = 11
  ) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## Red/green colorblind

Red_green_deu <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradient2(
    low = clr_deutan("red"),
    mid = clr_deutan("white"),
    high = clr_deutan("green"), midpoint = 11
  ) +
  theme_classic() +
  grid_theme +
  coord_fixed()

# Rainbow

Rainbow1 <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = rainbow(20)) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## grey scale

Rainbow_gs <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = rainbow(20) %>% to_grey_scale()) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## Red/green color blind

Rainbow_deu <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = rainbow(20) %>% clr_deutan()) +
  theme_classic() +
  grid_theme +
  coord_fixed()

# Viridis

viridis_optionD <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = viridis(100)) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## Grey scale

viridis_gs <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = to_grey_scale(viridis(100))) +
  theme_classic() +
  grid_theme +
  coord_fixed()

## Red/green color blind

viridis_deu <- abc_1 %>%
  ggplot(aes(x = a, y = b)) +
  geom_tile(aes(fill = c)) +
  scale_fill_gradientn(colors = clr_deutan(viridis(100))) +
  theme_classic() +
  grid_theme +
  coord_fixed()

wrap_plots(Red_green, Red_green_gs, Red_green_deu,
  Rainbow1, Rainbow_gs, Rainbow_deu,
  viridis_optionD, viridis_gs, viridis_deu,
  nrow = 3,
  ncol = 3
)

Figure 11: Heatmaps with different color scales simulated in black/white and deuteranopia

Colleagues Don’t Let Colleagues MakeHeatmaps without Adjusting for Outliers

Code
set.seed(2363)

group1 <- rnorm(mean = 1, sd = 0.2, n = 8)
group2 <- rnorm(mean = 0, sd = 0.2, n = 12)
group3 <- rnorm(mean = 0, sd = 0.2, n = 8)
group4 <- rnorm(mean = 1, sd = 0.2, n = 10)
group5 <- rnorm(mean = 10, sd = 1, n = 2)

toydata <- data.frame(
  observation1 = c(group1, group2),
  observation2 = c(group3, group4, group5)
) %>%
  mutate(feature = 1:20) %>%
  pivot_longer(cols = !feature, names_to = "observation", values_to = "value") %>%
  mutate(observation2 = str_remove(observation, "observation"))

p1 <- toydata %>%
  mutate(observation2 = fct_rev(observation2)) %>%
  ggplot(aes(x = observation2, y = feature)) +
  geom_tile(aes(fill = value)) +
  scale_fill_gradientn(colors = viridis(n = 10)) +
  labs(
    x = "Observations",
    y = "Features"
  ) +
  theme_classic() +
  coord_flip()

p2 <- toydata %>%
  mutate(observation2 = fct_rev(observation2)) %>%
  mutate(rank = rank(value, ties.method = "first")) %>%
  ggplot(aes(x = value, y = rank)) +
  geom_point(
    shape = 21, color = "black",
    aes(fill = value), size = 3
  ) +
  scale_fill_gradientn(colors = viridis(n = 10)) +
  theme_classic() +
  theme(
    legend.position = "none",
  )

p3 <- toydata %>%
  mutate(observation2 = fct_rev(observation2)) %>%
  mutate(rank = rank(value, ties.method = "first")) %>%
  mutate(value2 = case_when(
    value >= 2 ~ 2,
    T ~ value
  )) %>%
  ggplot(aes(x = value, y = rank)) +
  geom_point(
    shape = 21, color = "black",
    aes(fill = value2), size = 3
  ) +
  scale_fill_gradientn(
    colors = viridis(n = 10),
    breaks = c(0, 1, 2),
    labels = c("0", "1", ">2")
  ) +
  labs(fill = "value") +
  theme_classic() +
  theme(
    legend.position = "none",
  )

p4 <- toydata %>%
  mutate(observation2 = fct_rev(observation2)) %>%
  mutate(value2 = case_when(
    value >= 1.5 ~ 2,
    T ~ value
  )) %>%
  ggplot(aes(x = observation2, y = feature)) +
  geom_tile(aes(fill = value2)) +
  scale_fill_gradientn(
    colors = viridis(n = 10),
    breaks = c(0, 1, 2),
    labels = c("0", "1", ">2")
  ) +
  labs(
    x = "Observations",
    y = "Features",
    fill = "value"
  ) +
  theme_classic() +
  coord_flip()

wrap_plots(p1, p2, p4, p3,
  nrow = 2, ncol = 2,
  widths = c(1, 0.7)
)

Figure 12: Heatmaps with outliers

Bonus: Bar Plot Alternatives

Code
# total = 31983
armsData <- read.csv("arms_supplier_2022.csv") %>%
  mutate(Supplier = as.factor(Supplier)) %>%
  mutate(Percentage = TIV_2022 / 31983)

lollipops <- armsData[1:20, ] %>%
  ggplot(aes(x = reorder(Supplier, Percentage), y = Percentage)) +
  geom_segment(aes(xend = Supplier, yend = 0), color = "darkgrey") +
  scale_y_continuous(labels = scales::percent, expand = expand_scale(mult = c(0, 0.1))) +
  geom_point(size = 4, color = bene_colors[1]) +
  bene_palette_shifted +
  theme_classic() +
  theme(
    panel.grid.major.x = element_line(color = "lightgrey"),
    panel.grid.minor.x = element_line(color = "lightgrey"),
    axis.title.y = element_blank()
  ) +
  labs(x = "Supplier", y = "Percentage of arms supplied in 2022") +
  coord_flip()

treemap <- armsData[1:20, ] %>%
  ggplot(aes(area = Percentage, fill = Supplier, label = Supplier)) +
  geom_treemap(start = "topleft") +
  geom_treemap_text(color = "white", place = "centre", reflow = TRUE, start = "topleft") +
  scale_fill_manual(values = rep(bene_colors, 20))

lollipops + treemap

Figure 13: Alternatives to bar plots: lollipop plot and treemap

R Packages Used

Package Version
base [8] 4.3.3
ggbeeswarm [9] 0.7.1
knitr [10][12] 1.42
patchwork [13] 1.1.2
prismatic [14] 1.1.1
renv [15] 0.17.3
rmarkdown [16][18] 2.21
scales [19] 1.2.1
styler [20] 1.9.1
tidyverse [21] 2.0.0
treemapify [22] 2.5.5
viridis [23] 0.6.2

References

[1]
C. Li, “Friends Don’t Let Friends Make Bad Graphs.” Zenodo, Jan. 16, 2023. doi: 10.5281/ZENODO.7542491.
[2]
Y. Holtz, “The issue with pie chart.” https://www.data-to-viz.com/caveat/pie.html (accessed Apr. 18, 2023).
[3]
H. Siirtola, “The Cost of Pie Charts,” in 2019 23rd International Conference Information Visualisation (IV), Jul. 2019, pp. 151–156. doi: 10.1109/IV.2019.00034.
[4]
S. Few, “Save the Pies for Dessert,” Aug. 2007. www.perceptualedge.com/articles/08-21-07.pdf (accessed Apr. 27, 2023).
[5]
C. Nussbaumer, “Death to pie charts,” Jul. 20, 2011. https://www.storytellingwithdata.com/blog/2011/07/death-to-pie-charts (accessed Apr. 27, 2023).
[6]
T. L. Weissgerber, N. M. Milic, S. J. Winham, and V. D. Garovic, “Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm,” PLOS Biology, vol. 13, no. 4, p. e1002128, Apr. 2015, doi: 10.1371/journal.pbio.1002128.
[7]
J. Sheesley, “Light Bulb Lead Wire Weld Process Comparison.” Accessed: May 03, 2023. [Online]. Available: https://www.itl.nist.gov/div898/software/dataplot/data/SHEESLEY.DAT
[8]
R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2024. Available: https://www.R-project.org/
[9]
E. Clarke, S. Sherrill-Mix, and C. Dawson, ggbeeswarm: Categorical scatter (violin point) plots. 2022. Available: https://CRAN.R-project.org/package=ggbeeswarm
[10]
Y. Xie, knitr: A comprehensive tool for reproducible research in R,” in Implementing reproducible computational research, V. Stodden, F. Leisch, and R. D. Peng, Eds. Chapman; Hall/CRC, 2014.
[11]
Y. Xie, Dynamic documents with R and knitr, 2nd ed. Boca Raton, Florida: Chapman; Hall/CRC, 2015. Available: https://yihui.org/knitr/
[12]
Y. Xie, knitr: A general-purpose package for dynamic report generation in r. 2023. Available: https://yihui.org/knitr/
[13]
T. L. Pedersen, patchwork: The composer of plots. 2022. Available: https://CRAN.R-project.org/package=patchwork
[14]
E. Hvitfeldt, prismatic: Color manipulation tools. 2022. Available: https://CRAN.R-project.org/package=prismatic
[15]
K. Ushey, renv: Project environments. 2023. Available: https://CRAN.R-project.org/package=renv
[16]
Y. Xie, J. J. Allaire, and G. Grolemund, R markdown: The definitive guide. Boca Raton, Florida: Chapman; Hall/CRC, 2018. Available: https://bookdown.org/yihui/rmarkdown
[17]
Y. Xie, C. Dervieux, and E. Riederer, R markdown cookbook. Boca Raton, Florida: Chapman; Hall/CRC, 2020. Available: https://bookdown.org/yihui/rmarkdown-cookbook
[18]
J. Allaire et al., rmarkdown: Dynamic documents for r. 2023. Available: https://github.com/rstudio/rmarkdown
[19]
H. Wickham and D. Seidel, scales: Scale functions for visualization. 2022. Available: https://CRAN.R-project.org/package=scales
[20]
K. Müller and L. Walthert, styler: Non-invasive pretty printing of r code. 2023. Available: https://CRAN.R-project.org/package=styler
[21]
H. Wickham et al., “Welcome to the tidyverse,” Journal of Open Source Software, vol. 4, no. 43, p. 1686, 2019, doi: 10.21105/joss.01686.
[22]
D. Wilkins, treemapify: Draw treemaps in ggplot2. 2021. Available: https://CRAN.R-project.org/package=treemapify
[23]
Garnier et al., viridis - colorblind-friendly color maps for r. 2021. doi: 10.5281/zenodo.4679424.