class: center, middle, inverse, title-slide

# Introduction to the Tidyverse
## Data visualisation
### Olivier Gimenez
### last updated: 2022-05-22

---

# Visualization with ggplot2

* The package ggplot2 implements a **g**rammar of **g**raphics

* Operates on data frames or tibbles, not on vectors as base R graphics often do

* Explicitly differentiates between the data and its representation

<img src="assets/img/ggplot2_logo.jpg" width="30%" style="display: block; margin: auto;" />

---

# The ggplot2 grammar

Grammar element | What it is
:---------------- | :-----------------------------
**Data** | The data frame being plotted
**Geometries** | The geometric shape that will represent the data (e.g., point, boxplot, histogram)
**Aesthetics** | The aesthetics of the geometric object (e.g., color, size, shape)

<img src="assets/img/ggplot2_logo.jpg" width="30%" style="display: block; margin: auto;" />

---

# Data

See Introduction to data wrangling with `dplyr` at https://bit.ly/3wF5S7f

```r
library(tidyverse) # read_csv(), dplyr verbs, stringr, forcats and ggplot2

citations <- read_csv('https://raw.githubusercontent.com/oliviergimenez/intro_tidyverse/master/journal.pone.0166570.s001.CSV') %>%
  rename(journal = 'Journal identity',
         impactfactor = '5-year journal impact factor',
         pubyear = 'Year published',
         colldate = 'Collection date',
         pubdate = 'Publication date',
         nbtweets = 'Number of tweets',
         woscitations = 'Number of Web of Science citations') %>%
  mutate(journal = as.factor(journal))
```

```
## Rows: 1599 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): Journal identity, Issue, Authors, Collection date, Publication date
## dbl (7): 5-year journal impact factor, Year published, Volume, Number of twe...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

---

# Scatterplots

```r
*citations %>%
* ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_point()
```

* Pass in the data frame as your first argument

---

# Scatterplots

```r
citations %>%
  ggplot() +
* aes(x = nbtweets, y = woscitations) +
  geom_point()
```

* Pass in the data frame as your first argument

* Aesthetics map the data onto plot characteristics, here the x and y axes

---

# Scatterplots

```r
citations %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
* geom_point()
```

* Pass in the data frame as your first argument

* Aesthetics map the data onto plot characteristics, here the x and y axes

* Display the data geometrically as points

---

# Scatterplots

```r
citations %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_point()
```

<img src="assets/chunks/unnamed-chunk-7-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Scatterplots, with colors

```r
citations %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
* geom_point(color = "red")
```

<img src="assets/chunks/unnamed-chunk-8-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Scatterplots, with journal-specific colors

```r
citations %>%
  ggplot() +
* aes(x = nbtweets, y = woscitations, color = journal) +
  geom_point()
```

<img src="assets/chunks/unnamed-chunk-9-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

* Placing color inside `aes()` maps it to the data

---

# Pick a few journals

```r
citations_ecology <- citations %>%
  mutate(journal = str_to_lower(journal)) %>% # all journal names to lowercase
  filter(journal %in% c('journal of animal ecology','journal of applied ecology','ecology')) # keep three ecology journals
citations_ecology
```

```
## # A tibble: 216 × 12
##    journal impactfactor pubyear Volume Issue Authors   colldate pubdate nbtweets
##    <chr>          <dbl>   <dbl>  <dbl> <chr> <chr>     <chr>    <chr>      <dbl>
##  1 ecology         6.16    2014     95 12    Magliane… 3/19/20… 12/1/2…        1
##  2 ecology         6.16    2014     95 12    Soinen    3/19/20… 12/1/2…        6
##  3 ecology         6.16    2014     95 12    Graham a… 3/19/20… 12/1/2…        1
##  4 ecology         6.16    2014     95 11    White et… 3/19/20… 11/1/2…        9
##  5 ecology         6.16    2014     95 11    Einarson… 3/19/20… 11/1/2…       15
##  6 ecology         6.16    2014     95 11    Haav and… 3/19/20… 11/1/2…        2
##  7 ecology         6.16    2014     95 10    Dodds et… 3/19/20… 10/1/2…        1
##  8 ecology         6.16    2014     95 10    Brown et… 3/19/20… 10/1/2…        1
##  9 ecology         6.16    2014     95 10    Wright e… 3/19/20… 10/1/2…        0
## 10 ecology         6.16    2014     95 9     Ramahlo … 3/19/20… 9/1/20…       27
## # … with 206 more rows, and 3 more variables: `Number of users` <dbl>,
## #   `Twitter reach` <dbl>, woscitations <dbl>
```

---

# Scatterplots, with journal-specific shapes

```r
citations_ecology %>%
  ggplot() +
* aes(x = nbtweets, y = woscitations, shape = journal) +
  geom_point(size = 2)
```

<img src="assets/chunks/unnamed-chunk-11-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Scatterplots, lines instead of points

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
* geom_line() +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-12-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Scatterplots, lines with sorting beforehand

```r
citations_ecology %>%
* arrange(woscitations) %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_line() +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-13-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

* `geom_line()` always connects observations in order of the x variable, so sorting the rows does not change this plot; `geom_path()` connects them in row order instead

---

# Scatterplots, add points

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_line() +
* geom_point() +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-14-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Scatterplots, add linear trend

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_point() +
* geom_smooth(method = "lm") +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-15-1.png" width="350cm" height="350cm" style="display: block;
margin: auto;" />

---

# Scatterplots, add smoother

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, y = woscitations) +
  geom_point() +
* geom_smooth() +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-16-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# aes or not aes?

* If we want to link the values of a variable to a graphical feature, i.e. a mapping, then we need `aes()`, e.g. `aes(color = journal)`.

* If instead the graphical feature is set to a fixed value, irrespective of the data, we do not need `aes()`: we pass it directly to the geom, e.g. `geom_point(color = "red")`.

<img src="assets/img/ggplot2_logo.jpg" width="30%" style="display: block; margin: auto;" />

---

# Histograms

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets) +
* geom_histogram()
```

<img src="assets/chunks/unnamed-chunk-18-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Histograms, with colors

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets) +
* geom_histogram(fill = "orange")
```

<img src="assets/chunks/unnamed-chunk-19-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Histograms, with fill and border colors

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets) +
* geom_histogram(fill = "orange", color = "brown")
```

<img src="assets/chunks/unnamed-chunk-20-1.png" width="400cm" height="400cm" style="display: block; margin: auto;" />

---

# Histograms, with labels and title

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets) +
  geom_histogram(fill = "orange", color = "brown") +
* labs(x = "Number of tweets",
*      y = "Count",
*      title = "Histogram of the number of tweets")
```

<img src="assets/chunks/unnamed-chunk-21-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Histograms, by journal

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets) +
  geom_histogram(fill = "orange", color = "brown") +
  labs(x = "Number of tweets",
       y = "Count",
       title = "Histogram of the number of tweets") +
* facet_wrap(vars(journal))
```

<img src="assets/chunks/unnamed-chunk-22-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Boxplots

```r
citations_ecology %>%
  ggplot() +
  aes(x = "", y = nbtweets) +
* geom_boxplot() +
  scale_y_log10()
```

<img src="assets/chunks/unnamed-chunk-23-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Boxplots with colors

```r
citations_ecology %>%
  ggplot() +
  aes(x = "", y = nbtweets) +
* geom_boxplot(fill = "green") +
  scale_y_log10()
```

<img src="assets/chunks/unnamed-chunk-24-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Boxplots with colors by journal

```r
citations_ecology %>%
  ggplot() +
* aes(x = journal, y = nbtweets, fill = journal) +
  geom_boxplot() +
  scale_y_log10()
```

<img src="assets/chunks/unnamed-chunk-25-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Remove the tick labels on the x axis

```r
citations_ecology %>%
  ggplot() +
  aes(x = journal, y = nbtweets, fill = journal) +
  geom_boxplot() +
  scale_y_log10() +
* theme(axis.text.x = element_blank()) +
* labs(x = "")
```

<img src="assets/chunks/unnamed-chunk-26-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Boxplots, user-specified colors by journal

```r
citations_ecology %>%
  ggplot() +
  aes(x = journal, y = nbtweets, fill = journal) +
  geom_boxplot() +
  scale_y_log10() +
* scale_fill_manual(
*   values = c("red", "blue", "purple")) +
  theme(axis.text.x = element_blank()) +
  labs(x = "")
```

<img src="assets/chunks/unnamed-chunk-27-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Boxplots, change legend settings

```r
citations_ecology %>%
  ggplot() +
  aes(x = journal, y = nbtweets, fill = journal) +
  geom_boxplot() +
  scale_y_log10() +
* scale_fill_manual(
    values = c("red", "blue", "purple"),
*   name = "Journal name",
*   labels = c("Ecology", "J Animal Ecology", "J Applied Ecology")) +
  theme(axis.text.x = element_blank()) +
  labs(x = "")
```

<img src="assets/chunks/unnamed-chunk-28-1.png" width="270cm" height="270cm" style="display: block; margin: auto;" />

---

# Ugly bar plots

```r
citations %>%
  count(journal) %>%
  ggplot() +
  aes(x = journal, y = n) +
* geom_col()
```

<img src="assets/chunks/unnamed-chunk-29-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Idem, with flipped axes

```r
citations %>%
  count(journal) %>%
  ggplot() +
* aes(x = n, y = journal) +
  geom_col()
```

<img src="assets/chunks/unnamed-chunk-30-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Idem, with factor reordering and flipped axes

```r
citations %>%
  count(journal) %>%
  ggplot() +
* aes(x = n, y = fct_reorder(journal, n)) +
  geom_col()
```

<img src="assets/chunks/unnamed-chunk-31-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Further cleaning

```r
citations %>%
  count(journal) %>%
  ggplot() +
  aes(x = n, y = fct_reorder(journal, n)) +
  geom_col() +
  labs(x = "counts", y = "")
```

<img src="assets/chunks/unnamed-chunk-32-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# More about how to (tidy) work with factors

* [Be the boss of your factors](https://stat545.com/block029_factors.html)

* [forcats, forcats, vous avez dit forcats ?](https://thinkr.fr/forcats-forcats-vous-avez-dit-forcats/)
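
---

# Ordering bars by frequency, a sketch

A minimal sketch on made-up toy data (the journal names and counts below are hypothetical, not taken from `citations`). Instead of `count()` + `fct_reorder()`, `forcats::fct_infreq()` orders the levels by decreasing frequency, `fct_rev()` reverses them so the most frequent journal is drawn on top, and `geom_bar()` does the counting itself.

```r
library(tidyverse) # tibble(), mutate(), forcats and ggplot2

# Made-up toy data: one row per article (hypothetical journal names)
articles <- tibble(journal = c("ecology", "oikos", "ecology",
                               "ecography", "ecology", "oikos"))

articles <- articles %>%
  # fct_infreq(): levels by decreasing frequency (ecology, oikos, ecography)
  # fct_rev(): reverse them, because ggplot2 draws the first level of y at
  # the bottom, so the most frequent journal ends up at the top
  mutate(journal = fct_rev(fct_infreq(journal)))

levels(articles$journal)

# geom_bar() counts the rows per journal, so no count() step is needed
articles %>%
  ggplot() +
  aes(y = journal) +
  geom_bar() +
  labs(x = "counts", y = "")
```

`count()` + `fct_reorder()` suits data where the counts are already computed; `fct_infreq()` works directly on one row per observation.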
---

# Density plots

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, fill = journal) +
* geom_density() +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-33-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Density plots, control transparency

```r
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, fill = journal) +
* geom_density(alpha = 0.5) +
  scale_x_log10()
```

<img src="assets/chunks/unnamed-chunk-34-1.png" width="350cm" height="350cm" style="display: block; margin: auto;" />

---

# Change the background theme

```r
# black-and-white theme
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, fill = journal) +
  geom_density(alpha = 0.5) +
  scale_x_log10() +
* theme_bw()
```

<img src="assets/chunks/unnamed-chunk-35-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Change the background theme

```r
# classic theme
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, fill = journal) +
  geom_density(alpha = 0.5) +
  scale_x_log10() +
* theme_classic()
```

<img src="assets/chunks/unnamed-chunk-36-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# Change the background theme

```r
# dark theme
citations_ecology %>%
  ggplot() +
  aes(x = nbtweets, fill = journal) +
  geom_density(alpha = 0.5) +
  scale_x_log10() +
* theme_dark()
```

<img src="assets/chunks/unnamed-chunk-37-1.png" width="300cm" height="300cm" style="display: block; margin: auto;" />

---

# More on data visualisation with ggplot2

* [Portfolio](https://www.r-graph-gallery.com/portfolio/ggplot2-package/) of ggplot2 plots

* [Cédric Scherer's portfolio](https://cedricscherer.netlify.app/top/dataviz/) of data visualisations

* [Top 50](http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html) ggplot2 visualizations

* [Build ggplot2 plots interactively](https://dreamrs.github.io/esquisse/) with the esquisse package

* [Manipulate and visualise spatial data](https://oliviergimenez.github.io/intro_spatialR/#1) with the sf package
<img src="assets/img/ggplot2_logo.jpg" width="30%" style="display: block; margin: auto;" />

---
background-image: url(https://github.com/rstudio/hex-stickers/raw/master/SVG/tidyverse.svg?sanitize=true)
background-size: 550px
background-position: 50% 50%

---

# To dive deeper into data visualisation with the tidyverse

* [Learn the tidyverse](https://www.tidyverse.org/learn/): books, workshops and online courses

* My selection of books:
    - [R for Data Science](https://r4ds.had.co.nz/) and [Advanced R](http://adv-r.had.co.nz/)
    - [Introduction à R et au tidyverse](https://juba.github.io/tidyverse/)
    - [Fundamentals of Data Visualization](https://clauswilke.com/dataviz/)
    - [Data Visualization: A Practical Introduction](http://socviz.co/)

* [Tidy Tuesday videos](https://www.youtube.com/user/safe4democracy/videos) by D. Robinson

* Material of the [2-day workshop Data Science in the tidyverse](https://github.com/cwickham/data-science-in-tidyverse) held at the RStudio 2019 conference

* Material of the stat545 course on [Data wrangling, exploration, and analysis with R](https://stat545.com/) at the University of British Columbia

---

# The [RStudio Cheat Sheets](https://www.rstudio.com/resources/cheatsheets/)

<img src="assets/img/ggplot1.png" width="600px" style="display: block; margin: auto;" />

---

# The [RStudio Cheat Sheets](https://www.rstudio.com/resources/cheatsheets/)

<img src="assets/img/ggplot2.png" width="600px" style="display: block; margin: auto;" />