---
title: "Data exploration using the palmerpenguins dataset"
author: "Olivier Gimenez"
date: "10/25/2020"
output:
  word_document: default
  pdf_document: default
  html_document: default
bibliography: mabiblio.bib
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

# Data exploration

## Motivation

In this section, we **explore** the data from the `palmerpenguins` package. A recent publication by Dr Kristen Gorman, the researcher who shared the data, is @Connors2020.

```{r echo = FALSE}
library(palmerpenguins)
library(tidyverse)
```

## Data

The first 10 rows of the data are displayed below:

```{r}
penguins %>%
  slice(1:10) %>%
  knitr::kable()
```

## Numerical exploration

There are `r nrow(penguins)` penguins in the dataset, and `r length(unique(penguins$species))` different species. The data were collected on `r length(unique(penguins$island))` islands of the Palmer Archipelago in Antarctica.

The means of all numeric variables measured on the penguins, by species, are:

```{r echo = FALSE}
# Mean of every numeric column per species, ignoring missing values
penguins %>%
  group_by(species) %>%
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
```

## Graphical exploration

A histogram of body mass per species:

```{r}
# Overlaid, semi-transparent histograms, one fill colour per species
penguins %>%
  ggplot() +
  aes(x = body_mass_g) +
  geom_histogram(aes(fill = species), alpha = 0.5, position = "identity") +
  scale_fill_manual(values = c("darkorange", "purple", "cyan4")) +
  theme_minimal() +
  labs(x = "Body mass (g)",
       y = "Frequency",
       title = "Penguin body mass")
```

## The end

The three species of penguins:

![](lter_penguins.png)

## References