R
workflow integrating models in computer
vision and ecological statisticsDeep learning is used in computer vision problems with important applications in several scientific fields. In ecology for example, there is a growing interest in deep learning for automatizing repetitive analyses on large amounts of images, such as animal species identification.
However, there are challenging issues toward the wide adoption of
deep learning by the community of ecologists. First, there is a
programming barrier as most algorithms are written in
Python
while most ecologists are versed in R
.
Second, recent applications of deep learning in ecology have focused on
computational aspects and simple tasks without addressing the underlying
ecological questions or carrying out the statistical data analysis to
answer these questions.
Here, we showcase a reproducible R
workflow integrating
both deep learning and statistical models using predator-prey
relationships as a case study. We illustrate deep learning for the
identification of animal species on images collected with camera traps,
and quantify spatial co-occurrence using multispecies occupancy
models.
Despite average model classification performances, ecological inference was similar whether we analysed the ground truth dataset or the classified dataset. This result calls for further work on the trade-offs between time and resources allocated to train models with deep learning and our ability to properly address key ecological questions with biodiversity monitoring. We hope that our reproducible workflow will be useful to ecologists and applied statisticians.
All material (source of the Rmarkdown notebook and auxiliary files) is available from https://github.com/oliviergimenez/computo-deeplearning-occupany-lynx.
Computer vision is a field of artificial intelligence in which a machine is taught how to extract and interpret the content of an image (Krizhevsky, Sutskever, and Hinton 2012). Computer vision relies on deep learning that allows computational models to learn from training data – a set of manually labelled images – and make predictions on new data – a set of unlabelled images (Baraniuk, Donoho, and Gavish 2020; LeCun, Bengio, and Hinton 2015). With the growing availability of massive data, computer vision with deep learning is being increasingly used to perform tasks such as object detection, face recognition, action and activity recognition or human pose estimation in fields as diverse as medicine, robotics, transportation, genomics, sports and agriculture (Voulodimos et al. 2018).
In ecology in particular, there is a growing interest in deep learning for automatizing repetitive analyses on large amounts of images, such as identifying plant and animal species, distinguishing individuals of the same or different species, counting individuals or detecting relevant features (Christin, Hervet, and Lecomte 2019; Lamba et al. 2019; Weinstein 2018). By saving hours of manual data analyses and tapping into massive amounts of data that keep accumulating with technological advances, deep learning has the potential to become an essential tool for ecologists and applied statisticians.
Despite the promising future of computer vision and deep learning,
there are challenging issues toward their wide adoption by the community
of ecologists (e.g. Wearn, Freeman,
and Jacoby 2019). First, there is a programming barrier as
most algorithms are written in the Python
language (but see
MXNet
in R and the R
interface to Keras) while most ecologists are versed in
R
(Lai et al.
2019). If ecologists are to use computer vision in routine,
there is a need for bridges between these two languages (through, e.g.,
the reticulate
package Allaire et
al. (2017)
or the shiny
package Tabak et al.
(2020)). Second, ecologists may be
reluctant to develop deep learning algorithms that require large amounts
of computation time and consequently come with an environmental cost due
to carbon emissions (Strubell, Ganesh,
and McCallum 2019). Third, recent applications of computer
vision via deep learning in ecology have focused on computational
aspects and simple tasks without addressing the underlying ecological
questions (Sutherland et al. 2013), or carrying out
statistical data analysis to answer these questions (Gimenez et al. 2014). Although perfectly
understandable given the challenges at hand, we argue that a better
integration of the why (ecological questions), the
what (automatically labelled images) and the how
(statistics) would be beneficial to computer vision for ecology (see also Weinstein 2018).
Here, we showcase a full why-what-how workflow in R
using a case study on the structure of an ecological community (a set of
co-occurring species) composed of the Eurasian lynx (Lynx lynx)
and its two main preys. First, we introduce the case study and motivate
the need for deep learning. Second we illustrate deep learning for the
identification of animal species in large amounts of images, including
model training and validation with a dataset of labelled images, and
prediction with a new dataset of unlabelled images. Last, we proceed
with the quantification of spatial co-occurrence using statistical
models.
Lynx (Lynx lynx) went extinct in France at the end of the 19th century due to habitat degradation, human persecution and decrease in prey availability (Vandel and Stahl 2005). The species was reintroduced in Switzerland in the 1970s (Breitenmoser 1998), then re-colonised France through the Jura mountains in the 1980s (Vandel and Stahl 2005). The species is listed as endangered under the 2017 IUCN Red list and is of conservation concern in France due to habitat fragmentation, poaching and collisions with vehicles. The Jura holds the bulk of the French lynx population.
To better understand its distribution, we need to quantify its interactions with its main preys, roe deer (Capreolus capreolus) and chamois (Rupicapra rupicapra) (Molinari-Jobin et al. 2007), two ungulate species that are also hunted. To assess the relative contribution of predation and hunting to the community structure and dynamics, a predator-prey program was set up jointly by the French Office for Biodiversity, the Federations of Hunters from the Jura, Ain and Haute-Savoie counties and the French National Centre for Scientific Research. Animal detections were made using a set of camera traps in the Jura mountains that were deployed in the Jura and Ain counties (see Figure 1). Altitude in the Jura site ranges from 520m to 1150m, and from 400m to 950m for the Ain site. Woodland areas cover 69% of the Ain site, with deciduous forests (63%) followed by coniferous (19.5%) and mixed forest (12.5%). In the Jura site, woodland areas cover 62% of the area, with mixed forests (46.6%), deciduous forests (37.3%) and coniferous (14%). In both sites, the remaining habitat is meadows used by cattle.
We divided the two study areas into grids of 2.7 \(\times\) 2.7 km cells or sites hereafter (Zimmermann et al. 2013) in which we set two camera traps per site (Xenon white flash with passive infrared trigger mechanisms, model Capture, Ambush and Attack; Cuddeback), with 18 sites in the Jura study area, and 11 in the Ain study area that were active over the study period (from February 2016 to October 2017 for the Jura county, and from February 2017 to May 2019 for the Ain county). The location of camera traps was chosen to maximise lynx detection. More precisely, camera traps were set up along large paths in the forest, on each side of the path at 50cm high. Camera traps were checked weekly to change memory cards, batteries and to remove fresh snow after heavy snowfall.
In total, 45563 and 18044 pictures were considered in the Jura and
Ain sites respectively after manually droping empty pictures and
pictures with unidentified species. Note that classifying empty images
could be automatised with deep learning (Norouzzadeh et
al. 2021; Tabak et al. 2020). We identified the
species present on all images by hand (see Table 1) using
digiKam
a free open-source digital photo management
application (https://www.digikam.org/). This operation took several
weeks of labor full time, which is often identified as a limitation of
camera trap studies. To expedite this tedious task, computer vision with
deep learning has been identified as a promising approach (Norouzzadeh et al. 2021; Tabak et al.
2019; Willi et al. 2019).
Species in Jura study site | n | Species in Ain study site | n |
---|---|---|---|
human | 31644 | human | 4946 |
vehicule | 5637 | vehicule | 4454 |
dog | 2779 | dog | 2310 |
fox | 2088 | fox | 1587 |
chamois | 919 | rider | 1025 |
wild boar | 522 | roe deer | 860 |
badger | 401 | chamois | 780 |
roe deer | 368 | hunter | 593 |
cat | 343 | wild boar | 514 |
lynx | 302 | badger | 461 |
Using the images we obtained with camera traps (Table 1), we trained a model for identifying species using the Jura study site as a calibration dataset. We then assessed this model’s ability to automatically identify species on a new dataset, also known as transferability, using the Ain study site as an evaluation dataset. Even though in the present work we quantified co-occurrence between lynx and its prey, we included other species in the training to investigate the structure and dynamics of the entire community in future work. Also, the use of specific species categories instead of just a “other” category besides the focal species should help the algorithm to determine with better confidence when a picture does not contain a focal species in situations where there is no doubt that this is another species (think of a vehicle for example), or where a species is detected with which a focal species can be confused, e.g. lynx with fox.
We selected at random 80% of the annotated images for each species in
the Jura study site for training, and 20% for testing. We applied
various transformations (flipping, brightness and contrast modifications
following Shorten and Khoshgoftaar (2019))
to improve training (see Appendix). To reduce model training time and
overcome the small number of images, we used transfer learning (Yosinski et al. 2014; Shao, Zhu, and Li
2015) and considered a pre-trained model as a starting point.
Specifically, we trained a deep convolutional neural network (ResNet-50)
architecture (He et al. 2016) using the
fastai
library (https://docs.fast.ai/) that implements the
PyTorch
library (Paszke et al.
2019). Interestingly, the fastai
library comes
with an R
interface (https://eagerai.github.io/fastai/) that uses the
reticulate
package to communicate with Python
,
therefore allowing R
users to access up-to-date deep
learning tools. We trained models on the Montpellier Bioinformatics
Biodiversity platform using a GPU machine (Titan Xp nvidia) with 16Go of
RAM. We used 20 epochs which took approximately 10 hours. The
computational burden prevented us from providing a full reproducible
analysis, but we do so with a subsample of the dataset in the Appendix.
All trained models are available from https://doi.org/10.5281/zenodo.5164796.
Using the testing dataset, we calculated three metrics to evaluate our model performance at correctly identifying species (e.g. Duggan et al. 2021). Specifically, we relied on accuracy the ratio of correct predictions to the total number of predictions, recall a measure of false negatives (FN; e.g. an image with a lynx for which our model predicts another species) with recall = TP / (TP + FN) where TP is for true positives, and precision a measure of false positives (FP; e.g. an image with any species but a lynx for which our model predicts a lynx) with precision = TP / (TP + FP). In camera trap studies, a strategy (Duggan et al. 2021) consists in optimizing precision if the focus is on rare species (lynx), while recall should be optimized if the focus is on commom species (chamois and roe deer).
We achieved 85% accuracy during training. Our model had good performances for the three classes we were interested in, with 87% precision for lynx and 81% recall for both roe deer and chamois (Table 2).
species | precision | recall |
---|---|---|
badger | 0.78 | 0.88 |
red deer | 0.67 | 0.21 |
chamois | 0.86 | 0.81 |
cat | 0.89 | 0.78 |
roe deer | 0.67 | 0.81 |
dog | 0.78 | 0.84 |
human | 0.99 | 0.79 |
hare | 0.32 | 0.52 |
lynx | 0.87 | 0.95 |
fox | 0.85 | 0.90 |
wild boar | 0.93 | 0.88 |
vehicule | 0.95 | 0.98 |
We evaluated transferability for our trained model by predicting species on images from the Ain study site which were not used for training. Precision was 77% for lynx, and while we achieved 86% recall for roe deer, our model performed poorly for chamois with 8% recall (Table 3).
precision | recall | |
---|---|---|
badger | 0.71 | 0.89 |
rider | 0.79 | 0.92 |
red deer | 0.00 | 0.00 |
chamois | 0.82 | 0.08 |
hunter | 0.17 | 0.11 |
cat | 0.46 | 0.59 |
roe deer | 0.67 | 0.86 |
dog | 0.77 | 0.35 |
human | 0.51 | 0.93 |
hare | 0.37 | 0.35 |
lynx | 0.77 | 0.89 |
marten | 0.05 | 0.04 |
fox | 0.90 | 0.53 |
wild boar | 0.75 | 0.94 |
cow | 0.01 | 0.25 |
vehicule | 0.94 | 0.51 |
To better understand this pattern, we display the results under the form of a confusion matrix that compares model classifications to manual classifications (Figure 2). There were a lot of false negatives for chamois, meaning that when a chamois was present in an image, it was often classified as another species by our model.
Overall, our model trained on images from the Jura study site did poorly at correctly predicting species on images from the Ain study site. This result does not come as a surprise, as generalizing classification algorithms to new environments is known to be difficult (Beery, Horn, and Perona 2018). While a computer scientist might be disappointed in these results, an ecologist would probably wonder whether ecological inference about the co-occurrence between lynx and its prey is biased by these average performances, a question we address in the next section.
Here, we analysed the data we acquired from the previous section. For the sake of comparison, we considered two datasets, one made of the images manually labelled for both the Jura and Ain study sites pooled together (ground truth dataset), and the other in which we pooled the images that were manually labelled for the Jura study site and the images that were automatically labelled for the Ain study site using our trained model (classified dataset).
We formatted the data by generating monthly detection histories, that is a sequence of detections (\(Y_{sit} = 1\)) and non-detections (\(Y_{sit} = 0\)), for species \(s\) at site \(i\) and sampling occasion \(t\) (see Figure 3).
To quantify spatial co-occurrence betwen lynx and its preys, we used
a multispecies occupancy modeling approach (Rota et al. 2016)
implemented in the R
package unmarked
(Fiske and Chandler 2011) within the
maximum likelihood framework. The multispecies occupancy model assumes
that observations \(y_{sit}\),
conditional on \(Z_{si}\) the latent
occupancy state of species \(s\) at
site \(i\) are drawn from Bernoulli
random variables \(Y_{sit} | Z_{si} \sim
\mbox{Bernoulli}(Z_{si}p_{sit})\) where \(p_{sit}\) is the detection probability of
species \(s\) at site \(i\) and sampling occasion \(t\). Detection probabilities can be modeled
as a function of site and/or sampling covariates, or the
presence/absence of other species, but for the sake of illustration, we
will make them only species-specific here.
The latent occupancy states are assumed to be distributed as multivariate Bernoulli random variables (Dai, Ding, and Wahba 2013). Let us consider 2 species, species 1 and 2, then \(Z_i = (Z_{i1}, Z_{i2}) \sim \mbox{multivariate Bernoulli}(\psi_{11}, \psi_{10}, \psi_{01}, \psi_{00})\) where \(\psi_{11}\) is the probability that a site is occupied by both species 1 and 2, \(\psi_{10}\) the probability that a site is occupied by species 1 but not 2, \(\psi_{01}\) the probability that a site is occupied by species 2 but not 1, and \(\psi_{00}\) the probability a site is occupied by none of them. Note that we considered species-specific only occupancy probabilities but these could be modeled as site-specific covariates. Marginal occupancy probabilities are obtained as \(\Pr(Z_{i1}=1) = \psi_{11} + \psi_{10}\) and \(\Pr(Z_{i2}=1) = \psi_{11} + \psi_{01}\). With this model, we may also infer co-occurrence by calculating conditional probabilities such as for example the probability of a site being occupied by species 2 conditional of species 1 with \(\Pr(Z_{i2} = 1| Z_{i1} = 1) = \displaystyle{\frac{\psi_{11}}{\psi_{11}+\psi_{10}}}\).
Despite its appeal and increasing use in ecology, multispecies occupancy models can be difficult to fit to real-world data in practice. First, these models are data-hungry and regularization methods (Clipp et al. 2021) are needed to avoid occupancy probabilities to be estimated at the boundary of the parameter space or with large uncertainty. Second, and this is true for any joint species distribution models, these models quickly become very complex with many parameters to be estimated when the number of species increases and co-occurrence is allowed between all species. Here, ecological expertise should be used to consider only meaningful species interactions and apply parsimony when parameterizing models.
We now turn to the results obtained from a model with five species namely lynx, chamois, roe deer, fox and cat and co-occurrence allowed between lynx and chamois and roe deer only.
Detection probabilities were indistinguishable (at the third decimal) whether we used the ground truth or the classified dataset, with \(p_{\mbox{lynx}} = 0.51 (0.45, 0.58)\), \(p_{\mbox{roe deer}} = 0.63 (0.57, 0.68)\) and \(p_{\mbox{chamois}} = 0.61 (0.55, 0.67)\).
We also found that occupancy probability estimates were similar whether we used the ground truth or the classified dataset (Figure 4). Roe deer was the most prevalent species, but lynx and chamois were also occurring with high probability (Figure 4). Note that, despite chamois being often misclassified (Figure 2), its marginal occupancy tends to be higher when estimated with the classified dataset. Ecologically speaking, this might well be the case if the correctly classified detections are spread over all camera traps. The difference in marginal occupancy seems however non-significant judging by the overlap between the two confidence intervals.
Because marginal occupancy probabilities were high, probabilities of co-occurrence were also estimated high (Figure 5). Our results should be interpreted bearing in mind that co-occurrence is a necessary but not sufficient condition for actual interaction. When both preys were present, lynx was more present than when they were both absent (Figure 5). Lynx was more sensitive to the presence of roe deer than that of chamois (Figure 5).