1 Introduction

To determine the questions and methods folks have been interested in, we searched for capture-recapture papers in the Web of Science. We found more than 5000 relevant papers during the 2009-2019 period.

To make sense of this large corpus, we carried out bibliometric and textual analyses in the spirit of Nakagawa et al. 2018. Explanations, code and results are in the next section, Quantitative analyses: Bibliometric and textual analyses. We also inspected a sample of methodological and ecological papers; see section Qualitative analyses: Making sense of the corpus.

2 Quantitative analyses: Bibliometric and textual analyses

2.1 Methods and data collection

To carry out a bibliometric analysis of the capture-recapture literature over the 2009-2019 period, we used the R package bibliometrix. We also carried out a text analysis using topic modelling, for which we recommend the book Text Mining with R.

To collect the data, we used the following settings:

  • Data source: Clarivate Analytics Web of Science (http://apps.webofknowledge.com)
  • Data format: Plain text
  • Query: capture-recapture OR mark-recapture OR capture-mark-recapture in Topic (searches title, abstract, author keywords, and more)
  • Timespan: 2009-2019
  • Document Type: Articles
  • Query date: 5 August, 2019

We load the packages we need:

library(bibliometrix) # bib analyses
library(quanteda) # textual data analyses
library(tidyverse) # manipulation and viz data
library(tidytext) # handle text
library(topicmodels) # topic modelling

Let us read in and format the data:

# Loading txt or bib files into R environment
D <- c("data/savedrecs.txt",
       "data/savedrecs(1).txt",
       "data/savedrecs(2).txt",
       "data/savedrecs(3).txt",
       "data/savedrecs(4).txt",
       "data/savedrecs(5).txt",
       "data/savedrecs(6).txt",
       "data/savedrecs(7).txt",
       "data/savedrecs(8).txt",
       "data/savedrecs(9).txt",
       "data/savedrecs(10).txt")
# Converting the loaded files into an R bibliographic dataframe
# (takes a minute or two)
M <- convert2df(D, dbsource="wos", format="plaintext")
## 
## Converting your wos collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!

We ended up with 5022 articles. Note that WoS only allows 500 items to be exported at once, so we had to repeat the export 11 times, one file per batch.
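
As a side note, the list of export files can be built rather than typed by hand. The sketch below is only an illustration: it assumes the files keep the default names shown above (first file without a suffix, later ones numbered in parentheses), and the object names n_records, batch_size, n_batches and files are ours.

```r
# Hypothetical helper objects, not part of the original workflow
n_records  <- 5022   # articles returned by the query (from the text)
batch_size <- 500    # WoS export limit per batch (from the text)

# Number of exports needed to retrieve every record
n_batches <- ceiling(n_records / batch_size)

# The first export has no suffix; the following ones are numbered (1), (2), ...
files <- c("data/savedrecs.txt",
           sprintf("data/savedrecs(%d).txt", seq_len(n_batches - 1)))
files
```

The resulting vector can be passed to convert2df() in place of D.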

We export the formatted data to a csv file for further inspection:

M %>% 
  mutate(title = tolower(TI), 
         abstract = tolower(AB),
         authors = AU,
         journal = SO,
         keywords = tolower(DE)) %>%
  select(title, keywords, journal, authors, abstract) %>%
  write_csv("data/crdat.csv")

2.2 Descriptive statistics

WoS provides the user with a bunch of graphs; let’s have a look.

Research areas are: [figure: areas]

The number of publications per year is: [figure: years]

The countries of the first author are: [figure: countries]

The journals are: [figure: journals]

The most productive authors are: [figure: authors]

The graphs for the dataset of citing articles (who uses capture-recapture and what it is used for) show the same patterns as those for the dataset of published articles, except for the journals. A bunch of citations come from a few different journals, namely Biological Conservation, Scientific Reports, Molecular Ecology and Proceedings of the Royal Society B - Biological Sciences: [figure: citingjournals]

We also want to produce our own descriptive statistics. Let’s have a look at the data with R.

Number of papers per journal:

dat <- as_tibble(M)
dat %>%
  group_by(SO) %>%
  count() %>%
  filter(n > 50) %>%
  ggplot(aes(n, reorder(SO, n))) +
  geom_col() +
  labs(title = "Nb of papers per journal", x = "", y = "")

Most common words in titles:

wordft <- dat %>%
  mutate(line = row_number()) %>%
  filter(nchar(TI) > 0) %>%
  unnest_tokens(word, TI) %>%
  anti_join(stop_words) 

wordft %>%
  count(word, sort = TRUE)
wordft %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(title = "Most common words in titles", x = "", y = "")

Most common words in abstracts:

wordab <- dat %>%
  mutate(line = row_number()) %>%
  filter(nchar(AB) > 0) %>%
  unnest_tokens(word, AB) %>%
  anti_join(stop_words) 

wordab %>%
  count(word, sort = TRUE)
wordab %>%
  count(word, sort = TRUE) %>%
  filter(n > 1500) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(n, word)) +
  geom_col() +
  labs(title = "Most common words in abstracts", x = "", y = "")
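
To make the tokenisation step concrete, here is a minimal base-R sketch of what the pipelines above do conceptually: lower-case the text, split it into words, drop stop words, and count. The two titles and the short stop-word list are made up for illustration; tidytext's stop_words is much longer, and unnest_tokens() relies on its own tokeniser, so results on real data may differ in the details.

```r
# Toy titles (made up for illustration, not taken from the corpus)
titles <- c("Estimating survival of the common frog",
            "Survival and dispersal in a seabird population")

# A tiny hypothetical stop-word list; tidytext::stop_words is far longer
stops <- c("of", "the", "and", "in", "a")

# Lower-case, split on anything that is not a letter or digit, drop empty strings
tokens <- unlist(strsplit(tolower(titles), "[^a-z0-9]+"))
tokens <- tokens[nzchar(tokens)]

# Remove stop words (the role played by anti_join(stop_words) above)
tokens <- tokens[!tokens %in% stops]

# Word frequencies, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)
freq
```

Here "survival" comes first because it appears in both titles, while "the" and "of" are gone.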

2.3 Bibliometric results

Now we turn to a more detailed analysis of the published articles.

First calculate the main bibliometric measures:

results <- biblioAnalysis(M, sep = ";")
options(width=100)
S <- summary(object = results, k = 10, pause = FALSE)
## 
## 
## MAIN INFORMATION ABOUT DATA
## 
##  Timespan                              2009 : 2019 
##  Sources (Journals, Books, etc)        808 
##  Documents                             5022 
##  Annual Growth Rate %                  -3.7 
##  Document Average Age                  8.98 
##  Average citations per doc             11.54 
##  Average citations per year per doc    1.014 
##  References                            134769 
##  
## DOCUMENT TYPES                     
##  article                         4940 
##  article; book chapter           6 
##  article; early access           7 
##  article; proceedings paper      69 
##  
## DOCUMENT CONTENTS
##  Keywords Plus (ID)                    10861 
##  Author's Keywords (DE)                11088 
##  
## AUTHORS
##  Authors                               15128 
##  Author Appearances                    23004 
##  Authors of single-authored docs       174 
##  
## AUTHORS COLLABORATION
##  Single-authored docs                  201 
##  Documents per Author                  0.332 
##  Co-Authors per Doc                    4.58 
##  International co-authorships %        33.43 
##  
## 
## Annual Scientific Production
## 
##  Year    Articles
##     2009      369
##     2010      401
##     2011      456
##     2012      492
##     2013      486
##     2014      526
##     2015      497
##     2016      496
##     2017      527
##     2018      512
##     2019      253
## 
## Annual Percentage Growth Rate -3.7 
## 
## 
## Most Productive Authors
## 
##    Authors        Articles Authors        Articles Fractionalized
## 1     GIMENEZ O         82     GIMENEZ O                    17.95
## 2     PRADEL R          65     ROYLE JA                     15.92
## 3     ROYLE JA          59     PRADEL R                     12.82
## 4     CHOQUET R         44     BOHNING D                    10.78
## 5     BARBRAUD C        40     CHOQUET R                     9.91
## 6     BESNARD A         38     BARBRAUD C                    9.12
## 7     TAVECCHIA G       34     WHITE GC                      7.84
## 8     ORO D             32     SCHAUB M                      7.78
## 9     NICHOLS JD        31     KING R                        7.69
## 10    SCHAUB M          29     BESNARD A                     7.51
## 
## 
## Top manuscripts per citations
## 
##                             Paper                                       DOI  TC TCperYear   NTC
## 1  CHOQUET R, 2009, ECOGRAPHY              10.1111/j.1600-0587.2009.05968.x 414      27.6 15.08
## 2  WHITEHEAD H, 2009, BEHAV ECOL SOCIOBIOL 10.1007/s00265-008-0697-y        350      23.3 12.75
## 3  LUIKART G, 2010, CONSERV GENET          10.1007/s10592-010-0050-7        289      20.6 13.89
## 4  GLANVILLE J, 2009, P NATL ACAD SCI USA  10.1073/pnas.0909775106          251      16.7  9.14
## 5  PATTERSON CC, 2012, DIABETOLOGIA        10.1007/s00125-012-2571-8        237      19.8 14.65
## 6  WALLACE BP, 2010, PLOS ONE              10.1371/journal.pone.0015465     207      14.8  9.95
## 7  GOMEZ P, 2011, SCIENCE                  10.1126/science.1198767          195      15.0 10.27
## 8  MERTES PM, 2011, J ALLERGY CLIN IMMUN   10.1016/j.jaci.2011.03.003       165      12.7  8.69
## 9  ROYLE JA, 2009, ECOLOGY                 10.1890/08-1481.1                158      10.5  5.76
## 10 SOMERS EC, 2014, ARTHRITIS RHEUMATOL    10.1002/art.38238                156      15.6 13.01
## 
## 
## Corresponding Author's Countries
## 
##           Country Articles   Freq  SCP MCP MCP_Ratio
## 1  USA                1784 0.3567 1454 330     0.185
## 2  AUSTRALIA           326 0.0652  202 124     0.380
## 3  FRANCE              318 0.0636  198 120     0.377
## 4  UNITED KINGDOM      318 0.0636  184 134     0.421
## 5  CANADA              304 0.0608  199 105     0.345
## 6  SPAIN               157 0.0314   95  62     0.395
## 7  ITALY               148 0.0296   89  59     0.399
## 8  GERMANY             146 0.0292   66  80     0.548
## 9  NEW ZEALAND         133 0.0266   74  59     0.444
## 10 BRAZIL              129 0.0258   95  34     0.264
## 
## 
## SCP: Single Country Publications
## 
## MCP: Multiple Country Publications
## 
## 
## Total Citations per Country
## 
##       Country      Total Citations Average Article Citations
## 1  USA                       21915                     12.28
## 2  FRANCE                     4422                     13.91
## 3  UNITED KINGDOM             4374                     13.75
## 4  AUSTRALIA                  3740                     11.47
## 5  CANADA                     3466                     11.40
## 6  GERMANY                    2003                     13.72
## 7  NEW ZEALAND                1931                     14.52
## 8  ITALY                      1585                     10.71
## 9  SWITZERLAND                1464                     23.24
## 10 SPAIN                      1429                      9.10
## 
## 
## Most Relevant Sources
## 
##                                    Sources        Articles
## 1  PLOS ONE                                            219
## 2  JOURNAL OF WILDLIFE MANAGEMENT                      170
## 3  ECOLOGY AND EVOLUTION                               116
## 4  ECOLOGY                                             101
## 5  BIOLOGICAL CONSERVATION                              99
## 6  JOURNAL OF ANIMAL ECOLOGY                            80
## 7  METHODS IN ECOLOGY AND EVOLUTION                     77
## 8  JOURNAL OF MAMMALOGY                                 73
## 9  JOURNAL OF APPLIED ECOLOGY                           72
## 10 NORTH AMERICAN JOURNAL OF FISHERIES MANAGEMENT       65
## 
## 
## Most Relevant Keywords
## 
##    Author Keywords (DE)      Articles Keywords-Plus (ID)     Articles
## 1     MARK-RECAPTURE              687      SURVIVAL               647
## 2     CAPTURE-RECAPTURE           460      CONSERVATION           525
## 3     SURVIVAL                    326      CAPTURE-RECAPTURE      497
## 4     CAPTURE-MARK-RECAPTURE      246      ABUNDANCE              494
## 5     ABUNDANCE                   173      POPULATION             491
## 6     POPULATION DYNAMICS         145      MARKED ANIMALS         404
## 7     DEMOGRAPHY                  140      SIZE                   371
## 8     DISPERSAL                   138      POPULATIONS            339
## 9     CONSERVATION                131      MARK-RECAPTURE         328
## 10    POPULATION SIZE             125      DYNAMICS               302
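
The negative annual growth rate may look surprising given that yearly counts mostly increase. The sketch below, which only reuses numbers from the output above, shows that -3.7 is what a compound annual growth rate between 2009 and 2019 gives; we assume, without having checked the package source, that this is how bibliometrix derives it. Since the query was run on 5 August 2019, the 2019 count is incomplete, and dropping that year turns the rate positive. The function name cagr is ours.

```r
# Yearly article counts copied from the summary output above
articles <- setNames(c(369, 401, 456, 492, 486, 526, 497, 496, 527, 512, 253),
                     2009:2019)

# Compound annual growth rate (in %) between the first and last element of x
cagr <- function(x) {
  n_years <- length(x) - 1
  ((x[[length(x)]] / x[[1]])^(1 / n_years) - 1) * 100
}

round(cagr(articles), 1)                              # -3.7, as reported
round(cagr(articles[names(articles) != "2019"]), 1)   # 3.7 with complete years only

# Other ratios in the summary follow from the raw counts
round(5022 / 15128, 3)   # documents per author: 0.332
round(23004 / 5022, 2)   # co-authors per doc (author appearances / documents): 4.58
round(330 / 1784, 3)     # MCP ratio for the USA (MCP / articles): 0.185
```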

Visualize:

plot(x = results, k = 10, pause = FALSE)

The 100 most frequently cited manuscripts:

CR <- citations(M, field = "article", sep = ";")
cbind(CR$Cited[1:100])
##                                                                                               [,1]
## WHITE GC, 1999, BIRD STUDY, V46, P120                                                         1310
## BURNHAM K, 2002, MODEL SELECTION MULT                                                         1131
## LEBRETON JD, 1992, ECOL MONOGR, V62, P67, DOI 10.2307/2937171                                  835
## WILLIAMS B. K., 2002, ANAL MANAGEMENT ANIM                                                     546
## OTIS DL, 1978, WILDLIFE MONOGR, P1                                                             536
## JOLLY GM, 1965, BIOMETRIKA, V52, P225, DOI 10.1093/BIOMET/52.1-2.225                           368
## SEBER GAF, 1965, BIOMETRIKA, V52, P249                                                         320
## CHOQUET R, 2009, ECOGRAPHY, V32, P1071, DOI 10.1111/J.1600-0587.2009.05968.X                   313
## SEBER GA, 1982, ESTIMATION ANIMAL AB                                                           306
## KENDALL WL, 1997, ECOLOGY, V78, P563                                                           277
## BORCHERS DL, 2008, BIOMETRICS, V64, P377, DOI 10.1111/J.1541-0420.2007.00927.X                 265
## CORMACK RM, 1964, BIOMETRIKA, V51, P429, DOI 10.1093/BIOMET/51.3-4.429                         243
## POLLOCK KH, 1982, J WILDLIFE MANAGE, V46, P752, DOI 10.2307/3808568                            233
## EFFORD M, 2004, OIKOS, V106, P598, DOI 10.1111/J.0030-1299.2004.13043.X                        228
## PRADEL R, 1996, BIOMETRICS, V52, P703, DOI 10.2307/2532908                                     217
## KARANTH KU, 1998, ECOLOGY, V79, P2852                                                          214
## CASWELL H., 2001, MATRIX POPULATION MO                                                         203
## HUGGINS RM, 1989, BIOMETRIKA, V76, P133, DOI 10.1093/BIOMET/76.1.133                           203
## POLLOCK KH, 1990, WILDLIFE MONOGR, P1                                                          203
## SCHWARZ CJ, 1996, BIOMETRICS, V52, P860, DOI 10.2307/2533048                                   201
## PRADEL R, 2005, BIOMETRICS, V61, P442, DOI 10.1111/J.1541-0420.2005.00318.X                    196
## BROWNIE C, 1993, BIOMETRICS, V49, P1173, DOI 10.2307/2532259                                   195
## HOOK EB, 1995, EPIDEMIOL REV, V17, P243, DOI 10.1093/OXFORDJOURNALS.EPIREV.A036192             193
## CHOQUET R, 2009, ENVIRON ECOL STAT SE, V3, P845, DOI 10.1007/978-0-387-78151-8_39              185
## PRADEL R, 1997, BIOMETRICS, V53, P60, DOI 10.2307/2533097                                      185
## PLEDGER S, 2000, BIOMETRICS, V56, P434, DOI 10.1111/J.0006-341X.2000.00434.X                   171
## ROYLE JA, 2014, SPATIAL CAPTURE-RECAPTURE, P1                                                  166
## STEARNS SC, 1992, EVOLUTION LIFE HIST                                                          163
## BUCKLAND S. T, 2001, INTRO DISTANCE SAMPL                                                      162
## KERY M, 2012, BAYESIAN POPULATION ANALYSIS USING WINBUGS: A HIERARCHICAL PERSPECTIVE, P1       154
## BURNHAM K. P., 1998, MODEL SELECTION INFE                                                      147
## ROYLE JA, 2008, ECOLOGY, V89, P2281, DOI 10.1890/07-0601.1                                     146
## HUGGINS RM, 1991, BIOMETRICS, V47, P725, DOI 10.2307/2532158                                   143
## ROYLE J. A., 2008, HIERARCHICAL MODELIN                                                        139
## ROYLE JA, 2009, ECOLOGY, V90, P3233, DOI 10.1890/08-1481.1                                     125
## LEBRETON JD, 2002, J APPL STAT, V29, P353, DOI 10.1080/02664760120108638                       124
## MACKENZIE DI, 2002, ECOLOGY, V83, P2248, DOI 10.2307/3072056                                   123
## LEBRETON JD, 2009, ADV ECOL RES, V41, P87, DOI 10.1016/S0065-2504(09)00403-6                   120
## KARANTH KU, 1995, BIOL CONSERV, V71, P333, DOI 10.1016/0006-3207(94)00057-W                    118
## SAETHER BE, 2000, ECOLOGY, V81, P642, DOI 10.2307/177366                                       118
## MACKENZIE DI, 2006, OCCUPANCY ESTIMATION                                                       114
## KENDALL WL, 1995, BIOMETRICS, V51, P293, DOI 10.2307/2533335                                   113
## BURNHAM K.P., 2002, MODEL SELECTION INFE                                                       110
## HESTBECK JB, 1991, ECOLOGY, V72, P523, DOI 10.2307/2937193                                     106
## AMSTRUP SC, 2005, HANDBOOK OF CAPTURE-RECAPTURE ANALYSIS, P1                                   104
## BROOKS SP, 1998, J COMPUT GRAPH STAT, V7, P434, DOI 10.2307/1390675                            103
## SCHWARZ CJ, 1993, BIOMETRICS, V49, P177, DOI 10.2307/2532612                                   103
## WOODS JG, 1999, WILDLIFE SOC B, V27, P616                                                      103
## KENDALL WL, 1999, ECOLOGY, V80, P2517, DOI 10.1890/0012-9658(1999)0802517:ROCCRM2.0.CO         102
## GROSBOIS V, 2008, BIOL REV, V83, P357, DOI 10.1111/J.1469-185X.2008.00047.X                    101
## R CORE TEAM, 2015, R LANG ENV STAT COMP                                                        101
## WAITS LP, 2001, MOL ECOL, V10, P249, DOI 10.1046/J.1365-294X.2001.01185.X                      100
## CHAO A, 1987, BIOMETRICS, V43, P783, DOI 10.2307/2531532                                        99
## CHAO A, 2001, STAT MED, V20, P3123, DOI 10.1002/SIM.996.ABS                                     99
## LINK WA, 2003, BIOMETRICS, V59, P1123, DOI 10.1111/J.0006-341X.2003.00129.X                     99
## AKAIKE H., 1973, 2 INT S INF THEOR, P267, DOI DOI 10.1007/978-1-4612-1694-0_                    97
## BURNHAM KENNETH P., 1993, P199                                                                  97
## EFFORD MG, 2009, ENVIRON ECOL STAT SE, V3, P255, DOI 10.1007/978-0-387-78151-8_11               95
## R DEVELOPMENT CORE TEAM, 2011, R LANG ENV STAT COMP                                             94
## PRADEL R., 2005, ANIMAL BIODIVERSITY AND CONSERVATION, V28, P189                                93
## R CORE TEAM, 2013, R LANG ENV STAT COMP                                                         93
## GREENWOOD PJ, 1980, ANIM BEHAV, V28, P1140, DOI 10.1016/S0003-3472(80)80103-5                   92
## YIP PSF, 1995, AM J EPIDEMIOL, V142, P1047                                                      92
## PLEDGER S, 2003, BIOMETRICS, V59, P786, DOI 10.1111/J.0006-341X.2003.00092.X                    90
## KENDALL WL, 2002, ECOLOGY, V83, P3276                                                           89
## PAETKAU D, 2003, MOL ECOL, V12, P1375, DOI 10.1046/J.1365-294X.2003.01820.X                     89
## SOISALO MK, 2006, BIOL CONSERV, V129, P487, DOI 10.1016/J.BIOCON.2005.11.023                    89
## WILSON B, 1999, ECOL APPL, V9, P288, DOI 10.2307/2641186                                        89
## WAITS LP, 2005, J WILDLIFE MANAGE, V69, P1419, DOI 10.2193/0022-541X(2005)691419:NGSTFW2.0.CO   88
## ARNOLD TW, 2010, J WILDLIFE MANAGE, V74, P1175, DOI 10.2193/2009-367                            86
## PRITCHARD JK, 2000, GENETICS, V155, P945                                                        86
## R CORE TEAM, 2016, R LANG ENV STAT COMP                                                         86
## SILVER SC, 2004, ORYX, V38, P148, DOI 10.1017/S0030605304000286                                 86
## SOLLMANN R, 2011, BIOL CONSERV, V144, P1017, DOI 10.1016/J.BIOCON.2010.12.011                   86
## GAILLARD JM, 2000, ANNU REV ECOL SYST, V31, P367, DOI 10.1146/ANNUREV.ECOLSYS.31.1.367          85
## GAILLARD JM, 2003, ECOLOGY, V84, P3294, DOI 10.1890/02-0409                                     85
## R CORE TEAM, 2014, R LANG ENV STAT COMP                                                         84
## KARANTH KU, 2006, ECOLOGY, V87, P2925, DOI 10.1890/0012-9658(2006)872925:ATPDUP2.0.CO           83
## SPIEGELHALTER DJ, 2002, J ROY STAT SOC B, V64, P583, DOI 10.1111/1467-9868.00353                83
## ANDERSON DR, 1994, ECOLOGY, V75, P1780, DOI 10.2307/1939637                                     82
## GELMAN A, 2004, BAYESIAN DATA ANAL                                                              82
## STANLEY TR, 1999, ENVIRON ECOL STAT, V6, P197, DOI 10.1023/A:1009674322348                      82
## GELMAN A, 1992, STAT SCI, V7, P457, DOI DOI 10.1214/SS/1177011136                               81
## LUNN DJ, 2000, STAT COMPUT, V10, P325, DOI 10.1023/A:1008929526011                              81
## SCHWARZ CJ, 1999, STAT SCI, V14, P427                                                           81
## WILSON KR, 1985, J MAMMAL, V66, P13, DOI 10.2307/1380951                                        81
## KREBS CJ, 1999, ECOLOGICAL METHODOLO                                                            80
## PULLIAM HR, 1988, AM NAT, V132, P652, DOI 10.1086/284880                                        80
## FOSTER RJ, 2012, J WILDLIFE MANAGE, V76, P224, DOI 10.1002/JWMG.275                             78
## BESBEAS P, 2002, BIOMETRICS, V58, P540, DOI 10.1111/J.0006-341X.2002.00540.X                    77
## GAILLARD JM, 1998, TRENDS ECOL EVOL, V13, P58, DOI 10.1016/S0169-5347(97)01237-8                77
## MORRIS W. F., 2002, QUANTITATIVE CONSERV                                                        77
## SCHAUB M, 2004, ECOLOGY, V85, P2107, DOI 10.1890/03-3110                                        77
## LUKACS PM, 2005, MOL ECOL, V14, P3909, DOI 10.1111/J.1365-294X.2005.02717.X                     76
## MILLER CR, 2005, MOL ECOL, V14, P1991, DOI 10.1111/J.1365-294X.2005.02577.X                     76
## R DEVELOPMENT CORE TEAM, 2012, R LANG ENV STAT COMP                                             76
## REXSTAD E., 1991, USERS GUIDE INTERACT                                                          76
## WHITE G. C., 1982, CAPTURE RECAPTURE RE                                                         76
## EFFORD M. G., 2004, ANIMAL BIODIVERSITY AND CONSERVATION, V27, P217                             75
## HURVICH CM, 1989, BIOMETRIKA, V76, P297, DOI 10.2307/2336663                                    75

The most frequently cited first authors:

CR <- citations(M, field = "author", sep = ";")
cbind(CR$Cited[1:25])
##                         [,1]
## WHITE GC                1671
## LEBRETON JD             1254
## ROYLE JA                1249
## BURNHAM K               1144
## PRADEL R                1017
## KENDALL WL               919
## POLLOCK KH               891
## CHOQUET R                858
## NICHOLS JD               671
## R DEVELOPMENT CORE TEAM  648
## KARANTH KU               620
## WILLIAMS B K             602
## OTIS DL                  553
## CHAO A                   540
## SCHWARZ CJ               512
## R CORE TEAM              511
## SCHAUB M                 505
## SEBER GAF                488
## MACKENZIE DI             475
## BURNHAM K P              466
## BURNHAM KP               461
## KERY M                   449
## EFFORD MG                435
## GELMAN A                 399
## PLEDGER S                399
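
Note that some authors appear under several spellings in this list (for instance BURNHAM K P and BURNHAM KP), so their counts are split. The sketch below shows one possible way to merge variants that differ only by spaces between initials, using a few counts copied from the output above. The function normalise_initials is hypothetical and deliberately conservative: it leaves BURNHAM K untouched, because deciding whether that is the same person requires a manual check.

```r
# A few counts copied from the output above
cited <- c("WHITE GC" = 1671, "BURNHAM K" = 1144, "BURNHAM K P" = 466,
           "BURNHAM KP" = 461, "R CORE TEAM" = 511)

# Hypothetical helper: glue together single-letter initials separated by a space,
# e.g. "BURNHAM K P" becomes "BURNHAM KP"; multi-letter words are left alone
normalise_initials <- function(x) {
  gsub("\\b([A-Z]) (?=[A-Z]\\b)", "\\1", x, perl = TRUE)
}

# Sum the counts of names that become identical after normalisation
merged <- tapply(cited, normalise_initials(names(cited)), sum)
sort(merged, decreasing = TRUE)
```

In this toy subset, BURNHAM KP ends up with 466 + 461 = 927 citations, while R CORE TEAM is unchanged.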

Top authors' productivity over time:

topAU <- authorProdOverTime(M, k = 10, graph = TRUE)

2.4 Network results

Below is an author collaboration network, where nodes represent the top 30 authors by number of papers in our dataset and links are co-authorships. The Louvain algorithm is used for clustering in the author and country networks:

M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")
NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "authors", sep = ";")
net <- networkPlot(NetMatrix, n = 30, Title = "Collaboration network", type = "fruchterman", 
                   size = TRUE, remove.multiple = FALSE, labelsize = 0.7, cluster = "louvain")

Country collaborations:

NetMatrix <- biblioNetwork(M, analysis = "collaboration", network = "countries", sep = ";")
net <- networkPlot(NetMatrix, n = 20, Title = "Country collaborations", type = "fruchterman", 
                   size = TRUE, remove.multiple = FALSE, labelsize = 0.7, cluster = "louvain")

A keyword co-occurrences network:

NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
netstat <- networkStat(NetMatrix)
summary(netstat, k = 10)
## 
## 
## Main statistics about the network
## 
##  Size                                  10867 
##  Density                               0.002 
##  Transitivity                          0.08 
##  Diameter                              6 
##  Degree Centralization                 0.192 
##  Average path length                   2.772 
## 
net <- networkPlot(NetMatrix, normalize = "association", weighted = TRUE, n = 50, 
                   Title = "Keyword co-occurrences", type = "fruchterman", size = TRUE,
                   edgesize = 5, labelsize = 0.7)
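
To give an idea of what a co-occurrence network and its density are, here is a toy base-R sketch. It assumes that the co-occurrence matrix can be seen as the cross-product of a document-by-keyword incidence matrix, which is our understanding of the principle rather than a description of the bibliometrix internals. The three documents and keywords are made up.

```r
# Toy document-by-keyword incidence matrix (1 = keyword used in the document)
A <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("doc", 1:3),
                            c("survival", "abundance", "dispersal")))

# Keyword co-occurrence matrix: entry (i, j) counts documents using both keywords;
# the diagonal holds the number of documents using each keyword
C <- crossprod(A)
C

# Density: share of keyword pairs that co-occur at least once
adj <- C > 0
diag(adj) <- FALSE
n <- ncol(C)
density <- sum(adj) / (n * (n - 1))
density
```

Two of the three possible keyword pairs co-occur here, hence a density of 2/3. The value of 0.002 reported above means that only a tiny fraction of the possible keyword pairs ever appear together.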

2.5 Textual analysis: Topic modelling on abstracts

To learn more about textual analysis, and topic modelling in particular, we recommend reading Text Mining with R.

Clean and format the data:

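wordfabs <- dat %>%
  mutate(line = row_number()) %>%          # document identifier
  filter(nchar(AB) > 0) %>%                # keep records with an abstract
  unnest_tokens(word, AB) %>%              # one row per word
  anti_join(stop_words, by = "word") %>%   # remove stop words
  filter(str_detect(word, "[^\\d]")) %>%   # drop tokens made of digits only
  group_by(word) %>%
  mutate(word_total = n()) %>%
  ungroup()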

desc_dtm <- wordfabs %>%
  count(line, word, sort = TRUE) %>%
  ungroup() %>%
  cast_dtm(line, word, n)
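
For readers unfamiliar with document-term matrices, the following base-R sketch shows the idea on toy data: tokens made only of digits are dropped, then words are counted per document. This is an analogue for illustration only; cast_dtm() returns a sparse DocumentTermMatrix rather than a table, and the toy tokens are made up.

```r
# Toy tokens: one row per word occurrence, with the document it comes from
tokens <- data.frame(
  line = c(1, 1, 1, 2, 2, 3),
  word = c("survival", "survival", "2019", "abundance", "survival", "95"),
  stringsAsFactors = FALSE
)

# Keep tokens containing at least one non-digit character
# (same intent as str_detect(word, "[^\\d]") above, for plain ASCII digits)
tokens <- tokens[grepl("[^0-9]", tokens$word), ]

# Document-term matrix: rows are documents, columns are words, cells are counts
dtm <- table(line = tokens$line, word = tokens$word)
dtm
```

Document 3 contained only a number, so it has no row in the resulting matrix.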

Perform the analysis (this takes several minutes):

desc_lda <- LDA(desc_dtm, k = 20, control = list(seed = 42))
tidy_lda <- tidy(desc_lda)

Visualise results:

top_terms <- tidy_lda %>%
  filter(topic < 13) %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)

top_terms %>%
  mutate(term = reorder(term, beta)) %>%
  group_by(topic, term) %>%    
  arrange(desc(beta)) %>%  
  ungroup() %>%
  mutate(term = factor(paste(term, topic, sep = "__"), 
                       levels = rev(paste(term, topic, sep = "__")))) %>%
  ggplot(aes(term, beta, fill = as.factor(topic))) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  scale_x_discrete(labels = function(x) gsub("__.+$", "", x)) +
  labs(title = "Top 10 terms in each LDA topic",
       x = NULL, y = expression(beta)) +
  facet_wrap(~ topic, ncol = 4, scales = "free")

ggsave('figs/topic_abstracts.png', width = 12, dpi = 600)

This is quite informative! Topics can fairly easily be interpreted: 1 is about estimating fish survival, 2 is about photo-id, 3 is about modeling and estimation in general, 4 is about disease ecology, 5 is about estimating abundance of marine mammals, 6 is about capture-recapture in (human) health sciences, 7 is about the conservation of large carnivores (tigers, leopards), 8 is about growth and recruitment, 9 is about prevalence estimation in humans, 10 is about the estimation of individual growth in fish, 11 is (not a surprise) about birds (migration and reproduction), and 12 is about habitat perturbations.

3 Qualitative analyses: Making sense of the corpus

3.1 Motivation

Our objective was to make a list of ecological questions and methods that were addressed in these papers. The bibliometric and text analyses above were useful, but we needed to dig a bit deeper to achieve the objective. Here is how we proceeded.

3.2 Methodological papers

First, we isolated the methodological journals. To do so, we focused the search on journals that had published more than 10 papers about capture-recapture over the last 10 years:

raw_dat <- read_csv(file = 'data/crdat.csv')
raw_dat %>% 
  group_by(journal) %>%
  filter(n() > 10) %>%
  ungroup() %>%
  count(journal)

By inspecting this list, we ended up with these methodological journals:

methods <- raw_dat %>% 
  filter(journal %in% c('BIOMETRICS',
                        'ECOLOGICAL MODELLING',
                        'JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS',
                        'METHODS IN ECOLOGY AND EVOLUTION',
                        'ANNALS OF APPLIED STATISTICS',
                        'ENVIRONMENTAL AND ECOLOGICAL STATISTICS'))

methods %>%
  count(journal, sort = TRUE)

We then exported the 219 papers published in these methodological journals to a csv file:

# Reuse the methods data frame built above rather than repeating the filter
methods %>%
  write_csv('data/papers_in_methodological_journals.csv')

The next step was to annotate this file to determine the methods used. R could not help, and we had to do it by hand. We read the >200 titles and abstracts and added our tags in an extra column. The task was cumbersome but very interesting. We enjoyed seeing what colleagues have been working on. The results are in this file.

By focusing the annotation on the methodological journals, we ignored all the methodological papers published in non-methodological journals that welcome methods, such as Ecology, Journal of Applied Ecology, Conservation Biology and PLOS ONE, among others. We address this issue below. In brief, we scanned the corpus of ecological papers and tagged all methodological papers (126 in total); we added them to the file of methodological papers, along with a column to keep track of each paper's origin (methodological vs ecological corpus).

3.3 Ecological papers

Second, we isolated the ecological journals. To do so, we focused the search on journals that had published more than 50 papers about capture-recapture over the last 10 years, and we excluded the methodological journals:

ecol <- raw_dat %>% 
  filter(!journal %in% c('BIOMETRICS',
                        'ECOLOGICAL MODELLING',
                        'JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS',
                        'METHODS IN ECOLOGY AND EVOLUTION',
                        'ANNALS OF APPLIED STATISTICS',
                        'ENVIRONMENTAL AND ECOLOGICAL STATISTICS')) %>%
  group_by(journal) %>%
  filter(n() > 50) %>%
  ungroup()

ecol %>% 
  count(journal, sort = TRUE)
ecol %>%
  nrow()
## [1] 1378
ecol %>%
  write_csv('data/papers_in_ecological_journals.csv')
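
A word on the exclusion filter used above: in R, %in% binds more tightly than !, so !journal %in% c(...) is read as !(journal %in% c(...)), which is what we want. The toy sketch below, with a made-up vector of four journal names, checks this and shows that the two filters split the data without overlap.

```r
# Toy journal vector (illustration only)
journals <- c("BIOMETRICS", "ECOLOGY", "PLOS ONE", "ECOLOGICAL MODELLING")
methods_journals <- c("BIOMETRICS", "ECOLOGICAL MODELLING")

# Both spellings give the same logical vector
same <- identical(!journals %in% methods_journals,
                  !(journals %in% methods_journals))
same

# Complementary subsets: every journal falls in exactly one of them
in_methods  <- journals[journals %in% methods_journals]
not_methods <- journals[!journals %in% methods_journals]
not_methods
```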

Again, we inspected the papers one by one. We mainly focused the reading on the titles and abstracts. We did not annotate the papers.

4 Note

This work initially started as a talk we gave at the Wildlife Research and Conservation 2019 conference in Berlin. The slides can be downloaded here. There is also a video recording of the talk there, and a Twitter thread of it. We also presented a poster at the Euring 2021 conference, see here.

5 R version used

sessionInfo()
## R version 4.2.3 (2023-03-15)
## Platform: aarch64-apple-darwin20 (64-bit)
## Running under: macOS Monterey 12.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] topicmodels_0.2-14 tidytext_0.4.1     quanteda_3.3.1     bibliometrix_4.1.3 lubridate_1.9.2    forcats_1.0.0     
##  [7] stringr_1.5.0      dplyr_1.1.2        purrr_1.0.1        readr_2.1.4        tidyr_1.3.0        tibble_3.2.1      
## [13] ggplot2_3.4.2      tidyverse_2.0.0   
## 
## loaded via a namespace (and not attached):
##   [1] TH.data_1.1-2          colorspace_2.1-0       ellipsis_0.3.2         modeltools_0.2-23      estimability_1.4.1    
##   [6] rstudioapi_0.14        farver_2.1.1           dimensionsR_0.0.3      rscopus_0.6.6          SnowballC_0.7.1       
##  [11] bit64_4.0.5            ggrepel_0.9.3          DT_0.28                fansi_1.0.4            mvtnorm_1.2-2         
##  [16] xml2_1.3.4             codetools_0.2-19       splines_4.2.3          leaps_3.1              cachem_1.0.8          
##  [21] knitr_1.43             jsonlite_1.8.5         cluster_2.1.4          shiny_1.7.4            rentrez_1.2.3         
##  [26] compiler_4.2.3         httr_1.4.6             emmeans_1.8.5          Matrix_1.5-4           fastmap_1.1.1         
##  [31] lazyeval_0.2.2         cli_3.6.1              later_1.3.1            htmltools_0.5.5        tools_4.2.3           
##  [36] NLP_0.2-1              igraph_1.4.2           coda_0.19-4            gtable_0.3.3           glue_1.6.2            
##  [41] reshape2_1.4.4         FactoMineR_2.8         fastmatch_1.1-3        Rcpp_1.0.10            slam_0.1-50           
##  [46] cellranger_1.1.0       jquerylib_0.1.4        vctrs_0.6.3            xfun_0.39              stopwords_2.3         
##  [51] openxlsx_4.2.5.2       timechange_0.2.0       mime_0.12              lifecycle_1.0.3        XML_3.99-0.14         
##  [56] stringdist_0.9.10      bibliometrixData_0.3.0 MASS_7.3-59            zoo_1.8-12             scales_1.2.1          
##  [61] vroom_1.6.3            ragg_1.2.5             hms_1.1.3              promises_1.2.0.1       parallel_4.2.3        
##  [66] sandwich_3.0-2         yaml_2.3.7             sass_0.4.6             stringi_1.7.12         highr_0.10            
##  [71] tokenizers_0.3.0       zip_2.3.0              systemfonts_1.0.4      rlang_1.1.1            pkgconfig_2.0.3       
##  [76] evaluate_0.21          lattice_0.21-8         labeling_0.4.2         htmlwidgets_1.6.2      bit_4.0.5             
##  [81] tidyselect_1.2.0       plyr_1.8.8             magrittr_2.0.3         R6_2.5.1               generics_0.1.3        
##  [86] multcompView_0.1-9     multcomp_1.4-23        pillar_1.9.0           withr_2.5.0            survival_3.5-5        
##  [91] scatterplot3d_0.3-44   crayon_1.5.2           janeaustenr_1.0.0      utf8_1.2.3             plotly_4.10.1         
##  [96] tzdb_0.4.0             rmarkdown_2.22         grid_4.2.3             readxl_1.4.2           data.table_1.14.8     
## [101] digest_0.6.31          flashClust_1.01-2      tm_0.7-11              xtable_1.8-4           httpuv_1.6.11         
## [106] textshaping_0.3.6      RcppParallel_5.1.7     stats4_4.2.3           munsell_0.5.0          viridisLite_0.4.2     
## [111] pubmedR_0.0.3          bslib_0.5.0