Estimating abundance

class: center, middle, title-slide

# Estimating abundance
### Aurélien Besnard for the team
### last updated: 2022-03-18

---

## Population dynamics

---

---

# Background - Detectability issues

Counting animals or plants in the field
<img src="img/Diapositive1.JPG" width="75%" style="display: block; margin: auto;" />

---

# Background - Detectability issues
<img src="img/Diapositive2.JPG" width="75%" style="display: block; margin: auto;" />

Detectability is rarely (never) exhaustive

---

# Background - Detectability issues

<img src="img/Diapositive3.JPG" width="75%" style="display: block; margin: auto;" />
---

# Background - Detectability issues

Detectability (usually) varies over space

---

# Background - Detectability issues

---
# Background - Detectability issues

Detectability (usually) varies over time

---
# Background - Detectability issues

+ If `$N$` is the abundance of a particular species at a particular study area

+ And `$C$` the number of individuals counted in the field in this area

+ We have `$E(C)=pN$`

+ Where `$p$` is the probability of counting an individual in the study area

---
# Background - Detectability issues

+ Eventually population abundance is given by:

`$$N = \frac{C}{\alpha \times p}$$`

+ `$\alpha$` being the fraction of the study area that is sampled

+ *p* needs to be estimated to provide unbiased estimates

+ And allows inference regarding abundance variations over time and space

---
# Sampling design

+ How to select sampling units in the study area to obtain unbiased estimates of abundance is a complex task

+ Need to define the statistical population of sample units and rely on some random selection process of the units to survey

+ Several sampling designs exist: Random, systematic, stratified, spatially balanced, etc

+ Not covered in this course

---
# Some useful reading on sampling design

.pull-left[
<img src="img/book_williams.jpg" width="70%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/book_thompson.jpg" width="60%" style="display: block; margin: auto;" />
]

---
# Issue 1: Estimating abundance when `$p<1$`

+ Several methods exist, depending on whether population is "closed" or "open"

+ Closed populations: Capture-recapture, Distance sampling, N-mixture

+ Open populations: Capture-recapture, open N-mixture

+ In this course: 
 - Capture-recapture for closed populations
 - Distance sampling
 - N-mixture for closed populations

---
## What does "closed" populations mean?

+ Demographic closure:
  - no birth
  - no death
  
+ Geographic closure:
  - no immigration
  - no emigration
  
+ Between first and last field sessions

+ Implies to work over short time intervals

+ 'Short' depending on species life history traits

---
## What does "closed" populations mean?

.pull-left[
<img src="img/photo_tortue.jpg" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/photo_papillon.jpg" width="100%" style="display: block; margin: auto;" />
]

---
class: center, middle
background-size: cover

# Capture-recapture for closed populations

---
## Capture-recapture methods for closed populations

Sample in a closed population
<img src="img/Diapositive7.JPG" width="75%" style="display: block; margin: auto;" />

---
## Capture-recapture methods for closed populations

Capture, mark and release some individuals (noted `$n_1$`)
<img src="img/Diapositive8.JPG" width="75%" style="display: block; margin: auto;" />
`$p$` is typically below 1, not all present individuals are captured (here `$p \approx 1/3$`)
---
## Capture-recapture methods for closed populations

Make a second visit, capture a second independent sample of individuals
<img src="img/Diapositive10.JPG" width="75%" style="display: block; margin: auto;" />
Among newly captured individuals, noted `$n_2$`, some are marked, noted `$m$`, others aren't

---
## Capture-recapture methods for closed populations

+ The proportion of marked individuals after the first sample is `$\displaystyle{\frac{n_1}{N}}$`

+ The proportion of marked individuals in the second sample is `$\displaystyle{\frac{m}{n_2}}$`

+ Assuming sampling sessions are independent (individuals have equal detection probability whether they have already been captured or not)

+ Then the proportion of marked individuals in the second sample, *m*, is the proportion of marked individuals in the entire population:

`$$\hat{N}=\displaystyle{\frac{n_1 \times n_2}{m}}$$`
---
## Capture-recapture methods for closed populations

Another way of writing the Lincoln-Petersen index

`$n_{10}=N \times p_1 \times (1-p_2)$`

`$n_{01}=N \times (1-p_1) \times p_2$`

`$n_{11}=N \times p_1 \times p_2$`

with `$p_1$` and `$p_2$` the capture probability at first and second session respectively

3 equations, 3 unknown values N, `$p_1, p_2, \ldots$`, no difficulties

---
## Capture-recapture methods for closed populations

Rearranging the equations also leads to:

`$$N=\displaystyle{\frac{n_{10}+n_{01}+n_{11}}{1 - (1-p_1) \times (1-p_2)}}$$`
or

`$$N=\displaystyle{\frac{n_{10}+n_{01}+n_{11}}{p*}}$$`

with `$p*$` the probability to be captured at least once and `${n_{10}+n_{01}+n_{11}}$` the total number of different individuals captured

*Remember* `$$N = \frac{C}{p}$$`

---
## The Lincoln-Petersen index

.pull-left[
Petersen (fishes 1894)
<img src="img/photo_petersen.jpg" width="75%" style="display: block; margin: auto;" />
]

.pull-right[
Lincoln (birds 1930)
<img src="img/photo_Lincoln.jpg" width="75%" style="display: block; margin: auto;" />
]

---
## Lincoln-Petersen index requires individual identification

+ Usually implies some marking

+ Group marking can also works

---
## Lincoln-Petersen index requires individual identification

Physical alteration of individuals

<img src="img/Diapositive36.JPG" width="75%" style="display: block; margin: auto;" />
---
## Lincoln-Petersen index requires individual identification

Add some tags

---
## Lincoln-Petersen index requires individual identification

Natural marks (including DNA)

<img src="img/Diapositive38.JPG" width="75%" style="display: block; margin: auto;" />
---
# Lincoln-Petersen assumptions 1

Population is closed demographically and geographically

<img src="img/Diapositive11.JPG" width="70%" style="display: block; margin: auto;" />
Population is estimated to 30 individuals (supra-population ?)

---
# Lincoln-Petersen assumptions 2

Population is closed demographically and geographically

**All individuals are equally likely to be caught in each sample**

`$n_1 = N \times p_1$`, `$n_2 = N \times p_2$`, `$m = N \times p_1 \times p_2$`

---
# Lincoln-Petersen assumptions 3

Population is closed demographically and geographically

All individuals are equally likely to be caught in each sample

**Marks are not lost (or unobserved)**

*m* is under-estimated and then `$\hat{N}$` over-estimated

---
# Lincoln-Petersen and then ?

Population is closed demographically and geographically

**All individuals are equally likely to be caught in each sample**

Marks are not lost (or unobserved)

Several sources of variation in capture:
  - time - t (weather, sampling effort)
  - behavior - b (trap-effect)
  - individual heterogenity - h (age, sex, social status)
  
... and combination of these effects

To test these effects, we need more than 2 sessions

---
# More than two sessions - capture histories

Each line = one individual / each column = one field session
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> History </th>
   <th style="text-align:left;"> Comment </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 1011 </td>
   <td style="text-align:left;"> seen first session, not seen second session, seen third and fourth session </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 0110 </td>
   <td style="text-align:left;"> not seen first session, seen second and third session, not seen fourth session </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1010 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ... </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> ... </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 1101 </td>
   <td style="text-align:left;">  </td>
  </tr>
</tbody>
</table>
---
# More than two sessions - assumptions

Population is closed demographically and geographically

Marks are not lost and individuals are identified without error

All individuals are equally likely to be caught in each sample

---
# More than two sessions - capture histories

The `$M_0$` model - constant detection probability (among individuals, among sessions)

Generalisation of Lincoln-Petersen index to more than two sessions

|History |Probability     |
|:-------|:---------------|
|111     |ppp             |
|110     |pp(1-p)         |
|101     |p(1-p)p         |
|100     |p(1-p)(1-p)     |
|011     |(1-p)pp         |
|010     |(1-p)p(1-p)     |
|001     |(1-p)(1-p)p     |
|000     |(1-p)(1-p)(1-p) |

---
# Maximum likelihood

+ The number of individuals with history 111 is provided by `$n_{111}=N \times p \times p \times p$`

+ We can do that for all histories, remember the equations we wrote for the Lincoln-Petersen index

+ Solving the equation system becomes more and more challenging as the number of field sessions increases

+ Then using an algorithm called "maximum likelihood" we can search for the values of p and N that lead to the closest match between the number of observed histories ( `$n_{111}$`, `$n_{110}$`, etc) and their expected number given `$\hat{p}$` and `$\hat{N}$`

---
# More than two sessions

+ The `$M_0$` model is very constrained, possible to relax some assumptions

+ `$M_t$` = Variation over time of detection probabilities

+ `$M_b$` = Capture and recapture can be different (trap-dependence)

---
# Model Mt

+ Capture probabilities vary with session (weather conditions, field effort, observer experience, etc)

+ With `$K$` sessions, `$K+1$` parameters to estimate: `$N, p_1, p_2, \ldots, p_K$`

---
# Model Mt

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> History </th>
   <th style="text-align:left;"> M0 </th>
   <th style="text-align:left;"> Mt </th>
   <th style="text-align:left;"> Mb </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 111 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p1p2p3 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 110 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p1p2(1-p3) </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 101 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p1(1-p2)p3 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 100 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p1(1-p2)(1-p3) </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 011 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p1)p2p3 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 010 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p1)p2(1-p3) </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 001 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p1)(1-p2)p3 </td>
   <td style="text-align:left;">  </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 000 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p1)(1-p2)(1-p3) </td>
   <td style="text-align:left;">  </td>
  </tr>
</tbody>
</table>

---
# Model Mb

+ Probability of first capture `$p$` differs from probability of recapture `$c$`

+ Individuals can become trap-shy with `$c<p$` or trap-happy with `$c>p$`

+ 3 parameters to be estimated: `$N, p, c$`

---
# Model Mb

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> History </th>
   <th style="text-align:left;"> M0 </th>
   <th style="text-align:left;"> Mt </th>
   <th style="text-align:left;"> Mb </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 111 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> pcc </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 110 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> pc(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 101 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p(1-c)c </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 100 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> p(1-c)(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 011 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p)pc </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 010 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p)p(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 001 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p)(1-p)p </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 000 </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;">  </td>
   <td style="text-align:left;"> (1-p)(1-p)(1-p) </td>
  </tr>
</tbody>
</table>

---
# All models
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> History </th>
   <th style="text-align:left;"> M0 </th>
   <th style="text-align:left;"> Mt </th>
   <th style="text-align:left;"> Mb </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 111 </td>
   <td style="text-align:left;"> ppp </td>
   <td style="text-align:left;"> p1p2p3 </td>
   <td style="text-align:left;"> pcc </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 110 </td>
   <td style="text-align:left;"> pp(1-p) </td>
   <td style="text-align:left;"> p1p2(1-p3) </td>
   <td style="text-align:left;"> pc(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 101 </td>
   <td style="text-align:left;"> p(1-p)p </td>
   <td style="text-align:left;"> p1(1-p2)p3 </td>
   <td style="text-align:left;"> p(1-c)c </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 100 </td>
   <td style="text-align:left;"> p(1-p)(1-p) </td>
   <td style="text-align:left;"> p1(1-p2)(1-p3) </td>
   <td style="text-align:left;"> p(1-c)(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 011 </td>
   <td style="text-align:left;"> (1-p)pp </td>
   <td style="text-align:left;"> (1-p1)p2p3 </td>
   <td style="text-align:left;"> (1-p)pc </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 010 </td>
   <td style="text-align:left;"> (1-p)p(1-p) </td>
   <td style="text-align:left;"> (1-p1)p2(1-p3) </td>
   <td style="text-align:left;"> (1-p)p(1-c) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 001 </td>
   <td style="text-align:left;"> (1-p)(1-p)p </td>
   <td style="text-align:left;"> (1-p1)(1-p2)p3 </td>
   <td style="text-align:left;"> (1-p)(1-p)p </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 000 </td>
   <td style="text-align:left;"> (1-p)(1-p)(1-p) </td>
   <td style="text-align:left;"> (1-p1)(1-p2)(1-p3) </td>
   <td style="text-align:left;"> (1-p)(1-p)(1-p) </td>
  </tr>
</tbody>
</table>

---
# Model with heterogeneity

+ Capture can depend on several unknown variables (age, sexe, social status, etc)

+ Each individual as a unique capture probability

+ `$N+1$` parameters to estimate: `$N, p_i$` with `$i=1, 2, \ldots, N$`

+ This model has too many parameters, we can use Pledger's mixture models

---
# Pledger's mixture model

+ Assume individuals belong to two groups with different capture probabilities

+ A proportion `$\pi$` belongs to the first group with capture probability `$p_H$`

+ And `$(1-\pi)$` to the second group with `$p_L$`

+ Then `$p=\pi \times p_H + (1-\pi) \times p_L$`

+ We can just use this equation in `$M_0$` and get four parameters: `$N, \pi, p_H$` and `$p_L$`

---
# More complex models

+ Combining different effects: `$M_{bt}, M_{bh}, M_{th}$`

+ Use some individuals covariates: Age, sex, body conditions, etc

---
# Model selection

+ We can fit several models, each returning a different `$\hat{N}$`

+ How to select for the one that is the best given the data ?

+ The most complex models are always the one that explain best the data (increase explained variance)

+ Yet sometimes the complexification yields to only very small improvement of explained variance

+ Idea: **penalize models with too many parameters**

---
# Akaike information criterion (AIC)

`$$AIC = Deviance + 2 K$$`

+ with `$Deviance$`, a measure of the unexplained variance of the model

+ `$K$` the number of parameters in the model

+ AIC makes the balance between *quality of fit* and *complexity* of a model

+ Best model is the one with lowest AIC value

*Note that two models are difficult to distinguish if `$\Delta \text{AIC} < 2$`*

---
# Some thoughts on closure assumption

+ Some tests of closure exist (Otis et al 1978, Stanley and Burnham 1999), available in R (package "secr")

+ Yet very sensitive to other sources of variations such as capture heterogeneity

+ Low performance when p is low

+ Some authors now rely more on ecological justification than statistical tests

---
# Cited references

Huggins R. M. (1989). On the statistical analysis of capture-recapture experiments.  *Biometrika* 76:133-140.

Otis, D. L., Burnham, K. P., White, G. C. and Anderson, D. R. (1978). Statistical inference from capture data on closed animal populations. *Wildlife Monographs* 62: 1–135.

Pledger S. (2000). Unified Maximum Likelihood Estimates for Closed Capture–Recapture Models Using Mixtures. *Biometrics* 56: 434-442.

Stanley, T. R. and Burnham, K. P. (1999). A closure test for time-specific capture–recapture data. *Environmental and Ecological Statistics* 6:197–209.

---
# Further reading

.pull-left[
<img src="img/book_williams.jpg" width="70%" style="display: block; margin: auto;" />
]

.pull-right[

http://www.phidot.org/software/mark/docs/book/
<img src="img/book_mark.jpg" width="100%" style="display: block; margin: auto;" />
]

---
background-color: #234f66
## <span style="color:white">Live demo in R</span>

.center[
![](img/b5b086f9cc403008ba7be5dd508cfed2.gif)
]

---
class: center, middle
background-size: cover

# Distance sampling

---

# Count individuals along strip transects

Traditional methods = count animals (or plants) along transects of fixed width
<img src="img/Diapositive12.JPG" width="80%" style="display: block; margin: auto;" />

---
# Count individuals along strip transects

Assume all individuals in the strip are detected and counted
<img src="img/Diapositive13.JPG" width="80%" style="display: block; margin: auto;" />

---
# Count individuals inside a circular area

The survey area can also be a circle = circular plot

---
# Count individuals inside a circular area

Assume all individuals in the circle are detected and counted

---
## Estimating density and abundance using counts (1)

+ Using raw counts in the sample strips/circles

+ Density is easily estimated using:

`$$\hat{D}=\frac{\sum_{i=1}^ny_i}{\sum_{i=1}^n{a_i}}$$`

+ With `$y_i$` the count at each sample unit, and `$a_i$` the surface of each sample unit

+ That can be translated into abundance `$N$` with: 
`$$\hat{N} = \frac{\sum_{i=1}^ny_i}{\alpha}$$`

+ Where `$\alpha$` is the the proportion of the study area eventually sampled

---
## Estimating density and abundance using counts (2)

`$L$` = transect length, `$w$` = transect width, `$y$` = number of birds detected

`$$\hat{D}=\frac{\sum_{i=1}^ny_i}{\sum_{i=1}^nL_i \times w_i}$$`

---
## Estimating density and abundance using counts (3)

`$w$` = radius, `$y$` = number of birds detected

`$$\hat{D}=\frac{\sum_{i=1}^ny_i}{\sum_{i=1}^n\pi \times w_i^2}$$`

---
# Assumptions

+ All individuals in the sample area are detected and counted

+ Yet detection probability is (always) below 1 (see lecture on detection issue)

+ It can vary between sample units and even in sample units (depending on vegetation cover for instance)

---
# Distance sampling - background (1)

+ Distance sampling rational was formulated by Steve Buckland, early 1980

+ The number of detections tends to decrease with distance to observer...

+ ...some information on detection probability is encapsulated in distances

+ Can we estimate detection probability by recording the distances of detected individuals rather than their presence only ?

---
# Distance sampling - background (2)

+ Observations along a transect

+ Record perpendicular distances to transects, assume not all individuals are detected
<img src="img/Diapositive16.JPG" width="80%" style="display: block; margin: auto;" />

---
# Distance sampling - background (3)

Observations along several transects

---
# Distance sampling - background (4)

Observations from a fixed location

Record radial distances to the point, assume not all individuals are detected

---
# Distance sampling - background (5)

Observations from several fixed locations

---
# Estimating density when detection < 1

* Remember first lecture on detection issue, but also Capture-recapture: `$N = \frac{C}{p}$`

`$$\hat{D}=\frac{\sum_{i=1}^ny_i}{a \times \hat{p}}$$`

+ With `$p$` the mean detection probability on the entire sampled area of surface `$a$`

+ How to estimate `$p$` using distance data ?

---
## Recorded data

+ Distance sampling recorded data typically look like:

+ If `$p$` was equal to one, the histogram would be flat

+ The decrease results from a decreasing detection prob with distance to observer

---
# Estimating detection with distance data (1)

Some individuals are detected, some are not

---
# Estimating detection with distance data (2)

For instance, if 100 individuals are observed, 70% of the individuals are detected (below the curve) and 30% not detected

`$$\hat{N} = \frac{100}{0.70} = 143$$`

---
# How to extract *p* from these data ?

+ We need to estimate `$p$`

+ To do that, we need to fit a detection curve to the raw data

<img src="img/Diapositive20.JPG" width="80%" style="display: block; margin: auto;" />
---
# How to extract *p* from these data ?

.pull-left[
<img src="img/Diapositive21.JPG" width="90%" style="display: block; margin: auto;" />
]

.pull-right[

`$\displaystyle{\hat{p}=\frac{\int_{0}^{w} \hat{g}(y)\,dy}{w}}$`

with `$w$` the distance at which data are truncated and `$g(y)$` the detection curve

it is the `$\frac{\mbox{area under the curve}}{\mbox{area of the rectangle}}$` 
]

---
# Modelling the detection function (1)

+ `$g(y)$`, detection probability given the distance, needs to be estimated

+ `$g(y)$` is not known and may vary a lot due to observer experience and environment

+ Distance sampling strategy is to go for a few models for `$g(y)$` with good properties:

- model robustness
    - shape criterion
    - efficiency

---
# Modelling the detection function (2)

+ Model robustness

- General function that can take a variety of shapes

- Robust w.r.t. pooling

+ Shape criterion

- A “shoulder” near the transect line (detection approx. 1 around the line / point)

+ Efficiency

- Provides precise estimates – use Maximum Likelihood

---
# Modelling the detection function (3)

Two components for `$g(y)$`:

- A ‘key’ function: Uniform, half-normal, hazard-rate

- A ‘series’ expansion: Cosine, simple polynomial, hermite polynomial

---
# Fit a detection curve to the raw data

---
# Uniform *key*

---
# Half-normal *key*

---
# Half-normal *key*

Changing `$\sigma$` allows changing tail length

---
# Hazard-rate *key*

---
# Hazard-rate *key*

Changing `$b$` allow changing shoulder and tail

<img src="img/Diapositive27.JPG" width="80%" style="display: block; margin: auto;" />
---
# Series expansion

- Cosine `$\sum a_j \cos (\frac{j \pi y}{w})$`

- Simple polynomial `$\sum a_j (\frac{y}{w})^2j$`

- Hermite polynomial `$\sum a_j H_2j (\frac{y}{\sigma})$`

---
# Series expansion

Uniform key function + single cosine adjustment term

---
# Three critical assumptions

- Individuals on the line or point are all detected `$g(0)=1$`

- Individuals are detected at their initial location

- Distances are recorded accurately

---
# Individuals on the line or point are all detected

If `$g(0)$` is below 1 and unknown, density is underestimated by an unknown factor

---
# Individuals on the line or point are all detected

+ Possible to estimate `$g(0)$` using two or more independent observers or radiotracked individuals

+ Ensure `$g(0)=1$` using cameras, trained pointing dogs, but be aware of potential bias

+ Use more recent modelling approach, e.g. distance sampling with temporary emigration (requires repeated surveys in the same sampling unit)

---
## Individuals are detected at their initial location

+ Assumption 2: Animals do not move before detection (snapshot)

+ Effect of violation: 
     
     - Random movements induce positive bias; provided object movement is slow relative to movement of the observer, bias is small
     - Responsive movements can cause large bias; positive if there is attraction to the observer, negative if there is avoidance

+ Recommendations: 
    
    - In point transects, wait once on site before proceeding
    - In line transects, look well ahead

---
# Measures are recorded accurately

+ Assumption 3: Distances are measured accurately

+ Effect of violation: 
      
      - The estimator is fairly robust to random errors in measurement
      - It is sensitive to extreme outliers and to rounding distances

+ Recommendations:

- If exact measurements are difficult, use intervals (group data)
     - If outliers, use truncation (5-10% of the largest observations)
     - If heaping (convenient ‘rounding’), use intervals and choose cutpoints such that “heaps” are at the midpoint of an interval
     - Accurate measurement is most effective solution, so use appropriate tools (tape measures; laser range finders; compass for angles)

---
# Goodness-of-fit tests

+ We need to verify that the fit of the model makes sense

+ Several diagnostics and tests have been developed (QQplot, Chi-square tests, etc), depends on whether data are continuous or in classes of distance

+ The general principle is to verify that there is no large decrepancy between the prediction from the fitted model and the raw data

---
# Line or point transects ?

Point transects are:
- Relevant for populations distributed in patches
- Convenient to use when the area is difficult to access / survey
- Naturally suited for stratification

Line transects are:
- Efficient for sparsely distributed populations
- Effective in low densities

---
# Further readings

---
background-color: #234f66
## <span style="color:white">Live demo in R</span>

.center[
![](img/b5b086f9cc403008ba7be5dd508cfed2.gif)
]

---
class: center, middle
background-size: cover

# N-mixture

---

# Count on a fixed sample area

+ Traditional methods = count animals (or plants) along transects of fixed width
or on quadrat, or on circular plots

+ These counts are affected by detection issues
<img src="img/Diapositive12.JPG" width="75%" style="display: block; margin: auto;" />

---
# Repeated counts at the same sample unit
<img src="img/Diapositive30.JPG" width="80%" style="display: block; margin: auto;" />
True abundance `$N=10$`

---
# Repeated counts at the same sample unit
<img src="img/Diapositive31.JPG" width="80%" style="display: block; margin: auto;" />
True abundance `$N=10$`, count at first visit is 4

---
# Repeated counts at the same sample unit
<img src="img/Diapositive32.JPG" width="80%" style="display: block; margin: auto;" />
True abundance `$N=10$`, count at second visit is 3

---
# Repeated counts at the same sample unit
<img src="img/Diapositive33.JPG" width="80%" style="display: block; margin: auto;" />
True abundance `$N=10$`, count at third visit is 6

---
# Repeated counts with imperfect detection

+ When counts are repeated on the same sampling unit, some variations are observed

+ This is due to random process related to imperfect detection

+ Example: sex-ratio in humans is close to 0.5

- Probability to have a son is therefore 0.5
    - Expected number of sons in four siblings is 2
    - Yet some parents have four sons and zero daughters

---
# Repeated counts with imperfect detection

+ Example: true abundance is 5, detection probability is 0.5

+ Repeated counts on the same unit with imperfect detection leads to variations

+ Repeated counts on the same units encapsulate information on detection and abundance

---
# N-mixture models (Royle 2004)

+ N-mixture models take advantage of these variations between counts at the same sample unit

+ Requires some spatial replications (several sampling units) and temporal replications (repeated counts at the same units) - same design as *site occupancy models*

+ Some strong assumptions:

- Abundance at a sampling unit does not change between successive counts ("closed population")
    - Individuals have the same detection probability (no heterogeneity)
    - Abundances at the sampling units are Poisson distributed
    - Sampling sessions are independent (no trap-dependence)

---
# N-mixture models (Royle 2004)

Typical dataset used for N-mixture models

<table>
 <thead>
  <tr>
   <th style="text-align:right;"> site </th>
   <th style="text-align:right;"> counts.1 </th>
   <th style="text-align:right;"> counts.2 </th>
   <th style="text-align:right;"> counts.3 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 6 </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 7 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>

---
# N-mixture models formulation

+ N-mixture can be formulated as hierarchical models

+ A state process describing the system

`$$N_i \sim \mbox{Poisson}(\lambda)$$`
with `$\lambda$` the mean abundance at the sites

+ An observation process corresponding to detection issues

`$$y_{it} \sim \mbox{Binomial}(N_i,p)$$`

+ Covariates can be added on abundance (e.g. vegetation cover, elevation) and on detection probability (e.g. vegetation cover, date, observer experience)

---
# N-mixture models - to use with caution

+ The method has been highly controversial for years

+ Couturier et al. 2013 - Identifiability problems when `$p$` is low or highly variable

+ Barker et al. 2017 - Without auxillary information about `$p$`, count data cannot distinguish between N-mixture model or other possible models of `$N$`

+ Dennis 2015, Kéry 2017 - Some parameters not identifiable especially with negative binomial distribution instead of Poisson

+ Link et al. 2018 - Estimates sensitive to violation of double counting and constant `$\lambda / p$` - GOF unable to detect this

+ Conclusion: check parameter `$K$` (see live demo), perform GOF, do not rely on negative binomial distribution

---
# N-mixture models - further reading

+ Open N-mixture - Dail and Madsen 2011 / Hostetler and Chandler 2015

+ Generalized Distance sampling - Accomodates for temporary emigration - Chandler (2011)

+ Community N-mixture models - Yamaura et al. (2012)

---
# Cited references (1)

Barker, R.J., Schofield, M.R., Link, W.A. and Sauer, J.R. (2018). On the reliability of N-mixture models for count data. *Biometrics* 74: 369-377.

Chandler, R.B., Royle, J.A. and King, D.I. (2011). Inference about density and temporary emigration in unmarked populations. *Ecology* 92: 1429-1435.

Couturier, T., Cheylan, M., Bertolero, A., Astruc, G. and Besnard, A. (2013). Estimating abundance and population trends when detection is low and highly variable: A comparison of three methods for the Hermann's tortoise. *Journal of Wildlife Management* 77: 454-462.

Dail, D. and Madsen, L. (2011). Models for Estimating Abundance from Repeated Counts of an Open Metapopulation. *Biometrics* 67: 577-587.

Dennis, E.B., Morgan, B.J. and Ridout, M.S. (2015). Computational aspects of N-mixture models. *Biometrics* 71: 237-246.

---
# Cited references (2)

Hostetler, J.A. and Chandler, R.B. (2015). Improved state-space models for inference about spatial and temporal variation in abundance from count data. *Ecology* 96: 1713-1723.

Kéry, M. (2018). Identifiability in N-mixture models: a large-scale screening test with bird data. *Ecology* 99: 281-288.

Link, W.A., Schofield, M.R., Barker, R.J. and Sauer, J.R. (2018). On the robustness of N-mixture models. *Ecology* 99: 1547-1551.

Royle, J.A. (2004). N-Mixture Models for Estimating Population Size from Spatially Replicated Counts. *Biometrics* 60: 108-115.

Yamaura, Y., Royle, J.A., Shimada, N. et al. (2012). Biodiversity of man-made open habitats in an underused country: a class of multispecies abundance models for count data. *Biodivers Conserv* 21, 1365–1380.

---
background-color: #234f66
## <span style="color:white">Live demo in R</span>

.center[
![](img/b5b086f9cc403008ba7be5dd508cfed2.gif)
]