# 3 Hidden Markov models

## 3.1 Introduction

In this third chapter, you will learn the basics on Markov models and how to fit them to longitudinal data using NIMBLE. In real life however, individuals may go undetected and their status be unknown. You will also learn how to manipulate the extension of Markov models to hidden states, so-called hidden Markov models.

## 3.2 Longitudinal data

Let’s get back to our survival example, and denote $$z_i$$ the state of individual $$i$$ with $$z_i = 1$$ if alive and $$z_i = 0$$ if dead. We have a total of $$z = \displaystyle{\sum_{i=1}^{n}{z_i}}$$ survivors out of $$n$$ released animals with winter survival probability $$\phi$$. Our model so far is a combination of a binomial likelihood and a Beta prior with parameters 1 and 1, which is also a uniform distribution between 0 and 1. It can be written as:

\begin{align*} z &\sim \text{Binomial}(n, \phi) &\text{[likelihood]} \\ \phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\ \end{align*}

Because the binomial distribution is just a sum of independent Bernoulli outcomes, you can rewrite this model as:

\begin{align*} z_i &\sim \text{Bernoulli}(\phi), \; i = 1, \ldots, N &\text{[likelihood]} \\ \phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\ \end{align*}

It is like flipping a coin for each individual and get a survivor with probability $$\phi$$.

In this set up, we consider a single winter. But for many species, we need to collect data on the long term to get a representative estimate of survival. Therefore what if we had say $$T = 5$$ winters?

Let us denote $$z_{i,t} = 1$$ if individual $$i$$ alive at winter $$t$$, and $$z_{i,t} = 2$$ if dead. Then longitudinal data look like in the table below. Each row is an individual $$i$$, and columns are for winters $$t$$, or sampling occasions. Variable $$z$$ is indexed by both $$i$$ and $$t$$, and takes value 1 if individual $$i$$ is alive in winter $$t$$, and 2 otherwise.

## # A tibble: 57 × 6
##       id winter 1 winter 2 winter 3 winter 4
##    <int>      <int>      <int>      <int>      <int>
##  1     1          1          1          1          1
##  2     2          1          1          1          1
##  3     3          1          1          1          1
##  4     4          1          1          1          1
##  5     5          1          1          1          1
##  6     6          1          1          2          2
##  7     7          1          1          1          1
##  8     8          1          2          2          2
##  9     9          1          1          1          1
## 10    10          1          2          2          2
## # ℹ 47 more rows
## # ℹ 1 more variable: winter 5 <int>

## 3.3 A Markov model for longitudinal data

Let’s think of a model for these data. The objective remains the same, estimating survival. To build this model, we’ll make assumptions, go through its components and write down its likelihood. Note that we have already encountered Markov models in Section 1.5.2.

### 3.3.1 Assumptions

First, we assume that the state of an animal in a given winter, alive or dead, is only dependent on its state the winter before. In other words, the future depends only on the present, not the past. This is a Markov process.

Second, if an animal is alive in a given winter, the probability it survives to the next winter is $$\phi$$. The probability it dies is $$1 - \phi$$.

Third, if an animal is dead a winter, it remains dead, unless you believe in zombies.

Our Markov process can be represented this way:

An example of this Markov process is, for example:

Here the animal remains alive over the first two time intervals $$(z_{i,1} = z_{i,2} = z_{i,3} = 1)$$ with probability $$\phi$$ until it dies over the fourth time interval $$(z_{i,4} = 2)$$ with probability $$1-\phi$$ then remains dead from then onwards $$(z_{i,5} = 2)$$ with probability 1.

### 3.3.2 Transition matrix

You might have figured it out already (if not, not a problem), the core of our Markov process is made of transition probabilities between states alive and dead. For example, the probability of transitioning from state alive at $$t-1$$ to state alive at $$t$$ is $$\Pr(z_{i,t} = 1 | z_{i,t-1} = 1) = \gamma_{1,1}$$. It is the survival probability $$\phi$$. The probability of dying over the interval $$(t-1, t)$$ is $$\Pr(z_{i,t} = 2 | z_{i,t-1} = 1) = \gamma_{1,2} = 1 - \phi$$. Now if an animal is dead at $$t-1$$, then $$\Pr(z_t = 1 | z_{t-1} = 2) = 0$$ and $$\Pr(z_{i,t} = 2 | z_{i,t-1} = 2) = 1$$.

We can gather these probabilities of transition between states from one occasion to the next in a matrix, say $$\mathbf{\Gamma}$$, which we will call the transition matrix:

\begin{align*} \mathbf{\Gamma} = \left(\begin{array}{cc} \gamma_{1,1} & \gamma_{1,2}\\ \gamma_{2,1} & \gamma_{2,2} \end{array}\right) = \left(\begin{array}{cc} \phi & 1 - \phi\\ 0 & 1 \end{array}\right) \end{align*}

To try and remember that the states at $$t-1$$ are in rows, and the states at $$t$$ are in columns, I will often write:

$\begin{matrix} & \\ \mathbf{\Gamma} = \left ( \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right . \end{matrix} \hspace{-1.2em} \begin{matrix} z_t=1 & z_t=2 \\ \hdashline \phi & 1-\phi \\ 0 & 1 \end{matrix} \hspace{-0.2em} \begin{matrix} & \\ \left . \vphantom{ \begin{matrix} 12 \\ 12 \end{matrix} } \right ) \begin{matrix} z_{t-1}=1 \; \mbox{(alive)} \\ z_{t-1}=2 \; \mbox{(dead)} \end{matrix} \end{matrix}$

Take the time you need to navigate through this matrix, and get familiar with it. For example, you may start alive at $$t$$ (first row) then end up dead at $$t+1$$ (first column) with probability $$1-\phi$$.

### 3.3.3 Initial states

A Markov process has to start somewhere. We need the probabilities of initial states, i.e. the states of an individual at $$t = 1$$. We will gather the probability of being in each state (alive or 1 and dead or 2) in the first winter in a vector. We will use $$\mathbf{\delta} = \left(\Pr(z_{i,1} = 1), \Pr(z_{i,1} = 2)\right)$$. For simplicity, we will assume that all individuals are marked and released in the first winter, hence alive when first captured, which means that they are all in state alive or 1 for sure. Therefore we have $$\mathbf{\delta} = \left(1, 0\right)$$.

### 3.3.4 Likelihood

Now that we have built a Markov model, we need its likelihood to apply the Bayes theorem. The likelihood is the probability of the data, given the model. Here the data are the $$z$$, therefore we need $$\Pr(\mathbf{z}) = \Pr(z_1, z_2, \ldots, z_{T-2}, z_{T-1}, z_T)$$.

We’re gonna work backward, starting from the last sampling occasion. Using conditional probabilities, the likelihood can be written as the product of the probability of $$z_T$$ i.e. you’re alive or not on the last occasion given your past history, that is the states at previous occasions, times the probability of your past history:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \color{blue}{\Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1)} \\ \end{align*}

Then because we have a Markov model, we’re memory less, that is the probabilty of next state, here $$z_T$$, depends only on the current state, that is $$z_{T-1}$$, and not the previous states:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \color{blue}{\Pr(z_T | z_{T-1})} \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ \end{align*}

You can apply the same reasoning to $$T-1$$. First use conditional probabilities:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \color{blue}{\Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ \end{align*}

Then apply the Markovian property:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\ &= \Pr(z_T | z_{T-1}) \color{blue}{\Pr(z_{T-1} | z_{T-2})} \Pr(z_{T-2}, \ldots, z_1)\\ \end{align*}

And so on up to $$z_2$$. You end up with this expression for the likelihood:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \Pr(z_{T-2}, \ldots, z_1)\\ &= \ldots \\ &= \color{blue}{\Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \ldots \Pr(z_{2} | z_{1}) \Pr(z_{1})}\\ \end{align*}

This is a product of conditional probabilities of states given previous states, and the probability of initial states $$\Pr(z_1)$$. Using a more compact notation for the product of conditional probabilities, we get:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_T | z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1}, z_{T-2},\ldots, z_1) \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}, \ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)\\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \Pr(z_{T-2}, \ldots, z_1)\\ &= \ldots \\ &= \Pr(z_T | z_{T-1}) \Pr(z_{T-1} | z_{T-2}) \ldots \Pr(z_{2} | z_{1}) \Pr(z_{1})\\ &= \color{blue}{\Pr(z_{1}) \prod_{t=2}^T{\Pr(z_{t} | z_{t-1})}}\\ \end{align*}

In the product, you can recognize the transition parameters $$\gamma$$ we defined above, so that the likelihood of a Markov model can be written as:

\begin{align*} \Pr(\mathbf{z}) &= \Pr(z_T, z_{T-1}, z_{T-2}, \ldots, z_1) \color{white}{\Pr(z_{T-1}, z_{T-2},\ldots, z_1) \Pr(z_{T-2}, \ldots, z_1)}\\ &= \Pr(z_{1}) \prod_{t=2}^T{\gamma_{z_{t-1},z_{t}}}\\ \end{align*}

### 3.3.5 Example

I realise these calculations are a bit difficult to follow. Let’s take an example to fix ideas. Let’s assume an animal is alive, alive at time 2 then dies at time 3. We have $$\mathbf{z} = (1, 1, 2)$$. What is the contribution of this animal to the likelihood? Let’s apply the formula we just derived:

\begin{align*} \Pr(\mathbf{z} = (1, 1, 2)) &= \Pr(z_1 = 1) \; \gamma_{z_{1} = 1, z_{2} = 1} \; \gamma_{z_{2} = 1, z_{3} = 2}\\ &= 1 \; \phi \; (1 - \phi). \end{align*}

The probability of having the sequence alive, alive and dead is the probability of being alive first, then to stay alive, eventually to die. The probability of being alive at first occasion being 1, we have that the contribution of this individual to the likelihood is $$\phi (1 - \phi)$$.

## 3.4 Bayesian formulation

Before implementing this model in NIMBLE, we provide a Bayesian formulation of our model. We first note that the likelihood is a product of conditional probabilities of binary events (alive or dead). Usually binary events are associated with the Bernoulli distribution. Here however, we will use its extension to several outcomes (from a coin with two sides to a dice with more than two faces) known as the categorical distribution. The categorical distribution is a multinomial distribution with a single draw. To get a better idea of how the categorical distribution works, let’s simulate from it with the rcat() function. Consider for example a random value drawn from a categorical distribution with probability 0.1, 0.3 and 0.6. Think of a dice with three faces, face 1 has probability 0.1 of occurring, face 2 probability 0.3 and face 3 has probability 0.6, the sum of these probabilities being 1. We expect to get a 3 more often than a 2 and rarely a 1:

rcat(n = 1, prob = c(0.1, 0.3, 0.6))
## [1] 3

Alternatively, you can use the sample() function and sample(x = 1:3, size = 1, replace = FALSE, prob = c(0.1, 0.3, 0.6)). Here is another example in which we sample 20 times in a categorical distribution with probabilities 0.1, 0.1, 0.4, 0.2 and 0.2, hence a dice with 5 faces:

rcat(n = 20, prob = c(0.1, 0.1, 0.4, 0.2, 0.2))
##  [1] 5 3 4 5 4 3 4 2 2 3 1 3 5 5 3 4 2 3 2 3

In this chapter, you will familiarise yourself with the categorical distribution in binary situations, which should make the transition to more states than just alive and dead smoother in the next chapters.

Initial state is a categorical random variable with probability $$\delta$$. That is you have a dice with two faces, or a coin, and you have some probability to be alive, and one minus that probability to be dead. Of course, it you want your Markov chain to start, you’d better say it’s alive so that $$\delta$$ is just $$(1,0)$$:

\begin{align*} z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\ \end{align*}

Now the main part is the dynamic of the states. The state $$z_t$$ at $$t$$ depends only on the known state $$z_{t-1}$$ at $$t-1$$, and is a categorical random variable which probabilities are given by row $$z_{t-1}$$ of the transition matrix $$\mathbf{\Gamma} = \gamma_{z_{t-1},z_{t}}$$:

\begin{align*} z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\ z_t | z_{t-1} &\sim \text{Categorical}(\gamma_{z_{t-1},z_{t}}) &\text{[likelihood, }t > 1 \text{]}\\ \end{align*}

For example, if individual $$i$$ is alive over $$(t-1,t)$$ i.e. $$z_{t-1} = 1$$, we need the first row in $$\mathbf{\Gamma}$$,

\begin{align*} \mathbf{\Gamma} = \left(\begin{array}{cc} \color{blue}{\phi} & \color{blue}{1 - \phi}\\ 0 & 1 \end{array}\right) \end{align*}

that is $$\color{blue}{\gamma_{z_{t-1} = 1,z_{t}} = (\phi, 1-\phi)}$$ and $$z_t | z_{t-1} = 1 \sim \text{Categorical}((\phi, 1-\phi))$$.

Otherwise, if individual $$i$$ dies over $$(t-1,t)$$ i.e. $$z_{t-1} = 2$$, we need the second row in $$\mathbf{\Gamma}$$:

\begin{align*} \mathbf{\Gamma} = \left(\begin{array}{cc} \phi & 1 - \phi\\ \color{blue}{0} & \color{blue}{1} \end{array}\right) \end{align*}

that is $$\color{blue}{\gamma_{z_{t-1} = 2,z_{t}} = (0, 1)}$$ and $$z_t | z_{t-1} = 2 \sim \text{Categorical}((0, 1))$$ (if the individual is dead, it remains dead with probability 1).

We also need a prior on survival. Without surprise, we will use a uniform distribution between 0 and 1, which is also a Beta distribution with parameters 1 and 1. Overall our model is:

\begin{align*} z_1 &\sim \text{Categorical}(\delta) &\text{[likelihood, }t = 1 \text{]}\\ z_t | z_{t-1} &\sim \text{Categorical}(\gamma_{z_{t-1},z_{t}}) &\text{[likelihood, }t > 1 \text{]}\\ \phi &\sim \text{Beta}(1, 1) &\text{[prior for }\phi \text{]} \\ \end{align*}

## 3.5 NIMBLE implementation

How to implement in NIMBLE the Markov model we just built? We need to put in place a few bricks before running our model. Let’s start with the prior on survival, the vector of initial state probabilities and the transition matrix:

markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1] <- 1          # Pr(alive t = 1) = 1
delta[2] <- 0          # Pr(dead t = 1) = 0
gamma[1,1] <- phi      # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi  # Pr(alive t -> dead t+1)
gamma[2,1] <- 0        # Pr(dead t -> alive t+1)
...

Alternatively, you can define vectors and matrices in NIMBLE like you would do it in R. You can write:

markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1:2] <- c(1, 0) # vector of initial state probabilities
gamma[1:2,1:2] <- matrix( c(phi, 0, 1 - phi, 1), nrow = 2) # transition matrix
...

Now there are two important dimensions to our model, along which we need to repeat tasks, namely individual and time. As for time, we describe the successive events of survival using the categorical distribution dcat(), say for individual $$i$$:

z[i,1] ~ dcat(delta[1:2])           # t = 1
z[i,2] ~ dcat(gamma[z[i,1], 1:2])   # t = 2
z[i,3] ~ dcat(gamma[z[i,2], 1:2])   # t = 3
...
z[i,T] ~ dcat(gamma[z[i,T-1], 1:2]) # t = T

There is a more efficient way to write this piece of code by using a for loop, that is a sequence of instructions that we repeat. Here, we condense the previous code into:

z[i,1] ~ dcat(delta[1:2])             # t = 1
for (t in 2:T){ # loop over time t
z[i,t] ~ dcat(gamma[z[i,t-1], 1:2]) # t = 2,...,T
}

Now we just need to do the same for all individuals. We use another loop:

for (i in 1:N){ # loop over individual i
z[i,1] ~ dcat(delta[1:2]) # t = 1
for (j in 2:T){ # loop over time t
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2]) # t = 2,...,T
} # t
} # i

Puting everything together, the NIMBLE code for our Markov model is:

markov.survival <- nimbleCode({
phi ~ dunif(0, 1) # prior
delta[1] <- 1          # Pr(alive t = 1) = 1
delta[2] <- 0          # Pr(dead t = 1) = 0
gamma[1,1] <- phi      # Pr(alive t -> alive t+1)
gamma[1,2] <- 1 - phi  # Pr(alive t -> dead t+1)
gamma[2,1] <- 0        # Pr(dead t -> alive t+1)
# likelihood
for (i in 1:N){ # loop over individual i
z[i,1] ~ dcat(delta[1:2]) # t = 1
for (j in 2:T){ # loop over time t
z[i,j] ~ dcat(gamma[z[i,j-1], 1:2]) # t = 2,...,T
} # t
} # i
})

Note that in this example, $$\delta$$ is used as a placeholder for more complex models we will build in chapters to come. Here, you could simply write z[i,1] <- 1.

Now we’re ready to resume our NIMBLE workflow. First we read in data. Because we have loops and indices that do not change, we use constants as explained in Section 2.3:

my.constants <- list(N = 57, T = 5)
my.data <- list(z = z)

We also specify initial values for survival with a function:

initial.values <- function() list(phi = runif(1,0,1))
initial.values()
## [1] 57
##

### 3.10.3 Compute first, average after

First option is to apply Viterbi to each MCMC sample, then to compute median of the MCMC Viterbi paths for each observed sequence:

niter <- length(p)
T <- 5
res <- matrix(NA, nrow = nrow(y), ncol = T)
for (i in 1:nrow(y)){
res_mcmc <- matrix(NA, nrow = niter, ncol = T)
for (j in 1:niter){
# Initial states
delta <- c(1, 0)
# Transition matrix
transition <- matrix(NA, 2, 2)
transition[1,1] <- phi[j]      # Pr(alive t -> alive t+1)
transition[1,2] <- 1 - phi[j]  # Pr(alive t -> dead t+1)
transition[2,1] <- 0        # Pr(dead t -> alive t+1)
# Observation matrix
emission <- matrix(NA, 2, 2)
emission[1,1] <- 1 - p[j]      # Pr(alive t -> non-detected t)
emission[1,2] <- p[j]          # Pr(alive t -> detected t)
emission[2,1] <- 1          # Pr(dead t -> non-detected t)
emission[2,2] <- 0          # Pr(dead t -> detected t)
res_mcmc[j,1:T] <- getViterbi(emission, transition, delta, y[i,] + 1)
}
res[i, 1:length(y[1,])] <- apply(res_mcmc, 2, median)
}
You can compare the Viterbi decoding to the actual states $$z$$:

Decoding is correct except that the alive actual state is often decoded as the dead state by the Viterbi algorithm. Note that here we compute the Viterbi paths after we run NIMBLE. You could turn the R function getViterbi() into a NIMBLE function and plug it in your model code to apply Viterbi. This would not make any difference except perhaps to increase MCMC computation times.

### 3.10.4 Average first, compute after

Second option is to compute the posterior mean of the observation and transition matrices, then to apply Viterbi:

# Initial states
delta <- c(1, 0)
# Transition matrix
transition <- matrix(NA, 2, 2)
transition[1,1] <- mean(phi)      # Pr(alive t -> alive t+1)
transition[1,2] <- 1 - mean(phi)  # Pr(alive t -> dead t+1)
transition[2,1] <- 0              # Pr(dead t -> alive t+1)
# Observation matrix
emission <- matrix(NA, 2, 2)
emission[1,1] <- 1 - mean(p)      # Pr(alive t -> non-detected t)
emission[1,2] <- mean(p)          # Pr(alive t -> detected t)
emission[2,1] <- 1                # Pr(dead t -> non-detected t)
emission[2,2] <- 0                # Pr(dead t -> detected t)
res <- matrix(NA, nrow = nrow(y), ncol = T)
for (i in 1:nrow(y)){
res[i, 1:length(y[1,]) ] <- getViterbi(emission, transition, delta, y[i,] + 1)
}
Again, you can compare the result of the Viterbi decoding to the actual states we simulated and used to generate the observations:

The results are very similar to those we obtained in Section 3.10.3, and Figure 3.3 is indisguishable from Figure 3.2.

## 3.11 Summary

• A HMM is a model that consists of two parts: i) an unobserved sequence of discrete random variables - the states - satisfying the Markovian property (future states depends on current states only and not on past states) and ii) an observed sequence of discrete random variables - the observations - depending only on the current state.

• The Bayesian approach together with MCMC simulations allow estimating survival and detection probabilities as well as individual latent states alive or dead with the complete likelihood. If you can afford the computation times, then using the complete likelihood is the easiest path for model fitting.

• If you do not need to infer the latent states, you can use the marginal likelihood via the forward algorithm. By avoiding to sample the latent states, you usually get better mixing and faster convergence.

• If you do need to infer the latent states, and you cannot afford the computation times of the complete likelihood, then you can go for the marginal likelihood in conjunction with the Viterbi algorithm to decode the latent states.

• If the computational burden is still an issue, and you have individuals that share the same encounter history, you can use a pooled likelihood to speed up the marginal likelihood evaluation and MCMC convergence.

• The package nimbleEcology is developed by Goldstein et al. (2021) and Ponisio et al. (2020) for application to occupancy and N–mixture models.