I’m going to use this post to rescue a little piece of code that I produced for one of my very first papers, one that I wrote as part of my PhD (Guillera-Arroita et al., 2010, MEE). I find this piece of code quite useful, both for my own work and as a teaching tool. So I thought I should share it in this site, too! This is an R function that runs simulations to assess the performance of the constant occupancy-detection model, given a scenario specified by the user. Very (un)creatively, I decided to call it SODA, for species occupancy model design assistant…
Occupancy-detection models are an extension of logistic regression to account for imperfect detection (MacKenzie et al, 2002; Tyre et al, 2003). Their aim is to estimate species occurrence while accounting for the fact that the species might be missed during surveys at sites it occupies (there are models for false positives too, but I’m not considering those here). Data are needed to describe the detection process, and this is often (although not necessarily) achieved by conducting repeat surveys at the sampling sites.
In SODA, a simulation scenario is defined by the probability of species occupancy (psi), the probability of detecting the species during a survey at a site it occupies (p), the number of replicate visits per site (K) and the total survey budget (E). The latter is expressed in units of repeat survey visits. Increased costs for the first survey visit can also be accommodated with a separate parameter (C1, so that E = S * (K – 1 + C1) where S is the number of sites; if C1 = 1 then E = S*K). Once a scenario is defined, SODA quickly computes estimator bias, precision and accuracy… but, most importantly, it produces cool plots!! … well, at least I think they are cool ;-). The plots are quite intuitive, in that one can immediately get a sense of how biased/imprecise (or not) the estimator is in a given scenario. The dots represent potential outcomes of a study carried out in such conditions. In order words, each dot shows the (maximum-likelihood) estimates for occupancy (y-axis) and detectability (x-axis) obtained when analyzing one of the potential datasets collected given the survey design (number of sampling sites S and replicate visits K) and species characteristics (occupancy psi and detectability p). The color scheme indicates which outcomes are more likely (hot) than others (cold).
SODA’s plots give a sense of how many data are needed to get meaningful results. Looking at them can be eye opening. For instance, the plots below how K=2 surveys at S=50 sites lead to decent performance when detectability and occupancy are high (left), but how this same amount of survey effort is unlikely to be sufficient for rare and elusive species (right). In this case, the estimates have great spread and obtaining boundary estimates (psi=1) is quite possible.

SODA’s plots can also help assess trade-offs. For instance, look how increasing the number of replicates is more efficient than increasing the number of sites in the latter scenario:

Unarguably SODA’s simulations are simplistic (constant model, model assumptions perfectly met, same effort applied to all sites), yet SODA can be a handy tool to assist in survey design. It quickly provides a basic understanding of whether a given survey effort is appropriate or not in “ideal” conditions. So it sets an important lower bound for survey effort requirements. I trust someone out there will find SODA useful too!