Last year, a controversy erupted in the occupancy-detection literature. Alan Welsh, David Lindenmayer and Christine Donnelly (WLD hereafter) published a paper in PLoS ONE that heavily criticized the value of accounting for detectability in species occupancy studies. In a nutshell, WLD compared the basic occupancy-detection model to a model that disregards detectability. They concluded that ignoring non-detection and accounting for it lead to the same biases and that “ignoring detection can actually be better than trying to adjust for it”. They also claimed that occupancy-detection models are difficult to fit because they “have multiple solutions, including boundary estimates which produce fitted probability of zero or one”, and that “estimates are unstable when the data are sparse” and “highly variable”. They wrote that their work “undermines the rationale for occupancy modelling”:
“It shows that when detection depends on abundance, ignoring non detection can actually be better than trying to adjust for it, so the extra data collection and modelling effort to try to adjust for non-detection is simply not worthwhile.”
The paper was picked up in a blog post by Brian McGill, who largely echoed WLD’s views:
“Bottom line – ignoring detection issues often gives misleading/wrong answers. But at exactly the same rate as if you were modelling detection which also often gives misleading/wrong answers. When you combine this with the real world fact that often times only half or one third of the data (by which I mean independent observations) is collected that would have been collected if we ignored detection probabilities, one really starts to question the appropriateness of demanding detection probabilities.”
However, many people did not agree with WLD’s paper, and several experts explained why they thought WLD’s conclusions were not well founded. All in all, this generated a lot of discussion around that blog entry (100+ comments to date). I’m not going to summarize the different views here; those interested can look at the discussion thread.
I have to admit that when I first came across the WLD paper and read its abstract, I was confused. Most of my research to date had been grounded in the idea that detectability was something worth accounting for. As far as I knew, detectability was an important issue in wildlife surveys. I had read lots of interesting work by clever people showing so. I had followed the many developments in occupancy-detection models presented over the last decade. Plus, I knew that most modern statistical ecology methods were in fact state-space models, that is, models that explicitly describe the observation process. So, if WLD were right and accounting for detectability was a waste… how could it be that so many people had been so wrong and for so long? What had WLD found out that everyone else had missed?
In Brian’s blog there were complaints that no discussant was really getting into the WLD paper and addressing it in detail. So I decided to take up the challenge, read the paper with care and evaluate whether the evidence WLD provided was convincing or not. The result of this work is a response paper that I have published, also in PLoS ONE, together with co-authors Jose Lahoz-Monfort, Brendan Wintle, Michael McCarthy and Darryl MacKenzie.
Reading WLD’s paper was hard: it contains a lot of material, and getting to the bottom of things took some time and effort. However, once we did, it became clear that there were several deficiencies in the WLD paper. There was a lot to be said but, for the sake of getting the main ideas across, we decided to concentrate on five key messages. I summarize these below, and you can read the details in our paper.
- Message 1: Boundary estimates and multiple solutions are not as great a problem as implied by WLD. Ok, there is no doubt that estimator performance worsens as sample size decreases, and the resulting problems include multiple solutions and boundary estimates. This is a fact and it applies to any statistical method; therefore, by itself it cannot be used as an argument to dismiss the value of occupancy-detection models. On top of that, WLD’s paper overstates the magnitude of these issues. For instance, we explain why some of the boundary estimates that WLD found (zeros) are mathematically not possible (i.e. WLD must have had a problem with their model fitting procedure). We also show that, for the examples tested by WLD, the R software package we used (unmarked) finds the maximum-likelihood estimates on the first attempt in most cases. Having said that, we recommend fitting the model from several different starting values. But this is just good practice in general, for occupancy models, capture-recapture, distance sampling or whenever using numerical maximum-likelihood methods.
- Message 2: By ignoring imperfect detection, a different metric is estimated. This metric can be derived from the occupancy-detection model. When detectability is disregarded, occupancy and detection are confounded: rather than estimating where the species is, we estimate where the species can be detected. This same metric can also be derived from occupancy-detection models (as the product of the estimated occupancy probability and the estimated conditional probability of detection given the survey effort expended). So, if one really thinks that this is the metric that is going to solve one’s problem, no worries, the occupancy-detection model also delivers it. 🙂
- Message 3: Accounting for imperfect detection provides a more reliable estimator of occupancy, which honestly captures the uncertainty. WLD support their claims by showing an example where the naïve model that disregards detectability has a lower root mean square error (RMSE) than the occupancy-detection model that accounts for it. We agree that occupancy-detection estimates are less precise than naïve estimates; this is simply because the model explicitly recognizes an additional source of uncertainty (imperfect detection). So, yes, when the sample size is small, in some scenarios the naïve model, even if biased, might have smaller RMSE than the occupancy-detection model. However, one can never tell whether that is the case. Because occupancy and detection are confounded, one might just as well obtain the same naïve estimates in a situation where the naïve model is very biased and its RMSE much worse. Our argument here: it is better to be aware of one’s uncertainty than to trust a precise result that might be very wrong. The occupancy-detection model honestly represents the uncertainty about the estimates; being honest about one’s uncertainty is fundamental when basing decisions on the estimates obtained.
- Message 4: Accounting for imperfect detection does not imply a need for increased sampling effort; imperfect detection does. There seems to be quite a bit of confusion about this. Accounting for detectability does not necessarily imply increased sampling effort; what one needs is to collect data in a way that is informative about detectability so the detection process can be modelled explicitly (e.g. repeat visits, times to detection, separate records from independent observers). Of course, the lower the detectability, the larger the sampling effort needed for a particular level of precision. But this is not a requirement of the model: low detectability drives this need. Related to this, it is important to note that, while WLD complained that the occupancy-detection model requires more effort, in all their comparisons with the naïve model they used the full survey effort (i.e. they combined all detections from the separate visits to each site to fit the naïve model). Hence WLD’s comparisons are not consistent with their complaints: if WLD chose to regard the additional replicate surveys as a complication of the occupancy-detection model, then they should have used data from only one visit to fit the naïve model; otherwise the amount of survey effort should not be used as an argument against the use of occupancy-detection models.
- Message 5: Occupancy-detection models are less biased than naïve models even if detection is a function of abundance. WLD considered one scenario where detectability was heterogeneous among sites. In this case, the bias when estimating occupancy was essentially the same regardless of whether detectability was accounted for or not. They presented this (that biases were the same in both models) as their key result. In our paper we show that this is not a general result; it only applies when heterogeneity is extreme. The scenario that WLD considered was indeed extreme. Studying the beta distributions they used to model heterogeneity, we see that, of occupied sites with the same characteristics (categories 1 and 2), some had practically perfect detectability after two visits, while in others there was nearly no chance of detecting the species; virtually no sites with intermediate probabilities of detection existed in their scenario. Ecologically, this does not seem like a very realistic case. We show that, for more modest yet substantial levels of heterogeneity, the occupancy-detection model is indeed less biased than the naïve model.
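To make Message 2 concrete, here is a minimal simulation sketch. It is not code from either paper: the symbols psi (occupancy probability), p (per-visit detection probability) and n_visits, as well as the numerical values, are assumptions chosen purely for illustration, and detectability is taken to be constant across sites and visits. Under those assumptions, the proportion of sites with at least one detection (the naïve estimate) should settle near psi × (1 − (1 − p)^n_visits), not near psi.

```python
import random


def detectable_occupancy(psi, p, n_visits):
    """Probability that a site is occupied AND the species is detected
    at least once in n_visits surveys (constant p assumed)."""
    p_star = 1.0 - (1.0 - p) ** n_visits  # P(at least one detection | occupied)
    return psi * p_star


def simulate_naive_estimate(psi, p, n_visits, n_sites, seed=0):
    """Simulate surveys and return the naive estimate: the proportion
    of sites where the species was detected at least once."""
    rng = random.Random(seed)
    sites_with_detection = 0
    for _ in range(n_sites):
        occupied = rng.random() < psi
        # The species can only be detected where it is present.
        detected = occupied and any(rng.random() < p for _ in range(n_visits))
        sites_with_detection += int(detected)
    return sites_with_detection / n_sites


# Illustrative values only (hypothetical, not taken from WLD or our reply).
psi, p, n_visits = 0.6, 0.3, 2
expected = detectable_occupancy(psi, p, n_visits)
naive = simulate_naive_estimate(psi, p, n_visits, n_sites=100_000)

print(f"true occupancy (psi):         {psi:.3f}")
print(f"psi * P(detected | occupied): {expected:.3f}")
print(f"simulated naive estimate:     {naive:.3f}")
```

With these hypothetical values the naïve estimate tracks the "detectable occupancy" rather than psi itself, which is the confounding described in Message 2.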
In summary, the analysis presented by WLD does not justify their general claims. Our reply shows that considering detectability during study design, data collection, analysis and interpretation is worthwhile. More generally, it is fundamental to make sure that the sampling process does not mislead inference about the biological process of interest. Having said that, we do agree that in some cases detectability might not be a big issue. For instance, if detectability is constant, then inference about spatial and temporal trends may be reliable even if detectability is not accounted for. However, it all depends on whether that assumption is reasonable, and often that might not be known a priori. Hence, the safest approach is to collect data in a way that allows detectability to be accounted for.
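The point about constant detectability can be checked with a little arithmetic. The sketch below is only an illustration under assumed values: two hypothetical survey years, constant per-visit detectability within a year, and the same "detectable occupancy" expression psi × (1 − (1 − p)^n_visits) as the quantity a naïve analysis targets. It compares the apparent change in occupancy when detectability stays the same with the apparent change when detectability also drops.

```python
def detectable_occupancy(psi, p, n_visits):
    """Quantity targeted by a naive analysis: P(occupied and detected at least once)."""
    return psi * (1.0 - (1.0 - p) ** n_visits)


# Hypothetical scenario: true occupancy halves between two survey years.
psi_year1, psi_year2 = 0.8, 0.4
n_visits = 2
true_ratio = psi_year2 / psi_year1

# Case 1: detectability is the same in both years (p = 0.3).
ratio_constant_p = (
    detectable_occupancy(psi_year2, 0.3, n_visits)
    / detectable_occupancy(psi_year1, 0.3, n_visits)
)

# Case 2: detectability also drops in year 2 (p = 0.3, then p = 0.15).
ratio_changing_p = (
    detectable_occupancy(psi_year2, 0.15, n_visits)
    / detectable_occupancy(psi_year1, 0.3, n_visits)
)

print(f"true change in occupancy:          {true_ratio:.2f}")
print(f"naive change, constant p:          {ratio_constant_p:.2f}")
print(f"naive change, p drops in year two: {ratio_changing_p:.2f}")
```

When p is constant the detection term cancels and the naïve ratio matches the true one; when p changes, the naïve trend mixes the change in occupancy with the change in detectability, and nothing in the naïve data reveals which case applies.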
Brian McGill forecast:
“I am sure the Welsh paper will be attacked with blazing guns. That’s what vested interests do.”
Well, we have indeed responded to the Welsh paper, but with statistical arguments instead of guns. And our interests are purely scientific 🙂 I thought it was important to clarify things. The recommendations by WLD seemed quite worrying and dangerous for the quality of statistical inference in ecology.
Welsh AH, Lindenmayer DB, Donnelly CF (2013) Fitting and Interpreting Occupancy Models. PLoS ONE 8: e52015.
MacKenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, et al. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248-2255.
King R (2014) Statistical Ecology. Annual Review of Statistics and its Application: 401-426.
Guillera-Arroita G, Lahoz-Monfort JJ, MacKenzie DI, Wintle BA, McCarthy MA (2014) Ignoring imperfect detection in biological surveys is dangerous: a response to ‘Fitting and Interpreting Occupancy Models’. PLoS ONE 9: e99571.