PhD opportunity (ecological modelling and monitoring)

I’m seeking applications from highly motivated candidates interested in conducting PhD research on wildlife monitoring or species distribution modelling, particularly from a methodological angle. Work could involve understanding how existing modelling tools work, evaluating how they perform under different circumstances, and developing extensions and guidelines for their use.

As the research will develop at the interface between ecology and statistics, desirable candidates include ecologists with skills in statistical modelling, as well as statisticians and mathematicians with a strong interest in ecological applications. The successful candidate would ideally start in early 2017 and will be co-supervised by other QAECO principal investigators. They must secure a scholarship through the University of Melbourne (APA or MRS).

To apply: send a one-page statement outlining your research interests and ideas, together with a CV, academic transcript and contact details for two academic references, to gguillera[at]unimelb.edu.au. Candidates should contact me to discuss the details of this opportunity at least several weeks prior to the deadline for applications (28 October; or 30 September for international applicants).

Full advertisement here



Talk at the International Biometric Conference

Last week, the 2016 International Biometric Conference was held in Victoria, BC (Canada). As recipients of the ‘2014 best JABES paper’ award, my co-authors Byron Morgan and Martin Ridout and I had been invited to present our work at the conference. Unfortunately, none of us could make it to the conference in person this time, which is a pity because, judging from the program, there were lots of interesting sessions. Still, the organisers gave us the chance to present in video format. And this is the result! :o)


Modelling course in Spain

Yesterday José Lahoz, Marc Kéry and I finished teaching our 5-day course on Modelling the distribution of species and communities accounting for detection using R and BUGS/JAGS. The course was hosted by the Population Ecology Group of the Mediterranean Institute for Advanced Studies (IMEDEA) in Esporles, Mallorca (Spain)… a really beautiful place!  

[Photo: with Marc and José]

We had a fantastic and varied group of attendees with a range of backgrounds. They came from 12 different countries (Spain, Portugal, France, the UK, the Netherlands, Italy, Germany, Switzerland, Greece, Brazil, Estonia and Canada) and brought plenty of interesting questions and ideas for discussion, so I really enjoyed the week!

[Photo: the group]

In the course, we first reviewed the basics of statistical inference (maximum likelihood and Bayesian). We then discussed and applied the occupancy-detection modelling framework to model species distribution patterns, range dynamics and communities. We also dedicated some time to practical sessions, and Stefano Canessa assisted us with those, which was really helpful.

[Photo: in the class]

The course was intensive and we worked hard… but we also had time to enjoy ourselves. We made new friends and enjoyed the Mallorcan cuisine!

[Photo: dinner… and having fun!]


Is my SDM fit for purpose?

Just back from ICCB 2015 in beautiful Montpellier. I really enjoyed it! Lots of interesting talks and engaging discussions, plus I met many old friends and made new ones… what else could one ask from a conference? 🙂

At ICCB, José presented a paper on species distribution models (SDMs) that we published with several colleagues earlier this year (Guillera-Arroita et al. 2015, GEB). People seemed to enjoy his talk, so I thought I would write a bit about this work in this space too.

In our paper, we looked at the properties of different types of species occurrence data in terms of their information content about a species distribution, and the implications this has for different applications of SDMs. We considered presence-background data (presence records only, plus information about the environmental conditions in the area), presence-absence data (presence and absence records) and detection data (presence-absence data collected in a way that allows modelling of the detection process). Our work provides a synthesis of issues that have been discussed in the literature, which we summarized in this figure:

[Figure: what an SDM can estimate with each data type]

The figure shows the different quantities that an SDM can estimate depending on the type of data and the conditions. The dark arrows indicate what can be estimated by default with a given type of data. All of this assumes that other requirements are met, such as good predictors, an appropriate model structure, a sufficient sample size, etc…. By the way, psi = occupancy probability and p* = cumulative detectability.

I am not going to repeat the paper here, but just highlight a few key issues to remember:

  • Presence-background data are prone to problems with sampling bias
  • Presence-background data can at best provide a relative measure of occurrence probability (yes, that’s it: one cannot estimate actual species occupancy probability with Maxent or other PB methods!) *
  • Presence-absence data give estimates of species occurrence probability if detectability is perfect, but not otherwise. This is particularly an issue if detectability varies with environmental covariates (see also this paper)

So, depending on the type of data we use for our SDM, we might be estimating one thing or another. The important question is whether this matters for your application. With this in mind, we reviewed a large number of SDM applications from the point of view of the information they require from the SDM (e.g. does the application need actual probabilities, or is knowledge about the relative likelihood of occurrence enough?). We constructed a (gigantic!) table (in Appendix S3) that we hope can help SDM users evaluate whether the data they have at hand are suitable for their needs. And we explored five of those applications in detail via simulations.

… oh yes, and we also talked about the widespread practice of reducing SDM outputs to a binary map by applying a threshold. But I’ll write about that some other time!

G

* Note: There has been some work showing ways to do so (e.g. here), but the methods are very sensitive to mild deviations from parametric assumptions (see an example in Appendix S2). So, in practice, we think it is best to assume that only a relative estimate is obtained. This makes sense: if absence data are unavailable, we do not have information about the prevalence of the species… how could we tell whether a species with few records is rare, or whether the sampling was simply very sparse?


SODA: Occupancy-detection simulations

I’m going to use this post to rescue a little piece of code that I produced for one of my very first papers, written as part of my PhD (Guillera-Arroita et al., 2010, MEE). I find this piece of code quite useful, both for my own work and as a teaching tool, so I thought I should share it on this site too! It is an R function that runs simulations to assess the performance of the constant occupancy-detection model, given a scenario specified by the user. Very (un)creatively, I decided to call it SODA, for species occupancy model design assistant…

Occupancy-detection models are an extension of logistic regression to account for imperfect detection (MacKenzie et al, 2002; Tyre et al, 2003). Their aim is to estimate species occurrence while accounting for the fact that the species might be missed during surveys at sites it occupies (there are models for false positives too, but I’m not considering those here). Data are needed to describe the detection process, and this is often (although not necessarily) achieved by conducting repeat surveys at the sampling sites.

In SODA, a simulation scenario is defined by the probability of species occupancy (psi), the probability of detecting the species during a survey at a site it occupies (p), the number of replicate visits per site (K) and the total survey budget (E). The latter is expressed in units of repeat survey visits. Increased costs for the first survey visit can also be accommodated with a separate parameter (C1, so that E = S * (K – 1 + C1), where S is the number of sites; if C1 = 1 then E = S * K). Once a scenario is defined, SODA quickly computes estimator bias, precision and accuracy… but, most importantly, it produces cool plots!! … well, at least I think they are cool ;-). The plots are quite intuitive, in that one can immediately get a sense of how biased/imprecise (or not) the estimator is in a given scenario. The dots represent potential outcomes of a study carried out under such conditions. In other words, each dot shows the (maximum-likelihood) estimates of occupancy (y-axis) and detectability (x-axis) obtained when analysing one of the potential datasets collected given the survey design (number of sampling sites S and replicate visits K) and species characteristics (occupancy psi and detectability p). The colour scheme indicates which outcomes are more likely (hot) than others (cold).
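For those who like to see the idea in code, here is a minimal sketch of a SODA-style simulation in R (a simplified illustration I am writing for this post, not the actual SODA function; the name soda_sketch and the example parameter values are made up). It simulates detection histories under the constant model and computes the maximum-likelihood estimates of psi and p for each simulated dataset, i.e. the dots in the plots:

# Minimal sketch of a SODA-style simulation (simplified illustration,
# not the full SODA function; names and example values are made up).
soda_sketch <- function(psi, p, S, K, nsim = 1000) {
  # Negative log-likelihood of the constant occupancy-detection model,
  # with psi and p on the logit scale; d = detections per site out of K visits
  negloglik <- function(par, d, K) {
    psi.h <- plogis(par[1]); p.h <- plogis(par[2])
    ll <- ifelse(d > 0,
                 log(psi.h) + dbinom(d, K, p.h, log = TRUE),   # detected at least once
                 log(psi.h * (1 - p.h)^K + (1 - psi.h)))       # never detected
    -sum(ll)
  }
  est <- matrix(NA, nsim, 2, dimnames = list(NULL, c("psi.hat", "p.hat")))
  for (i in 1:nsim) {
    z <- rbinom(S, 1, psi)        # true occupancy state of each site
    d <- rbinom(S, K, p) * z      # detections can only occur at occupied sites
    fit <- optim(c(0, 0), negloglik, d = d, K = K)
    est[i, ] <- plogis(fit$par)   # back-transform the MLEs
  }
  est
}

# Example: S = 50 sites and K = 2 visits, for a common and detectable species
# and for a rare and elusive one (parameter values assumed for illustration)
est.easy <- soda_sketch(psi = 0.8, p = 0.6, S = 50, K = 2)
est.hard <- soda_sketch(psi = 0.2, p = 0.2, S = 50, K = 2)
plot(est.easy[, "p.hat"], est.easy[, "psi.hat"], xlim = c(0, 1), ylim = c(0, 1),
     xlab = "estimated p", ylab = "estimated psi")
points(est.hard[, "p.hat"], est.hard[, "psi.hat"], col = "red")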

SODA’s plots give a sense of how much data are needed to get meaningful results. Looking at them can be eye-opening. For instance, the plots below show how K=2 surveys at S=50 sites lead to decent performance when detectability and occupancy are high (left), but how this same amount of survey effort is unlikely to be sufficient for rare and elusive species (right). In the latter case, the estimates are widely spread and obtaining boundary estimates (psi=1) is quite possible.

[SODA plots: K=2 surveys at S=50 sites, for a common and detectable species (left) vs. a rare and elusive species (right)]

SODA’s plots can also help assess trade-offs. For instance, look how increasing the number of replicates is more efficient than increasing the number of sites in the latter scenario:

[SODA plots: increasing the number of replicates vs. increasing the number of sites]

Unarguably, SODA’s simulations are simplistic (constant model, model assumptions perfectly met, same effort applied to all sites), yet SODA can be a handy tool to assist in survey design. It quickly provides a basic understanding of whether a given survey effort is adequate under “ideal” conditions, and so it sets an important lower bound on survey effort requirements. I trust someone out there will find SODA useful too!


Accounting for detectability is not a waste – a response to Welsh et al

Last year, controversy erupted in the occupancy-detection literature. Alan Welsh, David Lindenmayer and Christine Donnelly (WLD hereafter) published a paper in PLoS ONE [1] that heavily criticized the value of accounting for detectability in species occupancy studies. In a nutshell, WLD compared the basic occupancy-detection model [2] to a model that disregards detectability. They concluded that ignoring and accounting for non-detection lead to the same biases and that “ignoring detection can actually be better than trying to adjust for it”. They also claimed that occupancy-detection models are difficult to fit because they “have multiple solutions, including boundary estimates which produce fitted probability of zero or one”, and that “estimates are unstable when the data are sparse” and “highly variable”. They wrote that their work “undermines the rationale for occupancy modelling”:

“It shows that when detection depends on abundance, ignoring non detection can actually be better than trying to adjust for it, so the extra data collection and modelling effort to try to adjust for non-detection is simply not worthwhile.”

The paper was picked up in a blog post by Brian McGill, who largely echoed WLD’s views:

“Bottom line – ignoring detection issues often gives misleading/wrong answers. But at exactly the same rate as if you were modelling detection which also often gives misleading/wrong answers. When you combine this with the real world fact that often times only half or one third of the data (by which I mean independent observations) is collected that would have been collected if we ignored detection probabilities, one really starts to question the appropriateness of demanding detection probabilities.”

However, many people did not agree with WLD’s paper, and several experts explained why they thought WLD’s conclusions were not well founded. All in all, this created lots of discussion around that blog entry (over 100 comments to date). I’m not going to summarize the different views here; those interested can look at the discussion thread.

I have to admit that, when I first came across the WLD paper and read its abstract, I was confused. Most of my research to date had been grounded on the idea that detectability was something worth accounting for. As far as I knew, detectability was an important issue in wildlife surveys. I had read lots of interesting work by clever people showing so. I had followed the many developments in occupancy-detection models presented over the last decade. Plus, I knew that most modern statistical ecology methods were indeed state-space models, that is, models that explicitly describe the observation process [3]. So, if WLD were right and accounting for detectability was a waste… how could it be that so many people had been so wrong and for so long? What had WLD found out that everyone else had missed?

In Brian’s blog there were complaints that no discussant was really getting into the WLD paper and addressing it in detail. So I decided to take up the challenge, read the paper with care and evaluate whether the evidence WLD provided was convincing or not. The result of this work is a response paper that I have published [4], also in PLoS ONE, together with co-authors Jose Lahoz-Monfort, Brendan Wintle, Michael McCarthy and Darryl MacKenzie.

Reading WLD’s paper was hard; it contains a lot of material, and getting to the bottom of things took some time and effort. However, once we did, it became clear that there were several deficiencies in the WLD paper. There was a lot to be said but, for the sake of getting the main ideas across, we decided to concentrate on five key messages. I summarize these below; you can read the details in our paper.

  • Message 1: Boundary estimates and multiple solutions are not as great a problem as implied by WLD. Ok, there is no doubt that estimator performance worsens as sample size decreases. Some of these problems include multiple solutions and boundary estimates. This is a fact and it applies to any statistical method; therefore, by itself this cannot be used as an argument to dismiss the value of occupancy-detection models. On top of that, WLD’s paper overstates the magnitude of these issues. For instance, we explain why some of the boundary estimates that WLD found (zeros) are mathematically not possible (i.e. WLD must have had a problem with their model fitting procedure). We also show that, for the examples tested by WLD, the R software package we used (unmarked) finds the maximum-likelihood estimates at the first attempt in most cases. Having said that, we recommend model fitting with multiple different starting values. But this is just good practice in general, for occupancy models, capture-recapture, distance sampling or whenever using numerical maximum-likelihood methods.
  • Message 2: By ignoring imperfect detection, a different metric is estimated. This metric can be derived from the occupancy-detection model. When detectability is disregarded, occupancy and detection are confounded: rather than estimating where the species is, we estimate where the species can be detected. This same metric can also be derived from occupancy-detection models (as the product of the estimated occupancy probability and the estimated conditional probability of detection given the survey effort expended; see the short numerical sketch after this list). So, if one really thinks that this is the metric that is going to solve one’s problem, no worries, the occupancy-detection model also delivers it. 🙂
  • Message 3: Accounting for imperfect detection provides a more reliable estimator of occupancy, one that honestly captures the uncertainty. WLD support their claims by showing an example where the naïve model that disregards detectability has a lower root mean square error (RMSE) than the occupancy-detection model that accounts for it. We agree that occupancy-detection estimates are more imprecise than naïve estimates; this is simply because the model explicitly recognizes an additional source of uncertainty (imperfect detection). So, yes, when the sample size is small, in some scenarios the naïve model, even if biased, might have a smaller RMSE than the occupancy-detection model. However, one can never tell whether that is the case. On the contrary, because occupancy and detection are confounded, one might obtain the same naïve estimates but be in a situation where the naïve model is very biased and its RMSE much worse. Our argument here: it is better to be aware of one’s uncertainty than to trust a precise result that might be very wrong. The occupancy-detection model represents the uncertainty about the estimates honestly; being honest about one’s uncertainty is fundamental when basing decisions on the estimates obtained.
  • Message 4: Accounting for imperfect detection does not imply a need for increased sampling effort; imperfect detection does. There seems to be quite a bit of confusion about this. Accounting for detectability does not necessarily imply increased sampling effort; what one needs is to collect data in a way that is informative about detectability, so the detection process can be modelled explicitly (e.g. repeat visits, times to detection, separate records from independent observers, etc…). Of course, the lower the detectability, the larger the sampling effort needed for a particular level of precision. But this is not a requirement of the model: low detectability drives this need. Related to this, it is important to note that, while WLD complained that the occupancy-detection model requires more effort, in all their comparisons with the naïve model they used the full survey effort (i.e. they combined all detections from the separate visits to each site to fit the naïve model). Hence WLD’s comparisons are not consistent with their complaints: if WLD chose to regard the additional replicate surveys as a complication of the occupancy-detection model, then they should have used data from only one visit to fit the naïve model; otherwise, the amount of survey effort should not be used as an argument against the use of occupancy-detection models.
  • Message 5: Occupancy-detection models are less biased than naïve models even if detection is a function of abundance. WLD considered one scenario where detectability was heterogeneous among sites. In this case, the bias when estimating occupancy was essentially the same regardless of whether detectability was accounted for or not. They presented this result (that biases were the same in both models) as their key result. In our paper we show that this is not a general result; it only applies when heterogeneity is extreme. The scenario that WLD considered was indeed extreme: studying the beta distributions they used to model heterogeneity, we see that, among occupied sites with the same characteristics (categories 1 and 2), some had practically perfect detectability after two visits, while in others there was nearly no chance of detecting the species; virtually no sites with intermediate probabilities of detection existed in their scenario. Ecologically, this does not seem like a very realistic case. We show that, for more modest yet substantial levels of heterogeneity, the occupancy-detection model is indeed less biased than the naïve model.
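As a small illustration of Message 2, here is a tiny numerical sketch in R (the parameter values are made up for the example):

# Illustration of Message 2 (example values only): the quantity estimated by
# the naive model is the probability that the species is detected at a site
# over the K surveys, which can also be computed from occupancy-detection estimates.
psi.hat <- 0.6                # estimated occupancy probability (example value)
p.hat   <- 0.3                # estimated per-visit detection probability (example value)
K       <- 3                  # number of survey visits per site
p.star  <- 1 - (1 - p.hat)^K  # cumulative detectability over the K visits
psi.hat * p.star              # the confounded metric the naive model targets (~0.39)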

In summary, the analysis presented by WLD does not justify their general claims. Our reply shows that considering detectability during study design, data collection, analysis and interpretation is worthwhile. More generally, it is fundamental to make sure that the sampling process does not mislead inference about the biological process of interest. Having said that, we do agree that in some cases detectability might not be a big issue. For instance, if detectability is constant, then inference about spatial and temporal trends may be reliable even if detectability is not accounted for. However, it all depends on whether that assumption is reasonable, and often that might not be known a priori. Hence, the safest approach is to collect data in a way that allows detectability to be accounted for.

Brian McGill forecast:

“I am sure the Welsh paper will be attacked with blazing guns. That’s what vested interests do.”

Well, we have indeed responded to the Welsh et al. paper, but with statistical arguments instead of guns. And our interests are purely scientific 🙂 I thought it was important to clarify things: the recommendations by WLD seemed quite worrying and dangerous for the quality of statistical inference in ecology.

——————————

References

[1] Welsh AH, Lindenmayer DB, Donnelly CF (2013) Fitting and Interpreting Occupancy Models. PLoS ONE 8: e52015.

[2] MacKenzie DI, Nichols JD, Lachman GB, Droege S, Royle JA, et al. (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology 83: 2248-2255.

[3] King R (2014) Statistical Ecology. Annual Review of Statistics and its Application: 401-426.

[4] Guillera-Arroita G, Lahoz-Monfort JJ, MacKenzie DI, Wintle BA, McCarthy MA (2014) Ignoring imperfect detection in biological surveys is dangerous: a response to ‘Fitting and Interpreting Occupancy Models’. PLoS ONE 9: e99571.

 


New paper: Bayesian and sequential design of occupancy studies

We all know that designing studies carefully is very good practice. However, survey design rules and recommendations often depend on the “true” values of the parameters, which are by definition unknown prior to the study. Take the example of studies that aim to estimate species occupancy probability while accounting for imperfect detection. Survey effort can be split in different ways: one can survey more sites with less effort per site, or fewer sites applying more effort to each of them. There are rules in the literature regarding how to choose the optimal amount of replication in such studies (references [2-4] below), but these depend on the values of occupancy and detectability themselves. So what can we do when we know little about our system prior to collecting data?

Byron Morgan, Martin Ridout and I have just published a new paper (Guillera-Arroita et al., JABES, doi: 10.1007/s13253-014-0171-4) where we use Bayesian and sequential design techniques to address this problem. To be honest, these are just two fancy names for two common-sense approaches that explicitly handle the uncertainty in initial parameter values. In a nutshell:

  • Bayesian design = rather than considering a single value for the assumed parameters to guide our design, this method uses a distribution (prior) to reflect our uncertainty about them.
  • Sequential design = rather than designing the whole study and then collecting data, this method approaches the problem in stages, i.e. choose a first design, collect part of the data, analyse them and then reconsider the design for the rest of the study.

Leaving aside the technical details of our evaluation, our paper has a very clear take-home message: we show how a simple two-stage design can significantly improve the performance of occupancy studies. For the scenarios we explored, we found that dedicating about 30-50% of the effort to the first stage of the study and then readjusting its design was a good strategy. So, for those planning to carry out a new study, our recommendation to maximise its efficiency: try to keep your study adaptive and reconsider its design once you get a better sense of the occupancy and detectability levels that you are working with. Simple!
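To give a flavour of what “Bayesian design” means in practice, here is a minimal sketch in R (illustrative only, not the code from our paper). It uses a commonly cited asymptotic approximation for the variance of the occupancy estimator (as in MacKenzie & Royle 2005, reference [2] below), places priors on psi and p to reflect our initial uncertainty, and picks the number of replicate visits K that minimises the prior-expected variance under a fixed total budget; the priors, budget and candidate values of K are all made up for the example:

# Minimal sketch of Bayesian design for choosing the number of replicate visits K
# (illustrative only; priors, budget and the use of the asymptotic variance
# approximation are assumptions made for this example).
avar <- function(psi, p, S, K) {
  # Asymptotic variance approximation for the occupancy estimator
  # (as in MacKenzie & Royle 2005), with pstar = 1 - (1 - p)^K
  pstar <- 1 - (1 - p)^K
  (psi / S) * ((1 - psi) + (1 - pstar) / (pstar - K * p * (1 - p)^(K - 1)))
}

E  <- 300                            # total budget, in survey visits (example value)
Ks <- 2:10                           # candidate numbers of replicate visits per site
ndraws <- 5000
psi.draws <- rbeta(ndraws, 2, 2)     # prior for occupancy (assumed for the example)
p.draws   <- rbeta(ndraws, 2, 4)     # prior for detectability (assumed for the example)

# Prior-expected variance of the occupancy estimator for each candidate K
exp.var <- sapply(Ks, function(K) {
  S <- floor(E / K)                  # number of sites affordable with K visits each
  mean(avar(psi.draws, p.draws, S, K))
})

Ks[which.min(exp.var)]               # K minimising the prior-expected variance

The same machinery extends naturally to the two-stage idea described above: after the first stage, the prior is replaced by what the initial data tell us about psi and p, and the design for the remaining effort is reconsidered.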

——————————

[1] Guillera-Arroita, G., Ridout, M.S. & Morgan, B.J.T. (2014) Two-stage Bayesian study design for species occupancy estimation. Journal of Agricultural, Biological and Environmental Statistics, doi: 10.1007/s13253-014-0171-4.

[2] MacKenzie, D.I. & Royle, J.A. (2005) Designing occupancy studies: general advice and allocating survey effort. Journal of Applied Ecology, 42, 1105-1114.

[3] Guillera-Arroita, G., Ridout, M.S. & Morgan, B.J.T. (2010) Design of occupancy studies with imperfect detection. Methods in Ecology and Evolution, 1, 131-139.

[4] Guillera-Arroita, G. & Lahoz-Monfort, J.J. (2012) Designing studies to detect differences in species occupancy: power analysis under imperfect detection. Methods in Ecology and Evolution, 3, 860-869.

 
