Just back from ICCB 2015 in beautiful Montpellier. I really enjoyed it! Lots of interesting talks and engaging discussions, plus I met many old friends and made new ones… what else could one ask from a conference? 🙂
At ICCB, José presented a paper on species distribution models (SDMs) that we published with several colleagues earlier this year (Guillera-Arroita et al. 2015, GEB). People seemed to enjoy his talk, so I thought I could write a bit about this work in this space too.
In our paper, we looked at the properties of species occurrence data types in terms of their information content about a species distribution, and the implications that this has for different applications of SDMs. We looked at presence-background data (only presence records plus information about the environmental conditions in the area), presence-absence data (presence and absence records) and detection data (presence-absence data collected in a way that allows modeling the detection process). Our work provides a synthesis of issues that have been discussed in the literature, which we summarized in this figure:
I am not going to repeat the paper here, but just highlight a few key issues to remember:
- Presence-background data are prone to problems with sampling bias
- Presence-background data at best can only provide a relative measure of occurrence probability (yes, that’s it, one cannot estimate actual species occupancy probability with Maxent, or other PB methods!) *
- Presence-absence data give estimates of species occurrence probability if detectability is perfect, but not otherwise. This is particularly an issue if detectability varies with environmental covariates (see also this paper)
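To make the detectability point concrete, here is a minimal simulation sketch in Python. It is a toy illustration, not the simulations from the paper: the covariate, the logistic coefficients, the number of sites and the detection probability are all made-up assumptions, and each site is surveyed only once. The idea is simply that, with imperfect detection, the proportion of sites with a detection tracks occupancy times detectability rather than occupancy itself.

```python
import numpy as np

def simulate_naive_occupancy(n_sites=20000, p_detect=0.5, seed=0):
    """Toy single-visit survey; all parameter values are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    # Hypothetical environmental covariate driving occupancy
    x = rng.normal(size=n_sites)
    # True occupancy probability per site (logistic in x, arbitrary coefficients)
    psi = 1.0 / (1.0 + np.exp(-(0.5 + 1.0 * x)))
    # True presence/absence state of each site
    occupied = rng.random(n_sites) < psi
    # A species is recorded only where present AND detected
    detected = occupied & (rng.random(n_sites) < p_detect)
    # Proportion of sites truly occupied, and proportion with a detection
    return occupied.mean(), detected.mean()

true_occupancy, naive_occupancy = simulate_naive_occupancy()
print(f"true proportion of occupied sites: {true_occupancy:.3f}")
print(f"naive estimate from detections:    {naive_occupancy:.3f}")
```

In this sketch detectability is constant, so the naive estimate is roughly the true value scaled by `p_detect`. If detectability also varied with the covariate, the estimated relationship between occurrence and environment would be distorted, not just scaled down.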
So the thing is that, depending on the type of data we use for our SDM, we might be estimating one thing or another. Now, the important question is whether this matters for your application. With this in mind, we reviewed a large number of applications that use SDMs from the point of view of the information that they require from the SDM (e.g. does the application need actual probabilities, or is knowledge about relative likelihood of occurrence enough?). We constructed a (gigantic!) table (in Appendix S3) that we hope can help SDM users evaluate whether the data they have at hand are suitable for their needs. And we explored five of those applications in detail via simulations.
… oh yes, and we also talked about the widespread practice of reducing SDM outputs to a binary map by applying a threshold. But I’ll write about that some other time!
* Note: There has been some work showing ways to do so (e.g. here), but the methods are very sensitive to mild deviations from parametric assumptions (see an example in Appendix S2). So in practice we think it is best to assume that only a relative estimate is obtained. This makes sense: if absence data are unavailable, we do not have information about the prevalence of the species… how could we tell whether a species with few records is rare or whether the sampling was very sparse?
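A back-of-the-envelope sketch of that last question, in Python. This is a deliberately crude toy, with hypothetical numbers and the assumption that every occupied site has the same chance of yielding a record: the expected number of presence records depends only on the product of prevalence and sampling rate, so very different scenarios give the same expectation.

```python
import math

def expected_presence_records(n_sites, prevalence, sampling_rate):
    """Expected number of presence records in a toy presence-only dataset.

    Assumes (for illustration only) that a fraction `prevalence` of sites is
    occupied and each occupied site yields a record with probability
    `sampling_rate`, independently of everything else.
    """
    return n_sites * prevalence * sampling_rate

# Scenario A: rare species, intensive sampling (hypothetical values)
rare_well_sampled = expected_presence_records(1000, prevalence=0.1, sampling_rate=0.8)
# Scenario B: common species, sparse sampling (hypothetical values)
common_poorly_sampled = expected_presence_records(1000, prevalence=0.8, sampling_rate=0.1)

# Both scenarios lead to the same expected number of records
print(math.isclose(rare_well_sampled, common_poorly_sampled))
```

Without absences (or some other source of information about sampling effort), the records alone cannot separate the two factors in that product.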