A recent paper by Maldonado et al. in Global Ecology and Biogeography, "Estimating species diversity and distribution in the era of Big Data: to what extent can we trust public databases?", raises some very interesting questions about the quality of the specimen data available to researchers.
The researchers compiled four datasets of georeferenced species records for the Neotropical plant tribe Cinchoneae. One was assembled from TROPICOS and individual South American herbaria, and the other three came from GBIF. The researchers carefully validated the first dataset for both taxonomy and geography. For comparison, they evaluated, cleaned and improved the three GBIF datasets to varying degrees.
Using these data, the researchers modeled species distributions and richness at several spatial scales. The results indicated that the unverified GBIF dataset significantly overestimated species diversity and misrepresented its spatial patterns. The researchers found several suspicious records that plotted at the exact center of a country or province. Their models also showed that just a few misplaced records could severely skew the results.
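One quick way to screen a dataset for the kind of suspicious records described above is to compare each record's coordinates against a table of administrative centroids. The sketch below is a minimal illustration, not the method used by Maldonado et al. The `Record` type, the `flag_centroid_records` function and the centroid table are all hypothetical, and the centroid values are placeholders rather than real country or province centers. A small tolerance is used instead of exact equality because coordinates are often rounded or stored as floating-point values.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """A hypothetical, minimal occurrence record."""
    catalog_number: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


# Hypothetical centroid table: place name -> (latitude, longitude).
# Placeholder values for illustration only; real values would come from
# a gazetteer or an administrative boundary dataset.
CENTROIDS = {
    "Country A": (-1.5, -78.0),
    "Province B": (-4.0, -79.2),
}


def flag_centroid_records(records, centroids, tolerance_deg=0.01):
    """Return (record, place) pairs for records lying within tolerance_deg
    of a centroid in both latitude and longitude."""
    flagged = []
    for rec in records:
        for place, (clat, clon) in centroids.items():
            if (abs(rec.latitude - clat) <= tolerance_deg
                    and abs(rec.longitude - clon) <= tolerance_deg):
                flagged.append((rec, place))
                break  # one matching centroid is enough to flag the record
    return flagged


if __name__ == "__main__":
    sample = [
        Record("SPEC-001", -1.5, -78.0),        # sits on "Country A"
        Record("SPEC-002", -3.9712, -79.2044),  # near, but not on, "Province B"
        Record("SPEC-003", -4.0, -79.2),        # sits on "Province B"
    ]
    for rec, place in flag_centroid_records(sample, CENTROIDS):
        print(rec.catalog_number, "->", place)
```

Running it flags SPEC-001 and SPEC-003 but not SPEC-002. A flagged record is not necessarily wrong. It only means the coordinates may describe an entire administrative area rather than a collecting site.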
Maldonado et al. recommended that data repositories provide a measure of coordinate precision, along with some information on how the coordinates were determined.
This is an excellent suggestion. Georeferencers who follow the MaNIS georeferencing guidelines already take it very seriously, as do data aggregators such as Holos, VertNet and VertNet's predecessors. Most of these "erroneous" records probably result from a common retrospective georeferencing practice: when no other locality information is available, the latitude and longitude are recorded as the geographic center of the most specific administrative boundary named in the description.
The MaNIS georeferencing guidelines, which were used to georeference millions of vertebrate natural history records, include in-depth instructions on how to calculate the uncertainty of a georeference. This uncertainty is recorded in the Darwin Core field coordinateUncertaintyInMeters. The field gives the radius of the smallest circle, centered on the recorded coordinates, that encompasses every place the locality description could refer to. The vaguer the description, the greater the uncertainty.
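To see why a vague description produces a large uncertainty, consider a record georeferenced to the center of an administrative area. In the point-radius approach used by the MaNIS guidelines, one component of the uncertainty is the extent of the named place: the distance from the chosen point to the farthest point of that place. The sketch below computes only that component, as the great-circle distance from the center to the farthest boundary vertex. It rests on several assumptions. It uses a spherical Earth. It treats the boundary as a simple list of vertices. It ignores the other components that the guidelines add, such as datum, coordinate precision, and distance or direction imprecision. The function names and the two boxes are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def extent_uncertainty_m(center, boundary):
    """Distance in meters from center to the farthest boundary vertex,
    i.e. the radius of the smallest circle centered on `center` that
    contains all vertices (only the 'extent' part of the uncertainty)."""
    clat, clon = center
    return max(haversine_m(clat, clon, lat, lon) for lat, lon in boundary)


# Hypothetical areas: a small 0.1-degree box and a large 10-degree box.
small_box = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.1), (0.1, 0.0)]
large_box = [(0.0, 0.0), (0.0, 10.0), (10.0, 10.0), (10.0, 0.0)]

small_u = extent_uncertainty_m((0.05, 0.05), small_box)
large_u = extent_uncertainty_m((5.0, 5.0), large_box)
print(f"small area: {small_u:,.0f} m, large area: {large_u:,.0f} m")
```

The small box yields an uncertainty of roughly 8 km and the large box roughly 790 km. This is why a record placed at the center of a country can be valid as a georeference while still being far too imprecise for a fine-scale distribution model.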
When searching for data through the Ecoengine, make sure to check the coordinate_uncertainty_in_meters field to evaluate the accuracy of the geographic coordinates for your research needs.
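As a minimal sketch of that check, the function below keeps only records whose coordinate_uncertainty_in_meters is present and no larger than a threshold you choose. It assumes the records have already been retrieved and parsed into Python dictionaries that contain that field. It does not call the Ecoengine API, and the `id` key, the `filter_by_uncertainty` name and the sample values are hypothetical. Records with a missing or unparseable value are dropped here because their precision cannot be assessed. Depending on your study, you might prefer to set them aside for manual review.

```python
def filter_by_uncertainty(records, max_uncertainty_m):
    """Return records whose coordinate_uncertainty_in_meters is present,
    parseable as a number, and <= max_uncertainty_m."""
    kept = []
    for rec in records:
        value = rec.get("coordinate_uncertainty_in_meters")
        if value is None:
            continue  # uncertainty unknown; cannot judge precision
        try:
            uncertainty = float(value)
        except (TypeError, ValueError):
            continue  # unparseable value; treat as unknown
        if uncertainty <= max_uncertainty_m:
            kept.append(rec)
    return kept


# Hypothetical records, as they might look after parsing a JSON response.
sample = [
    {"id": "r1", "coordinate_uncertainty_in_meters": 250},
    {"id": "r2", "coordinate_uncertainty_in_meters": 25000},
    {"id": "r3", "coordinate_uncertainty_in_meters": None},
    {"id": "r4", "coordinate_uncertainty_in_meters": "1200"},
]

for rec in filter_by_uncertainty(sample, max_uncertainty_m=2000):
    print(rec["id"])
```

With a 2 km threshold, only r1 and r4 are kept. The right threshold depends on the spatial resolution of your analysis. For example, a model built on 1 km grid cells calls for a much tighter limit than a country-level checklist.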