Please use this identifier to cite or link to this item: http://hdl.handle.net/2381/39769
Title: A quantitative analysis of global gazetteers: Patterns of coverage for common feature types
Authors: Acheson, Elise
De Sabbata, Stefano
Purves, Ross
First Published: 13-Apr-2017
Publisher: Elsevier
Citation: Computers, Environment and Urban Systems, 2017, 64, pp. 309-320
Abstract: Gazetteers are important tools used in a wide variety of workflows that depend on linking natural language text to geographical space. The spatial properties of these data sources, such as coverage, balance, and completeness, affect the performance of common tasks such as geoparsing and geocoding. However, little attention has been paid to how these properties vary in global gazetteers, particularly across country boundaries and according to feature types. In this paper, we present a detailed investigation of the spatial properties of two open gazetteers with worldwide coverage: GeoNames and the Getty Thesaurus of Geographic Names (TGN). Using point density maps, correlations, and linear regressions, we analyze the global spatial coverage of each data source for the full set of features and for top feature types: populated places, streams, mountains, and hills. Results show wide discrepancies in coverage between the two datasets, sharp changes in feature type coverage across country borders, and idiosyncratic patterns dominated by a few countries for the more sparsely covered natural features. As more and more systems rely on recognizing and grounding named places, these patterns can influence the analysis of growing amounts of online text content and reinforce or amplify existing inequalities.
DOI Link: 10.1016/j.compenvurbsys.2017.03.007
ISSN: 0198-9715
Links: http://www.sciencedirect.com/science/article/pii/S0198971516302496
http://hdl.handle.net/2381/39769
Version: Publisher Version
Status: Peer-reviewed
Type: Journal Article
Rights: Copyright © the authors, 2017. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
Appears in Collections: Published Articles, Dept. of Geography

Files in This Item:
File: 1-s2.0-S0198971516302496-main.pdf
Description: Published (publisher PDF)
Size: 2.53 MB
Format: Adobe PDF

