Improving OpenStreetMap Data Quality with Atlas Checks

The OpenStreetMap (OSM) project, a free, editable map of the world, has been widely used by organizations, NGOs, humanitarian groups, and the scientific research community as a data provider for turn-by-turn navigation, disaster and pandemic response, education, and more. The data is community-driven, so any registered user can add, update, or delete map data. As a result, OSM end-users are consistently looking to improve the quality and integrity of the basemap. The open-source community has developed several quality assurance tools (Osmose, OSMCha, KeepRight) to ensure new edits uphold the highest standards. In this post, I’ll review the open-source quality assurance tool Atlas Checks.

What are Atlas Checks?

The Atlas Checks framework provides quality assurance tools for programmatically detecting OSM mapping errors. Each Atlas Check is an algorithm that defines and flags a type of map error, and each flag carries instructions describing the problem and how to fix it.

“An Atlas Check is simply an algorithm that you can write that will find and flag issues in the OSM Basemap data through the use of Atlas, an in memory graph that represents the underlying OSM Basemap data.” Atlas Checks Guide

As of June 2020, there are 63 Atlas Checks available for detecting map errors! Each check falls broadly into one of two categories:

  1. Geometry checks
  2. Tag checks

Geometry-based checks flag map errors related to the shape or construction of OSM features. Polylines that cross buildings, features that self-intersect, duplicate geometries, and roads that cannot be entered or exited are great examples, all of which can present navigation problems. The FloatingEdgeCheck flags streets that are disconnected from the road network, making them non-navigable. RoundaboutConnectorCheck is another geometry-related check; it flags roads that connect to roundabouts at extremely sharp angles, making the turn impossible.

Tag-based checks identify errors related to OSM tags, which are key-value attributes used to describe features of the core OSM elements. For example, an OSM Way tagged highway=residential describes the highway classification. Tag-based checks can identify features with missing tags, invalid tag combinations, or invalid tag values based on the OSM Wiki or TagInfo. For example, HighwayToFerryTagCheck flags OSM Ways that invalidly contain both highway=* and route=ferry tags. The OSM Wiki states that all Ways tagged route=ferry must also include a ferry tag. The ferry tag takes values that mirror highway types such as primary and secondary. For example, the Way in Figure 1 has a route=ferry tag and a highway=path tag but is missing a ferry tag, so it will be flagged. The check’s instruction directs the editor to remove the highway tag and replace it with the ferry=path tag.

Figure 1: OSM Way (339440691) with route=ferry tag and missing a ferry=* Tag.
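To make the suggested fix concrete, here is a minimal sketch of the tag rewrite the instruction describes. It is not code from Atlas Checks: the class name FerryTagFixSketch and the method suggestFix are hypothetical, and a plain Map stands in for a Way's tags.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a plain Map stands in for OSM tags; this is not Atlas Checks code.
class FerryTagFixSketch
{
    /** Returns a copy of the tags with highway=X replaced by ferry=X when route=ferry is present. */
    static Map<String, String> suggestFix(final Map<String, String> tags)
    {
        final Map<String, String> fixed = new LinkedHashMap<>(tags);
        final boolean isFerryRoute = "ferry".equals(tags.get("route"));
        if (isFerryRoute && tags.containsKey("highway"))
        {
            final String highwayValue = fixed.remove("highway");
            // Keep an existing ferry tag if the mapper already supplied one.
            fixed.putIfAbsent("ferry", highwayValue);
        }
        return fixed;
    }

    public static void main(final String[] args)
    {
        // Tags of the Way from Figure 1.
        final Map<String, String> way = new LinkedHashMap<>();
        way.put("route", "ferry");
        way.put("highway", "path");
        System.out.println(suggestFix(way)); // prints {route=ferry, ferry=path}
    }
}
```

Tags on Ways that are not ferry routes pass through unchanged, so the rewrite only touches the combination the check flags.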

Data Model Comparison: Atlas vs. OSM

The Atlas project provides helpful tools (country-slicing, way-sectioning, tag filtering, etc.) for efficiently representing OSM data in-memory. Although the data model is a graph network optimized for routing, it stays true to the core OSM elements:

OSM vs. Atlas Data Model.

  • Nodes — point features stored as coordinates
  • Ways — ordered lists of Nodes forming open or closed lines
  • Relations — a collection of Nodes and Ways as members
  • Tags — represent key-value pairs as descriptors

Atlas converts navigable OSM Ways and Nodes to Edges (e.g., roads, ferry routes) and Nodes (the start and end points of Edges, and the connections between them). All non-navigable features in OSM are modeled as Atlas Lines (e.g., power lines), Areas (e.g., parks, lakes), Points (e.g., trees, monuments), or Relations (e.g., bus lines). In addition, all the tags and geometries from OSM remain intact in Atlas.
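The mapping above can be sketched as a small classification function. This is a deliberate simplification and not the Atlas API: the enum names, the isNavigable rule (a highway tag or route=ferry), and the treatment of every closed non-navigable Way as an Area are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical simplification of the OSM-to-Atlas mapping; not the Atlas API.
class AtlasModelSketch
{
    enum OsmType { NODE, WAY, RELATION }

    enum AtlasType { NODE, EDGE, POINT, LINE, AREA, RELATION }

    /** Assumed rule of thumb: a Way is navigable if it has a highway tag or route=ferry. */
    static boolean isNavigable(final Map<String, String> tags)
    {
        return tags.containsKey("highway") || "ferry".equals(tags.get("route"));
    }

    static AtlasType classify(final OsmType type, final Map<String, String> tags,
            final boolean isClosed, final boolean connectsNavigableWays)
    {
        switch (type)
        {
            case RELATION:
                return AtlasType.RELATION;
            case NODE:
                // Nodes on the navigable network stay Nodes; everything else is a Point.
                return connectsNavigableWays ? AtlasType.NODE : AtlasType.POINT;
            default:
                if (isNavigable(tags))
                {
                    return AtlasType.EDGE;
                }
                // Simplification: real area detection depends on tags, not only closure.
                return isClosed ? AtlasType.AREA : AtlasType.LINE;
        }
    }

    public static void main(final String[] args)
    {
        System.out.println(classify(OsmType.WAY,
                Collections.singletonMap("highway", "residential"), false, false));
        System.out.println(classify(OsmType.WAY,
                Collections.singletonMap("power", "line"), false, false));
        System.out.println(classify(OsmType.WAY,
                Collections.singletonMap("leisure", "park"), true, false));
    }
}
```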

Atlas Checks Data Flow

The Atlas Checks framework provides an Apache Spark job that allows users to execute checks on a cluster. The input Atlas data is partitioned into country files (or shards) that are processed in parallel. Each Spark executor loads a set of Atlas shards in-memory, executes a configurable list of checks, and writes the output data to the provided file system (e.g., HDFS, S3, Cloud Storage).
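The shard-by-shard pattern can be illustrated without a cluster. In the sketch below, a Java parallel stream plays the role of the Spark executors; it is not the actual Spark job, and the shard names, the run method, and the toy check are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Stand-in for the distributed job: a parallel stream plays the role of Spark executors.
class ShardRunSketch
{
    /** Applies every check to every shard and collects the resulting flag messages. */
    static List<String> run(final List<String> shards,
            final List<Function<String, List<String>>> checks)
    {
        return shards.parallelStream()
                .flatMap(shard -> checks.stream().flatMap(check -> check.apply(shard).stream()))
                .sorted() // sorted only to make the demo output deterministic
                .collect(Collectors.toList());
    }

    public static void main(final String[] args)
    {
        // Hypothetical shard names and a toy check that flags every shard once.
        final List<String> shards = Arrays.asList("shard-A", "shard-B");
        final Function<String, List<String>> toyCheck =
                shard -> Arrays.asList("flag in " + shard);
        System.out.println(run(shards, Arrays.asList(toyCheck)));
    }
}
```

Because each shard is processed independently, adding shards or checks changes only the inputs to run, not its structure.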

Figure 2: Atlas Checks Data Flow.

Figure 2 represents the flow of data through the Atlas Checks framework. Planet OSM data, in the form of PBF files, is first converted into Atlas files using the Atlas API. This conversion includes operations like country slicing, way sectioning, and sharding (check out the Atlas project to learn more about these processes).

Once the PBF-to-Atlas generation is complete, the resulting Atlas files serve as the input to the Atlas Checks Spark job. The job outputs line-delimited GeoJSON files for easy integration with editing tools like MapRoulette and JOSM. Optionally, the Atlas Checks project provides tools to upload data to Postgres, allowing users to leverage the power of SQL and PostGIS for spatial queries and analysis.

Check Development

Atlas Checks are written in Java; the project documentation lists the tools required for development and usage.

An Atlas Check’s source code can be split into three sections:

  1. Initialize
  2. Validate
  3. Generate

I’ll explain each of these using the HighwayToFerryTagCheck as an example. As mentioned previously, the requirement of the check is to flag Edges with both a highway and route=ferry tag.

1. Initialize

Each check is initialized by passing the configuration to its constructor, which sets the class variables used throughout the check. In this case, the value of minimumHighwayType is derived from the configuration.
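As a rough illustration of this stage, the sketch below reads minimumHighwayType from a configuration in a constructor. It is not the check's actual source: a plain Map stands in for the framework's configuration object, and the key name and default value are assumptions.

```java
import java.util.Collections;
import java.util.Map;

// Sketch only: a Map stands in for the framework's configuration object,
// and the key name and default value below are assumptions.
class InitializeSketch
{
    static final String MINIMUM_HIGHWAY_TYPE_KEY = "highway.type.minimum"; // hypothetical key
    static final String MINIMUM_HIGHWAY_TYPE_DEFAULT = "path"; // assumed default

    private final String minimumHighwayType;

    InitializeSketch(final Map<String, String> configuration)
    {
        // Read the setting once at construction so the later stages can reuse it.
        this.minimumHighwayType = configuration
                .getOrDefault(MINIMUM_HIGHWAY_TYPE_KEY, MINIMUM_HIGHWAY_TYPE_DEFAULT)
                .toLowerCase();
    }

    String minimumHighwayType()
    {
        return this.minimumHighwayType;
    }

    public static void main(final String[] args)
    {
        // No override: falls back to the assumed default.
        System.out.println(
                new InitializeSketch(Collections.emptyMap()).minimumHighwayType());
        // Override supplied through the configuration.
        System.out.println(new InitializeSketch(
                Collections.singletonMap(MINIMUM_HIGHWAY_TYPE_KEY, "TERTIARY"))
                        .minimumHighwayType());
    }
}
```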

The Spark job is driven by a configuration file that allows users to apply and tweak global and individual settings for each run. These settings include OSM tag filters, MultiPolygon include and exclude filters, MapRoulette API configurations, and more. For more details on the checks configuration, check out the documentation.

2. Validate

During this stage, Atlas objects that don’t meet the conditions defined in the validCheckForObject function are filtered out. In our example, only Edges with a route=ferry tag and a highway classification greater than this.minimumHighwayType will be valid for the check.
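A minimal sketch of such a filter is shown below. It borrows the validCheckForObject name from the text, but the signature, the string-based entity type, and the abbreviated highway ranking are hypothetical simplifications. It also treats the threshold as inclusive, which is an assumption.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: tags are a plain Map and the highway ranking is an abbreviated, assumed ordering.
class ValidateSketch
{
    // Least to most important; a shortened, illustrative list rather than the full OSM hierarchy.
    static final List<String> HIGHWAY_RANKING = Arrays.asList("path", "footway", "service",
            "residential", "tertiary", "secondary", "primary", "trunk", "motorway");

    static boolean validCheckForObject(final String entityType, final Map<String, String> tags,
            final String minimumHighwayType)
    {
        final int rank = HIGHWAY_RANKING.indexOf(tags.get("highway"));
        final int minimumRank = HIGHWAY_RANKING.indexOf(minimumHighwayType);
        // The threshold is treated as inclusive here, which is an assumption.
        return "Edge".equals(entityType) && "ferry".equals(tags.get("route"))
                && rank >= 0 && minimumRank >= 0 && rank >= minimumRank;
    }

    public static void main(final String[] args)
    {
        final Map<String, String> tags = new HashMap<>();
        tags.put("route", "ferry");
        tags.put("highway", "path");
        System.out.println(validCheckForObject("Edge", tags, "path"));
        System.out.println(validCheckForObject("Edge", tags, "primary"));
    }
}
```

Objects that are not Edges, lack route=ferry, or carry no recognized highway value are filtered out before the more expensive flagging logic runs.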

3. Generate

During the Generate stage, Atlas objects that pass the Validate stage are further evaluated for map errors in the flag() method. Objects meeting the conditions of the flag() method are flagged and given appropriate instructions. Once the check has completed, a line-delimited GeoJSON file is generated in which each line represents a Flag object. Each flag contains information about its respective check, an editing instruction, and attribute information about the flagged Atlas object(s).
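The sketch below illustrates the idea of this stage: a flag method that returns an optional flag with an instruction, and a helper that writes one flag per line. It is not the framework's implementation. The output is a pared-down GeoJSON Feature with a null geometry, and the property names (check, osmId, instructions) are hypothetical, since the real output schema is not shown in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Sketch only: assumes every input already passed the Validate stage (route=ferry Edges).
class GenerateSketch
{
    /** Escapes backslashes and double quotes so the text is safe inside a JSON string. */
    static String escape(final String value)
    {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Returns one GeoJSON line when the tags still carry a highway tag, otherwise empty. */
    static Optional<String> flag(final long osmId, final Map<String, String> tags)
    {
        final String highway = tags.get("highway");
        if (highway == null)
        {
            return Optional.empty();
        }
        final String instruction = tags.containsKey("ferry")
                ? "Remove the highway tag."
                : "Remove the highway tag and add ferry=" + highway + ".";
        // Hypothetical property names; geometry is left null to keep the sketch short.
        return Optional.of(String.format(
                "{\"type\":\"Feature\",\"geometry\":null,\"properties\":"
                        + "{\"check\":\"HighwayToFerryTagCheck\",\"osmId\":%d,"
                        + "\"instructions\":\"%s\"}}",
                osmId, escape(instruction)));
    }

    /** Joins all produced flags into line-delimited output, one flag per line. */
    static String toLineDelimited(final Map<Long, Map<String, String>> edges)
    {
        return edges.entrySet().stream()
                .map(entry -> flag(entry.getKey(), entry.getValue()))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.joining("\n"));
    }

    public static void main(final String[] args)
    {
        final Map<String, String> figureOneWay = new LinkedHashMap<>();
        figureOneWay.put("route", "ferry");
        figureOneWay.put("highway", "path");
        final Map<Long, Map<String, String>> edges = new LinkedHashMap<>();
        edges.put(339440691L, figureOneWay);
        System.out.println(toLineDelimited(edges));
    }
}
```

Writing one self-contained JSON object per line is what lets downstream tools stream the results without loading the whole file.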

Atlas Checks Analysis

Although Atlas Checks’ logic is based on specific OSM standards and rules, the flags produced may contain false positives due to various factors. For example, a geometry-based check, LineCrossingWaterBodyCheck, flags OSM Ways that invalidly cross water bodies.

Figure 3: An OSM Way (ferry route) intersecting a water body.

In Figure 3, an OSM Way with a route=ferry tag crosses a river. It was flagged erroneously by the LineCrossingWaterBodyCheck as an invalid crossing. To eliminate potential false positives, we recommend analyzing a sample of each check’s results across unique markets around the world. The benchmark for accuracy is 90%!
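The benchmark comparison is simple arithmetic: divide the number of sampled flags confirmed as real errors by the sample size. The sketch below assumes that definition of accuracy; the class name, method names, and the sample numbers in main are made up for illustration and are not results from any real run.

```java
// Sketch only: assumes accuracy = confirmed flags / reviewed flags in a manual sample.
class AccuracySketch
{
    static final int BENCHMARK_PERCENT = 90;

    static double accuracy(final int confirmedFlags, final int sampleSize)
    {
        if (sampleSize <= 0 || confirmedFlags < 0 || confirmedFlags > sampleSize)
        {
            throw new IllegalArgumentException("need 0 <= confirmedFlags <= sampleSize, sampleSize > 0");
        }
        return (double) confirmedFlags / sampleSize;
    }

    /** Integer arithmetic avoids floating-point edge cases right at the 90% boundary. */
    static boolean meetsBenchmark(final int confirmedFlags, final int sampleSize)
    {
        accuracy(confirmedFlags, sampleSize); // reuse the argument validation
        return (long) confirmedFlags * 100 >= (long) BENCHMARK_PERCENT * sampleSize;
    }

    public static void main(final String[] args)
    {
        // Made-up sample: 100 flags reviewed by hand, 93 confirmed as real errors.
        System.out.println(accuracy(93, 100));
        System.out.println(meetsBenchmark(93, 100));
    }
}
```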

What’s Next?

As the demand for quality assurance increases, the Atlas Checks project will continue to evolve, with the hope of being adopted as the industry standard.

Check out the repository for new checks, and create a GitHub issue if you have ideas for new use cases or improved functionality!
