The Data Validation Library is built on top of the Data Processing Library. The validation library provides a set of processors that run on input catalogs to validate their data and characterize the results.

Architecture Overview
Figure 1. Architecture Overview

The validation workflow has three phases, each carried out by its own processor:

  1. Validator -- a processor that inspects the data release candidate's input catalog and publishes the results to the Validator's output catalog. These results describe issues such as improperly clipped geometry, inconsistent data, or formatting errors.
  2. Analyzer -- a processor that reads the results from the Validator, groups the errors according to context, and assigns severities. These results are published to the Analyzer's output catalog.
  3. Assessor -- a processor that reads the Analyzer's output catalog and implements analytics to determine the quality of the original input that the Validator processed. These results are published to the Assessor's catalog. You can then use these results to further gate or trigger a live deployment of your original data release candidate's input catalog.
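The three phases above form a pipeline: each one consumes the previous phase's output catalog and the Assessor's verdict can gate deployment. The following is a minimal, self-contained sketch of that flow; all type and method names (`Finding`, `validate`, `analyze`, `assess`, the toy "empty partition" check, and the severity rule) are invented for illustration and are not part of the actual Data Validation Library API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ValidationPipeline {
    // A finding produced by the Validator phase: which partition, what issue.
    record Finding(String partition, String issue) {}
    // The Analyzer groups findings by context and assigns a severity.
    record AnalyzedGroup(String issue, long count, String severity) {}

    // Phase 1: Validator -- inspect the input catalog and emit findings.
    static List<Finding> validate(Map<String, String> inputPartitions) {
        return inputPartitions.entrySet().stream()
            .filter(e -> e.getValue().isEmpty())      // toy check: empty payloads are invalid
            .map(e -> new Finding(e.getKey(), "empty-partition"))
            .collect(Collectors.toList());
    }

    // Phase 2: Analyzer -- group findings by issue and assign severities.
    static List<AnalyzedGroup> analyze(List<Finding> findings) {
        return findings.stream()
            .collect(Collectors.groupingBy(Finding::issue, Collectors.counting()))
            .entrySet().stream()
            .map(e -> new AnalyzedGroup(e.getKey(), e.getValue(),
                                        e.getValue() > 1 ? "CRITICAL" : "WARNING"))
            .collect(Collectors.toList());
    }

    // Phase 3: Assessor -- derive a verdict that can gate or trigger deployment.
    static boolean assess(List<AnalyzedGroup> groups) {
        return groups.stream().noneMatch(g -> g.severity().equals("CRITICAL"));
    }

    public static void main(String[] args) {
        Map<String, String> candidate = Map.of("tile-1", "geometry", "tile-2", "");
        boolean deployable = assess(analyze(validate(candidate)));
        System.out.println(deployable ? "PASS" : "FAIL");
    }
}
```

In the real library each phase is a separate processor publishing to its own catalog rather than an in-memory function call, but the data flow between the phases is the same.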

To implement each of these phases, you can use the MapReduce semantics that the validation library exposes on top of the Data Processing Library's compiler patterns.

The validation library provides an API for each of these processor phases through which you can define:

  • Feature Loaders -- to extract data from one or more input catalogs, with the ability to correlate fields across layers and catalogs via the Data Processing Library's RefTree semantics.
  • Mappers -- to iterate over the data in each input partition and determine the output partition for each element.
  • Reducers -- to run in each output partition, with full contextual data as grouped by the Mapper. The final result is written to an output catalog.
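The Mapper/Reducer contract described above can be sketched with toy interfaces. The interface and method names below (`Mapper`, `Reducer`, `outputPartition`, `run`) are hypothetical stand-ins for illustration, not the library's actual API; the real processors operate on catalog partitions rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapReduceSketch {
    // Mapper: for each element of an input partition, decide which
    // output partition (key) it belongs to.
    interface Mapper<I, K> { K outputPartition(I element); }

    // Reducer: runs once per output partition, with the full contextual
    // data the Mapper grouped there, and produces one result per key.
    interface Reducer<I, K, R> { R reduce(K key, List<I> grouped); }

    // Drive both phases over an in-memory stand-in for an input catalog.
    static <I, K, R> Map<K, R> run(List<I> input, Mapper<I, K> mapper,
                                   Reducer<I, K, R> reducer) {
        Map<K, List<I>> groups = new LinkedHashMap<>();
        for (I element : input) {
            groups.computeIfAbsent(mapper.outputPartition(element),
                                   k -> new ArrayList<>()).add(element);
        }
        Map<K, R> results = new LinkedHashMap<>();
        groups.forEach((k, v) -> results.put(k, reducer.reduce(k, v)));
        return results;
    }

    public static void main(String[] args) {
        // Toy run: group error records by tile id, then count them per tile.
        List<String> errors = List.of("tile-1:bad-geometry", "tile-2:bad-ref",
                                      "tile-1:bad-format");
        Map<String, Integer> counts = run(
            errors,
            e -> e.split(":")[0],               // Mapper: tile id is the output partition
            (tile, grouped) -> grouped.size()   // Reducer: count errors per tile
        );
        System.out.println(counts); // {tile-1=2, tile-2=1}
    }
}
```

The split mirrors classic MapReduce: the Mapper only routes elements to keys, so grouping can be distributed, and the Reducer sees each key's complete group, so per-partition analytics stay local.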

You can run the processors you implement locally, or on the HERE platform portal.
