Compare Two Simple Catalogs

This example walks through comparing two versions of the same input catalog with the Data Validation Library.

The input catalog for this example uses the HERE tiling scheme with partitions that cover Berlin and the surrounding areas. Each of the catalog's tiles contains a single line segment: either an octagon, a horizontal line, or a vertical line.
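
To make the tiling scheme more concrete, the following minimal sketch decodes a HERE tile ID into its level, column, row, and bounding box. It assumes the usual HERE tiling convention (the tile ID is the quadkey bits with a leading 1, column bits in the even positions, origin at longitude -180 and latitude -90); the function names are illustrative and not part of the SDK. The tile ID used in the usage note is the partition ID 23618304 that is inspected later in this example.

```python
from typing import Tuple


def decode_here_tile(tile_id: int) -> Tuple[int, int, int]:
    """Return (level, x, y) for a HERE tile ID.

    Assumption: the ID is the Morton (quadkey) code of the tile with an
    extra leading 1 bit that marks the level.
    """
    if tile_id < 1:
        raise ValueError("tile ID must be a positive integer")
    level = (tile_id.bit_length() - 1) // 2
    morton = tile_id - (1 << (2 * level))  # strip the leading marker bit
    x = y = 0
    for i in range(level):
        x |= ((morton >> (2 * i)) & 1) << i      # even bits: column
        y |= ((morton >> (2 * i + 1)) & 1) << i  # odd bits: row
    return level, x, y


def tile_bounds(tile_id: int) -> Tuple[float, float, float, float]:
    """Return (south, west, north, east) in degrees for a HERE tile ID."""
    level, x, y = decode_here_tile(tile_id)
    size = 360.0 / (1 << level)  # tiles are square in degrees
    west = x * size - 180.0
    south = y * size - 90.0
    return south, west, south + size, west + size
```

For example, `decode_here_tile(23618304)` gives level 12, and `tile_bounds(23618304)` gives a box south-west of Berlin, roughly 52.03 to 52.12 degrees north and 12.66 to 12.74 degrees east.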

In this example, the comparison component reads the input catalog versions and compares the contents of their linesegments layers, as well as the metadata in their state layers.

Requirements

To begin, you need an output catalog for the example's comparison results and access to the quick start input catalog.

The section below describes how to create and configure the output catalog on the HERE platform using the OLP CLI.

The HRN for the input catalog is: hrn:here-cn:data:::dvl-example-berlin4-compare-validation-quickstart-input

The source code and configuration templates for the example are in the SDK under examples/data-validation/java/quick-start for Java and examples/data-validation/scala/quick-start for Scala.

For running pipelines, this document assumes that you have already set up your app and credentials using the instructions from Get Your Credentials.

Create and Configure the Comparison Output Catalog

  1. You will create a catalog with three layers named heretile-comparison-results, generic-comparison-results, and state. To do so, use the configuration template file output-comparison-catalog-platform.json.template, which is located in one of the following directories (relative to the root of your unpacked SDK):

    examples/data-validation/{java,scala}/quick-start/config/pipeline/olp

    examples/data-validation/{java,scala}/quick-start/config/pipeline/cn

    Remove the .template extension from the file name and replace the placeholder {output_catalog_id} with {CATALOG_ID}, where CATALOG_ID is a unique identifier, such as "YOUR_LOGIN-validation-quickstart-comparison".

  2. Use the OLP CLI (tools/OLP_CLI relative to the root of your unpacked SDK) to create a catalog with a command like this:

    ./olp catalog create "{CATALOG_ID}" "{CATALOG_NAME}" \
        --summary "{CATALOG_SUMMARY}" \
        --config "{JSON_FILE_PATH}"
    

    where:

    CATALOG_ID is the unique identifier you used above, such as "YOUR_LOGIN-validation-quickstart-comparison". This identifier is the resource portion of your catalog's HERE Resource Name (HRN).

    CATALOG_NAME is a unique identifier (whitespaces are allowed), such as "YOUR_LOGIN Data Validation Library Quick Start Example Comparison Results". This is the value that appears for your catalog in the portal's Data tab when you list all available catalogs or search for a catalog.

    CATALOG_SUMMARY is an informal description, such as "Output catalog of the Comparison component in the Data Validation Library Quick Start Example". The --summary option is required.

    JSON_FILE_PATH is the path to the configuration file from the previous step.

    It takes approximately one minute for the catalog to be created on the platform. The command then prints a result like the following, which contains the HRN that identifies this catalog in all further CLI and SDK operations:

    Catalog hrn:here-cn:data:::{CATALOG_ID} has been created.
    

    The HERE Resource Name (HRN) for this catalog can now be used as the output for your comparison pipeline.

  3. Grant "read", "write", and "manage" permissions for this catalog to your group with the respective group ID by running the following command:

    ./olp catalog permission grant "{CATALOG_HRN}" --group "{GROUP_ID}" --read --write --manage
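
Step 1 above can also be scripted. The following is a minimal sketch, not part of the SDK: it writes a copy of the template without the .template suffix, substitutes the {output_catalog_id} placeholder, and builds the HRN in the pattern shown in the CLI output above. The function names and the default HRN partition are illustrative assumptions.

```python
from pathlib import Path

PLACEHOLDER = "{output_catalog_id}"  # placeholder named in step 1


def render_template(template_path: Path, catalog_id: str) -> Path:
    """Write a copy of the template without the .template suffix,
    with the placeholder replaced by catalog_id, and return its path."""
    if template_path.suffix != ".template":
        raise ValueError(f"expected a .template file, got {template_path.name}")
    text = template_path.read_text(encoding="utf-8")
    if PLACEHOLDER not in text:
        raise ValueError(f"{PLACEHOLDER} not found in {template_path.name}")
    output_path = template_path.with_suffix("")  # drops only ".template"
    output_path.write_text(text.replace(PLACEHOLDER, catalog_id), encoding="utf-8")
    return output_path


def catalog_hrn(catalog_id: str, partition: str = "here-cn") -> str:
    """Build the catalog HRN in the pattern printed by `olp catalog create`.

    Assumption: the partition matches the HRNs used in this example.
    """
    return f"hrn:{partition}:data:::{catalog_id}"
```

The path returned by `render_template` is the value to pass as {JSON_FILE_PATH}, and `catalog_hrn` gives the value to use as {CATALOG_HRN} in step 3. The original template file is left in place.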
    

Configure the Comparison Pipeline

For Java, the configuration template files are in the examples/data-validation/java/quick-start/config/pipeline/cn folder. For Scala, they are in examples/data-validation/scala/quick-start/config/pipeline/cn.

  1. Fill out the template files as described below and save them without the ".template" suffix in the folder from where you are running the OLP CLI (tools/OLP_CLI relative to the root of your unpacked SDK).
  2. Replace the output catalog HRN in pipeline-comparison-config.conf with that of the catalog you created above.
  3. Replace the candidate catalog HRN in pipeline-comparison-config.conf with: hrn:here-cn:data:::dvl-example-berlin4-compare-validation-quickstart-input

Run the Quickstart on the HERE Platform with the SDK

  1. For the Java implementation, go to the examples/data-validation/java/quick-start/comparison folder. For Scala, the folder is examples/data-validation/scala/quick-start/comparison. Build the fat JAR for pipeline deployment:
    mvn clean package -Pplatform
    
    This results in the target/comparison-$VERSION-platform.jar file for Java and target/comparison_2.11-$VERSION-platform.jar for Scala.
  2. Place your configuration files in the tools/OLP_CLI folder and make tools/OLP_CLI your current directory. Then run this command to get a pipeline ID:
    ./olp pipeline create {YOUR_PIPELINE_NAME} {YOUR_GROUP_ID}
    
  3. Create the pipeline template. For Java, {MAIN_CLASS} is com.here.platform.data.validation.example.quickstart.comparison.java.Main and for Scala it is com.here.platform.data.validation.example.quickstart.comparison.scala.Main.
    ./olp pipeline template create \
        {YOUR_PIPELINE_NAME} batch-2.0.0 \
        {PATH_TO_FAT_JAR_FROM_STEP_1} \
        {MAIN_CLASS} \
        {YOUR_GROUP_ID} \
        --input-catalog-ids pipeline-comparison-config.conf
    
  4. Create the pipeline version to get a pipeline version ID. Pass the reference catalog HRN and version as runtime parameters:
    ./olp pipeline version create \
        {YOUR_PIPELINE_VERSION_NAME} \
        {PIPELINE_ID} \
        {TEMPLATE_ID} \
        pipeline-comparison-config.conf \
        --runtime-config com.here.platform.data.validation.compare.reference.hrn=hrn:here-cn:data:::dvl-example-berlin4-compare-validation-quickstart-input com.here.platform.data.validation.compare.reference.version=0
    
  5. Activate the pipeline:
    ./olp pipeline version activate \
        {PIPELINE_ID} {PIPELINE_VERSION_ID}
    
  6. It may take a few minutes before the pipeline starts running, because the fat JAR for the pipeline template may still be uploading to the platform in the background. To find out when the pipeline starts running, either check its state in the Pipelines tab of the portal or use the OLP CLI command below. Once the running state is reached, the portal lets you navigate to the Splunk logs, and the olp command outputs the respective URL:
    ./olp pipeline version wait \
        {PIPELINE_ID} {PIPELINE_VERSION_ID} \
        --job-state=running \
        --timeout=300
    
  7. The pipeline takes up to 10 minutes to complete. Manually refresh the Pipelines tab in the portal to check on it. When the pipeline has completed, its status changes from "RUNNING" to "READY".
  8. If you want to remove this pipeline, template, and version from the server, you can delete them with the commands below. However, the results of the pipeline remain in the output catalog.
    ./olp pipeline version delete {PIPELINE_ID} {PIPELINE_VERSION_ID}
    ./olp pipeline template delete {TEMPLATE_ID}
    ./olp pipeline delete {PIPELINE_ID}
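
When these steps are scripted, the long argument list of step 4 is easy to mistype. The following minimal sketch only assembles the strings already shown in steps 1 and 4; it does not call the platform. The function names are illustrative, and the version string in the usage note is a made-up placeholder.

```python
from typing import List

# Runtime parameter keys copied from step 4.
REFERENCE_HRN_KEY = "com.here.platform.data.validation.compare.reference.hrn"
REFERENCE_VERSION_KEY = "com.here.platform.data.validation.compare.reference.version"


def fat_jar_path(language: str, version: str) -> str:
    """Return the fat JAR path produced in step 1 for "java" or "scala"."""
    stem = {"java": "comparison", "scala": "comparison_2.11"}[language]
    return f"target/{stem}-{version}-platform.jar"


def version_create_args(
    version_name: str,
    pipeline_id: str,
    template_id: str,
    reference_hrn: str,
    reference_version: int,
    config_file: str = "pipeline-comparison-config.conf",
) -> List[str]:
    """Assemble the argument list for `olp pipeline version create` (step 4)."""
    return [
        "./olp", "pipeline", "version", "create",
        version_name, pipeline_id, template_id, config_file,
        "--runtime-config",
        f"{REFERENCE_HRN_KEY}={reference_hrn}",
        f"{REFERENCE_VERSION_KEY}={reference_version}",
    ]
```

The resulting list can be passed to `subprocess.run(args, check=True)` with tools/OLP_CLI as the working directory. For a hypothetical SDK version "1.0.0", `fat_jar_path("scala", "1.0.0")` returns "target/comparison_2.11-1.0.0-platform.jar".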
    

Inspect the Comparison Output Catalog

There are at least two ways to decode the contents of the output comparison catalog: in the portal, or with protoc on your local machine.

Inspect the Comparison Output Catalog in the Portal

In the portal's Data tab, click on the "heretile-comparison-results" layer for the output comparison catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-scala-comparison

On the Layer page, click the Inspect tab to open the catalog.

Partitions whose line segments differ between the two input catalog versions are highlighted in blue.

Click on a specific partition to see its decoded data.

The portal should render the differing geometry and display the decoded data values for the selected partition.

Inspect the Comparison Output Catalog Locally

In the portal's Data tab, click on the "heretile-comparison-results" layer for the output comparison catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-scala-comparison

On the Layer page, click the Partitions tab to list the individual partitions.

Click on "23618304" under Partition ID.

Click on Download raw data to save the raw partition data to disk.

You can then run protoc on the raw data to decode it, using:

protoc --decode_raw < {PATH_TO_RAW_PARTITION_DATA}

The output is structured as follows:

1 {
  1 {
    1: 0x404a0d0000000000
    2: 0x4029668000000000
  }
  1 {
    1: 0x404a064000000000
    2: 0x4029668000000000
  }
}
2 {
  1 {
    1: 0x404a09a000000000
    2: 0x4029590000000000
  }
  1 {
    1: 0x404a09a000000000
    2: 0x4029740000000000
  }
}

The first item is the pair of points of the line segment in the reference catalog version. The second item is the pair of points of the line segment in the candidate catalog version. One of the two segments is horizontal and the other is vertical, so the two versions differ for this partition.
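
Because protoc --decode_raw has no schema, it prints the 64-bit fields as hexadecimal integers. The following minimal sketch turns the output above into floating-point numbers, assuming that each value is an IEEE 754 double; the parser only handles the two-level nesting shown above, and how the fields map to latitude and longitude is defined by the layer's schema rather than by this code.

```python
import struct
from typing import Dict, List, Tuple


def fixed64_to_double(hex_text: str) -> float:
    """Reinterpret a 64-bit hex literal such as 0x404a0d0000000000 as a double."""
    return struct.unpack(">d", int(hex_text, 16).to_bytes(8, "big"))[0]


def parse_segments(decode_raw_output: str) -> Dict[int, List[Tuple[float, float]]]:
    """Map each top-level field number to its list of (field 1, field 2) points."""
    segments: Dict[int, List[Tuple[float, float]]] = {}
    stack: List[int] = []          # field numbers of the currently open messages
    point: Dict[int, float] = {}   # values of the point being read
    for raw_line in decode_raw_output.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith("{"):
            stack.append(int(line[:-1].strip()))
            if len(stack) == 1:
                segments.setdefault(stack[0], [])
            else:
                point = {}
        elif line == "}":
            if len(stack) == 2:  # closing a point message
                segments[stack[0]].append((point[1], point[2]))
            stack.pop()
        else:
            number, value = line.split(":")
            point[int(number)] = fixed64_to_double(value.strip())
    return segments
```

Applied to the output above, item 1 decodes to the points (52.1015625, 12.7001953125) and (52.048828125, 12.7001953125), and item 2 decodes to (52.0751953125, 12.673828125) and (52.0751953125, 12.7265625).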

In the portal's Data tab, click on the "generic-comparison-results" layer for the output comparison catalog that you have created and populated.

On the Layer page, select the Partitions tab. Click on "state-fingerprints" under Partition ID.

Its content does not need to be decoded with protoc; it simply contains the string:

"checksum differs"

This indicates that the generic comparison of the "state" layer in the two input catalog versions found a checksum difference in the "fingerprints" partition. Generic comparisons do not retrieve payload content; they only compare metadata fields that are common to any catalog partition.
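
The idea of a metadata-only comparison can be illustrated with a conceptual sketch. This is not the Data Validation Library's implementation: the dictionary shape and the "checksum" key are hypothetical stand-ins for partition metadata, and only the result string is taken from the output shown above.

```python
from typing import Mapping, Optional


def generic_compare(
    reference: Mapping[str, str], candidate: Mapping[str, str]
) -> Optional[str]:
    """Compare two partition metadata records without reading any payload.

    Returns a difference message, or None when the checksums match.
    The "checksum" key is a hypothetical field name used for illustration.
    """
    if reference["checksum"] != candidate["checksum"]:
        return "checksum differs"
    return None
```

Because only the metadata records are passed in, the comparison can report that a partition changed, but not what changed inside its payload.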
