Validate a Simple Catalog

This example walks through running an input catalog through all three Data Validation Library phases for a simple catalog: testing, metrics, and assessment. Each phase is represented as a Maven project.

We provide Java and Scala implementations for this example, and they produce identical output catalogs for each phase.

The input catalog for this example uses the HERE tiling scheme with partitions that cover Berlin and the surrounding areas. Each of the catalog's tiles contains a single geometry: either an octagon or a horizontal line.

In this example, the testing component reads the input catalog and validates octagons as "PASS" tiles and straight lines as "FAIL".

Next, the metrics component reads the output of the testing component as input, assigns an error severity of CRITICAL for the "FAIL" tiles in the test results, and aggregates the failed tile IDs.

Finally, the assessment component reads the output of the metrics component as input, giving a final value of FAIL if more than 10% of the candidate tiles have CRITICAL errors.
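The assessment rule described above can be sketched as follows. This is an illustrative sketch, not the library's actual API; the function name and parameters are hypothetical, and the sample numbers (15 critical tiles out of 128 candidates) are the ones this example produces later on:

```python
def assess(num_critical_tiles: int, num_candidate_tiles: int,
           critical_threshold: float = 0.1) -> str:
    """Return "FAIL" if the fraction of candidate tiles with CRITICAL
    errors exceeds the threshold (10% by default), otherwise "PASS"."""
    critical_fraction = num_critical_tiles / num_candidate_tiles
    return "FAIL" if critical_fraction > critical_threshold else "PASS"

# 15 of 128 candidate tiles are critical: 15/128 ~= 0.117 > 0.1, so FAIL.
print(assess(15, 128))  # FAIL
```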

Requirements

To begin, you require an existing output catalog for the example's test results and access to the quick start input catalog.

The section below describes how you can create and configure this catalog in the HERE platform portal. The steps to set up the catalogs for the example's metrics and assessment results, as well as to deploy their pipelines, are almost identical. These steps are described in Configure the Testing Pipeline.

The HRN for the input catalog is: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-input

The source code and configuration templates for the example are in the SDK under examples/data-validation/java/quick-start for Java and examples/data-validation/scala/quick-start for Scala.

For running pipelines, this document assumes that you have already set up your app and credentials using the instructions from Get Your Credentials.

Create and Configure the Testing Output Catalog

  1. You will create a catalog with two layers, a "Compilation State" layer and a "Test Result" layer. You can do that using a configuration template file named output-testing-catalog-platform.json.template (in the

    examples/data-validation/{java,scala}/quick-start/config/pipeline/cn directory

    relative to the root of your unpacked SDK). Remove the .template extension from the file's name and replace the placeholder {output_catalog_id} with your catalog ID, where {CATALOG_ID} is a unique identifier, such as "YOUR_LOGIN-validation-quickstart-testing".

  2. Use the OLP CLI (tools/OLP_CLI relative to the root of your unpacked SDK) to create a catalog with a command like this:

    ./olp catalog create "{CATALOG_ID}" "{CATALOG_NAME}" \
        --summary "{CATALOG_SUMMARY}" \
        --config "{JSON_FILE_PATH}"
    

    where CATALOG_ID is the unique identifier you used above, such as "YOUR_LOGIN-validation-quickstart-testing". This identifier is the resource portion of your catalog's HERE Resource Name (HRN).

    CATALOG_NAME is a unique name (whitespace is allowed), such as "YOUR_LOGIN Data Validation Library Quick Start Example Testing Results". This is the value that appears for your catalog in the portal's Data tab when you list all available catalogs or search for a catalog.

    CATALOG_SUMMARY is an informal description, such as "Output catalog of the Testing component in the Data Validation Library Quick Start Example". The --summary option is required.

    JSON_FILE_PATH is the path to your configuration file from the previous step.

    Catalog creation on the platform takes approximately a minute. When it completes, you get a result like the following, containing the HRN that identifies this catalog in all further CLI and SDK operations:

    Catalog hrn:here-cn:data:::{YOUR_CATALOG_ID} has been created.
    

    The HERE Resource Name (HRN) for this catalog can now be used as the output for your testing pipeline.

  3. Grant "read", "write", and "manage" permissions for this catalog to your group with the respective group ID by running the following command:

    ./olp catalog permission grant "{CATALOG_HRN}" --group "{GROUP_ID}" --read --write --manage
    

Configure the Testing Pipeline

For Java, the configuration template files are in the examples/data-validation/java/quick-start/config/pipeline/cn folder. For Scala, they are in examples/data-validation/scala/quick-start/config/pipeline/cn.

  1. Fill out the template files as described below and save them without the ".template" suffix in the folder from where you are running the OLP CLI.
  2. Replace the output catalog HRN in pipeline-testing-config.conf with that of the catalog you created above.
  3. Replace the input catalog HRN in pipeline-testing-config.conf with: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-input

Run the Testing Pipeline on the HERE Platform

  1. For the Java implementation, go to the examples/data-validation/java/quick-start/testing folder. For Scala, the folder is examples/data-validation/scala/quick-start/testing. Build the fat JAR for pipeline deployment:
    mvn clean package -Pplatform
    
    This results in the target/testing-$VERSION-platform.jar file for Java and target/testing_2.11-$VERSION-platform.jar for Scala.
  2. Place your configuration files in the tools/OLP_CLI folder and make tools/OLP_CLI your current directory. Then run this command to get a pipeline ID:
    ./olp pipeline create {YOUR_PIPELINE_NAME} {YOUR_GROUP_ID}
    
  3. Create the pipeline template. For Java, {MAIN_CLASS} is com.here.platform.data.validation.example.quickstart.testing.java.Main and for Scala it is com.here.platform.data.validation.example.quickstart.testing.scala.Main.
    ./olp pipeline template create \
        {YOUR_PIPELINE_NAME} batch-2.0.0 \
        {PATH_TO_FAT_JAR_FROM_STEP_1} \
        {MAIN_CLASS} \
        {YOUR_GROUP_ID} \
        --input-catalog-ids pipeline-testing-config.conf
    
  4. Create the pipeline version to get a pipeline version ID.
    ./olp pipeline version create \
        {YOUR_PIPELINE_VERSION_NAME} \
        {PIPELINE_ID} \
        {TEMPLATE_ID} \
        pipeline-testing-config.conf
    
  5. Activate the pipeline:
    ./olp pipeline version activate \
        {PIPELINE_ID} {PIPELINE_VERSION_ID}
    
  6. It may take a few minutes before the pipeline starts running, as the fat JAR for the pipeline template may still be uploading to the platform in the background. To find out when the pipeline starts running, either check its state in the Pipelines tab of the portal or use the following OLP CLI command. Once the running state is reached, the portal lets you navigate to the Splunk logs, and the olp command outputs the corresponding URL:
    ./olp pipeline version wait \
        {PIPELINE_ID} {PIPELINE_VERSION_ID} \
        --job-state=running \
        --timeout=300
    
  7. The pipeline takes up to 10 minutes to complete. Manually refresh the Pipelines tab in the portal. When the pipeline completes, its status changes from "RUNNING" to "READY".
  8. If you want to remove this pipeline, template, and version from the server, you can delete them with the commands below. However, the results of the pipeline remain in the output catalog.
    ./olp pipeline version delete {PIPELINE_ID} {PIPELINE_VERSION_ID}
    ./olp pipeline template delete {TEMPLATE_ID}
    ./olp pipeline delete {PIPELINE_ID}
    

Inspect the Testing Output Catalog

There are at least two ways to decode the contents of the "test-result" layer of the testing output catalog: in the portal, or with protoc on your local machine.

Inspect the Testing Output Catalog in the Portal

In the portal's Data tab, click on the "test-result" layer for the output testing catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-testing

On the Layer page, click the Inspect tab to open the catalog.

Click on a specific partition to see its decoded data.

The portal should render the validated geometry and display the decoded data values for the selected partition.

Inspect the Testing Output Catalog locally

In the portal's Data tab, click on the "test-result" layer for the output testing catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-testing

On the Layer page, click the Partitions tab to see specific partitions.

Click on "23618304" under Partition ID.

Click on Download raw data to save the raw partition data to disk.

You can then run protoc on the raw data to decode it, using:

protoc --decode_raw < {PATH_TO_RAW_PARTITION_DATA}

The output is structured as follows:

1 {
  1: "quickstarttestcase"
  2: 1
}
2 {
  1 {
    1: 0x404a41e000000000
    2: 0x402aee0000000000
  }
  1 {
    1: 0x404a41e000000000
    2: 0x402b090000000000
  }
}

The first item is our test ID and status enumeration, with 1 indicating a FAIL. The second item is the list of points in the line segment. Notice that there are only 2 points, which is what we expect from our test case criteria.
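The coordinate values in the raw decode are IEEE 754 double-precision numbers printed as 64-bit hex patterns. A short script using only the Python standard library converts them back to latitude/longitude and shows why this tile fails: both points share the same latitude, so the segment is a horizontal line:

```python
import struct

def hex_to_double(h: str) -> float:
    """Reinterpret a 64-bit hex pattern from `protoc --decode_raw` as a double."""
    return struct.unpack(">d", bytes.fromhex(h.removeprefix("0x")))[0]

# The two (latitude, longitude) pairs from the decoded partition above.
points = [
    (hex_to_double("0x404a41e000000000"), hex_to_double("0x402aee0000000000")),
    (hex_to_double("0x404a41e000000000"), hex_to_double("0x402b090000000000")),
]
for lat, lon in points:
    print(lat, lon)
# Both latitudes are 52.5146484375 (Berlin area); only the longitudes
# differ, so the two points form a horizontal line segment.
```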

Click on "23618305" under Partition ID.

Click on Download raw data to save the raw partition data to disk.

This partition contained an octagon in the input catalog, so its decoded output has a status enumeration of 0, indicating PASS. It has 9 points, forming a closed loop for the octagon.

1 {
  1: "quickstarttestcase"
  2: 0
}
2 {
  1 {
    1: 0x404a0b8ff52daf51
    2: 0x4029a63564391dfd
  }
  1 {
    1: 0x404a0e4d590e477f
    2: 0x40299b3fd4b6bd44
  }
  1 {
    1: 0x404a0e4d590e477f
    2: 0x40298bc02b4942bc
  }
  1 {
    1: 0x404a0b8ff52daf51
    2: 0x402980ca9bc6e203
  }
  1 {
    1: 0x404a07b00ad250af
    2: 0x402980ca9bc6e203
  }
  1 {
    1: 0x404a04f2a6f1b881
    2: 0x40298bc02b4942bc
  }
  1 {
    1: 0x404a04f2a6f1b881
    2: 0x40299b3fd4b6bd44
  }
  1 {
    1: 0x404a07b00ad250af
    2: 0x4029a63564391dfd
  }
  1 {
    1: 0x404a0b8ff52daf51
    2: 0x4029a63564391dfd
  }
}
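A quick check on the raw hex pairs confirms the closed loop: of the 9 points, the first and last are identical, and the remaining points are the 8 distinct vertices of the octagon:

```python
# The 9 (latitude, longitude) hex pairs from the decoded partition above.
points = [
    ("0x404a0b8ff52daf51", "0x4029a63564391dfd"),
    ("0x404a0e4d590e477f", "0x40299b3fd4b6bd44"),
    ("0x404a0e4d590e477f", "0x40298bc02b4942bc"),
    ("0x404a0b8ff52daf51", "0x402980ca9bc6e203"),
    ("0x404a07b00ad250af", "0x402980ca9bc6e203"),
    ("0x404a04f2a6f1b881", "0x40298bc02b4942bc"),
    ("0x404a04f2a6f1b881", "0x40299b3fd4b6bd44"),
    ("0x404a07b00ad250af", "0x4029a63564391dfd"),
    ("0x404a0b8ff52daf51", "0x4029a63564391dfd"),
]
print(len(points))              # 9 points in total
print(points[0] == points[-1])  # True: the octagon is a closed loop
```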

Create and Configure the Metrics Output Catalog

These steps are similar to those for the testing catalog, except that the metrics result does not render anything in the layer's Inspect tab. Instead of a test-result layer, you create a metrics-result layer:

  1. You will configure the catalog with a "Metrics Result" layer. As in the testing step above, you can use a template file named output-metrics-catalog-platform.json.template (in the same

    examples/data-validation/{java,scala}/quick-start/config/pipeline/cn directory).

    Again, remove the .template extension from the file's name and replace the placeholder {output_catalog_id} with your catalog ID, where {CATALOG_ID} is a unique identifier, such as "YOUR_LOGIN-validation-quickstart-metrics".

  2. Then you run the following command to create the catalog:

    ./olp catalog create "{CATALOG_ID}" "{CATALOG_NAME}" \
        --summary "{CATALOG_SUMMARY}" \
        --config "{JSON_FILE_PATH}"
    
  3. Grant "read", "write", and "manage" permissions for this catalog to your group with the respective group ID by running the following command:

    ./olp catalog permission grant "{CATALOG_HRN}" --group "{GROUP_ID}" --read --write --manage
    

Configure the Metrics Pipeline

For Java, the configuration template files are in the examples/data-validation/java/quick-start/config/pipeline/cn folder. For Scala, they are in examples/data-validation/scala/quick-start/config/pipeline/cn.

  1. Fill out the template files as described below and save them without the ".template" suffix in the folder from where you are running the OLP CLI (tools/OLP_CLI relative to the root of your unpacked SDK).
  2. Replace the output catalog HRN in pipeline-metrics-config.conf with that of the catalog you created above.
  3. Replace the input catalog HRN in pipeline-metrics-config.conf with: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-testing

Run the Metrics Pipeline on the HERE Platform

  1. For the Java implementation, go to the examples/data-validation/java/quick-start/metrics folder. For Scala, the folder is examples/data-validation/scala/quick-start/metrics. Build the fat JAR for pipeline deployment:
    mvn clean package -Pplatform
    
    This results in the target/metrics-$VERSION-platform.jar file for Java and target/metrics_2.11-$VERSION-platform.jar for Scala.
  2. Place your configuration files in the tools/OLP_CLI folder and make tools/OLP_CLI your current directory. Then run this command to get a pipeline ID:
    ./olp pipeline create {YOUR_PIPELINE_NAME} {YOUR_GROUP_ID}
    
  3. Create the pipeline template. For Java, {MAIN_CLASS} is com.here.platform.data.validation.example.quickstart.metrics.java.Main and for Scala it is com.here.platform.data.validation.example.quickstart.metrics.scala.Main.
    ./olp pipeline template create \
        {YOUR_PIPELINE_NAME} batch-2.0.0 \
        {PATH_TO_FAT_JAR_FROM_STEP_1} \
        {MAIN_CLASS} \
        {YOUR_GROUP_ID} \
        --input-catalog-ids pipeline-metrics-config.conf
    
  4. Create the pipeline version to get a pipeline version ID.
    ./olp pipeline version create \
        {YOUR_PIPELINE_VERSION_NAME} \
        {PIPELINE_ID} \
        {TEMPLATE_ID} \
        pipeline-metrics-config.conf
    
  5. Activate the pipeline:
    ./olp pipeline version activate \
        {PIPELINE_ID} {PIPELINE_VERSION_ID}
    
  6. It may take a few minutes before the pipeline starts running, as the fat JAR for the pipeline template may still be uploading to the platform in the background. To find out when the pipeline starts running, either check its state in the Pipelines tab of the portal or use the following OLP CLI command. Once the running state is reached, the portal lets you navigate to the Splunk logs, and the olp command outputs the corresponding URL:
    ./olp pipeline version wait \
        {PIPELINE_ID} {PIPELINE_VERSION_ID} \
        --job-state=running \
        --timeout=300
    
  7. The pipeline takes up to 10 minutes to complete. Manually refresh the Pipelines tab in the portal. When the pipeline completes, its status changes from "RUNNING" to "READY".
  8. If you want to remove this pipeline, template, and version from the server, you can delete them with the commands below. However, the results of the pipeline remain in the output catalog.
    ./olp pipeline version delete {PIPELINE_ID} {PIPELINE_VERSION_ID}
    ./olp pipeline template delete {TEMPLATE_ID}
    ./olp pipeline delete {PIPELINE_ID}
    

Inspect the Metrics Output Catalog

There are at least two ways to decode the contents of the output metrics catalog: in the portal, or with protoc on your local machine.

Inspect the Metrics Output Catalog in the Portal

In the portal's Data tab, click on the "metrics-result" layer for the output metrics catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-metrics

On the Layer page, click the Partitions tab to open the catalog.

Click on a specific partition to see its decoded data.

Inspect the Metrics Output Catalog locally

In the portal's Data tab, click on the "metrics-result" layer for the output metrics catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-metrics

On the Layer page, select the Partitions tab. Click on "FAIL" under Partition ID.

Click on Download raw data to save the raw partition data to disk.

You can then run protoc on the raw data to decode it, using:

 protoc --decode_raw < {PATH_TO_RAW_PARTITION_DATA}

The output is structured as follows:

1 {
  1 {
    1: "quickstartmetriccalc"
    2: 3
  }
  2: 23618394
  2: 23618304
  2: 23618412
  2: 23618385
  2: 23618349
  2: 23618313
  2: 23618322
  2: 23618358
  2: 23618367
  2: 23618430
  2: 23618376
  2: 23618403
  2: 23618331
  2: 23618340
  2: 23618421
}

The first item is our metric ID and severity enumeration, with 3 indicating CRITICAL. The second item is the list of tile IDs that contain single horizontal lines.
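The aggregation the metrics component performs can be sketched as follows. This is a hypothetical illustration, not the library's actual API: the function name and data shapes are assumptions, but it mirrors the observable behavior, where tiles with CRITICAL test results end up listed under the "FAIL" partition and tiles with severity NONE under "PASS":

```python
from collections import defaultdict

def aggregate(tile_severities: dict[int, str]) -> dict[str, list[int]]:
    """Group tile IDs by severity; each group becomes one generic
    partition ("FAIL" or "PASS") in the metrics-result layer."""
    groups: dict[str, list[int]] = defaultdict(list)
    for tile_id, severity in tile_severities.items():
        # CRITICAL results go to the "FAIL" partition, NONE to "PASS".
        groups["FAIL" if severity == "CRITICAL" else "PASS"].append(tile_id)
    return groups

groups = aggregate({23618304: "CRITICAL", 23618307: "NONE", 23618313: "CRITICAL"})
print(sorted(groups["FAIL"]))  # [23618304, 23618313]
```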

Now type "PASS" in the Search box and click Submit.

Click on Download raw data to save the raw partition data to disk.

The decoded output for this raw partition will start with something like:

1 {
  1 {
    1: "quickstartmetriccalc"
    2: 0
  }
  2: 23618375
  2: 23618343
  2: 23618324
  2: 23618351
  2: 23618307
  2: 23618389
  2: 23618414
  ...

The enumeration of 0 indicates a metric severity of NONE, and the tile IDs listed are those which contained octagons in the original input catalog and received a test result of PASS in the testing pipeline.

Create and Configure the Assessment Output Catalog

These steps are similar to those for the testing catalog, except that the assessment result does not render anything in the layer's Inspect tab. Instead of a test-result layer, you create an assessment layer:

  1. You will configure the catalog with an "Assessment" layer. As in the testing step above, you can use a template file named output-assessment-catalog-platform.json.template (in the same

    examples/data-validation/{java,scala}/quick-start/config/pipeline/cn directory).

    Again, remove the .template extension from the file's name and replace the placeholder {output_catalog_id} with your catalog ID, where {CATALOG_ID} is a unique identifier, such as "YOUR_LOGIN-validation-quickstart-assessment".

  2. Then you run the following command to create the catalog:

    ./olp catalog create "{CATALOG_ID}" "{CATALOG_NAME}" \
        --summary "{CATALOG_SUMMARY}" \
        --config "{JSON_FILE_PATH}"
    
  3. Grant "read", "write", and "manage" permissions for this catalog to your group with the respective group ID by running the following command:

    ./olp catalog permission grant "{CATALOG_HRN}" --group "{GROUP_ID}" --read --write --manage
    

Configure the Assessment Pipeline

For Java, the configuration template files are in the examples/data-validation/java/quick-start/config/pipeline/cn folder. For Scala, they are in examples/data-validation/scala/quick-start/config/pipeline/cn.

  1. Fill out the template files as described below and save them without the ".template" suffix in the folder from where you are running the OLP CLI (tools/OLP_CLI relative to the root of your unpacked SDK).
  2. Replace the output catalog HRN in pipeline-assessment-config.conf with that of the catalog you created above.
  3. Replace the metrics catalog HRN in pipeline-assessment-config.conf with the HRN of the metrics catalog you created above. Alternatively, you can use this HRN as the input: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-metrics
  4. Replace the candidate catalog HRN in pipeline-assessment-config.conf with: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-input

Run the Assessment Pipeline on the HERE Platform

  1. For the Java implementation, go to the examples/data-validation/java/quick-start/assessment folder. For Scala, the folder is examples/data-validation/scala/quick-start/assessment. Build the fat JAR for pipeline deployment:
    mvn clean package -Pplatform
    
    This results in the target/assessment-$VERSION-platform.jar file for Java and target/assessment_2.11-$VERSION-platform.jar for Scala.
  2. Place your configuration files in the tools/OLP_CLI folder and make tools/OLP_CLI your current directory. Then run this command to get a pipeline ID:
    ./olp pipeline create {YOUR_PIPELINE_NAME} {YOUR_GROUP_ID}
    
  3. Create the pipeline template. For Java, {MAIN_CLASS} is com.here.platform.data.validation.example.quickstart.assessment.java.Main and for Scala it is com.here.platform.data.validation.example.quickstart.assessment.scala.Main.
    ./olp pipeline template create \
        {YOUR_PIPELINE_NAME} batch-2.0.0 \
        {PATH_TO_FAT_JAR_FROM_STEP_1} \
        {MAIN_CLASS} \
        {YOUR_GROUP_ID} \
        --input-catalog-ids pipeline-assessment-config.conf
    
  4. Create the pipeline version to get a pipeline version ID.
    ./olp pipeline version create \
        {YOUR_PIPELINE_VERSION_NAME} \
        {PIPELINE_ID} \
        {TEMPLATE_ID} \
        pipeline-assessment-config.conf
    
  5. Activate the pipeline:
    ./olp pipeline version activate \
        {PIPELINE_ID} {PIPELINE_VERSION_ID}
    
  6. It may take a few minutes before the pipeline starts running, as the fat JAR for the pipeline template may still be uploading to the platform in the background. To find out when the pipeline starts running, either check its state in the Pipelines tab of the portal or use the following OLP CLI command. Once the running state is reached, the portal lets you navigate to the Splunk logs, and the olp command outputs the corresponding URL:
    ./olp pipeline version wait \
        {PIPELINE_ID} {PIPELINE_VERSION_ID} \
        --job-state=running \
        --timeout=300
    
  7. The pipeline takes up to 10 minutes to complete. Manually refresh the Pipelines tab in the portal. When the pipeline completes, its status changes from "RUNNING" to "READY".
  8. If you want to remove this pipeline, template, and version from the server, you can delete them with the commands below. However, the results of the pipeline remain in the output catalog.
    ./olp pipeline version delete {PIPELINE_ID} {PIPELINE_VERSION_ID}
    ./olp pipeline template delete {TEMPLATE_ID}
    ./olp pipeline delete {PIPELINE_ID}
    

Inspect the Assessment Catalog

There are at least two ways to decode the contents of the output assessment catalog: in the portal, or with protoc on your local machine.

Inspect the Assessment Catalog in the Portal

In the portal's Data tab, click on the "assessment" layer for the output assessment catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-assessment

On the Layer page, click the Partitions tab to open the catalog.

Click on a specific partition to see its decoded data.

Inspect the Assessment Catalog locally

In the portal's Data tab, click on the "assessment" layer for the output assessment catalog that you have created and populated.

Alternatively you can inspect the following catalog in the portal's Data tab: hrn:here-cn:data:::dvl-example-berlin4-validation-quickstart-java-assessment

On the Layer page, select the Partitions tab. Click on "ASSESSMENT" under Partition ID.

Click on Download raw data to save the raw partition data to disk.

You can then run protoc on the raw data to decode it, using:

protoc --decode_raw < {PATH_TO_RAW_PARTITION_DATA}

The output is structured as follows:

1 {
  1: "quickstartcriteria"
  2: 1
}
2: 0x3fb999999999999a
3: 0x3fbe000000000000
4: 15
5: 128

The first item is our assessment ID and result enumeration, with 1 indicating FAIL. The second item is the critical threshold for the assessment criteria, that is, the fraction of total tiles permitted to contain critical errors; it is a double-precision floating-point number, shown here in hexadecimal by the raw decoder, and our assessment criteria were configured with a critical threshold of 0.1, or 10% of the total tiles. The third item is the critical percentage, that is, the fraction of total candidate tiles that contained critical errors; like the critical threshold, it is a double shown in hexadecimal, and the value above is 0.1171875. The fourth item is the total number of tiles with critical errors. The fifth item is the total number of input tiles from the original candidate catalog that was fed into the testing component.
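You can verify the decoded assessment values with a short script (Python standard library only): the two hex patterns are IEEE 754 doubles, and the critical percentage is exactly the ratio of the fourth item to the fifth:

```python
import struct

def hex_to_double(h: str) -> float:
    """Reinterpret a 64-bit hex pattern from `protoc --decode_raw` as a double."""
    return struct.unpack(">d", bytes.fromhex(h.removeprefix("0x")))[0]

threshold = hex_to_double("0x3fb999999999999a")   # critical threshold
percentage = hex_to_double("0x3fbe000000000000")  # critical percentage
print(threshold)   # 0.1
print(percentage)  # 0.1171875, i.e. 15 / 128
# 0.1171875 > 0.1, so the assessment result is FAIL (enum value 1).
```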
