The Data Processing Library contains the following modules:
batch-core: provides the core functionality, including the DriverTask abstractions and all compilation patterns.
batch-catalog: provides the Catalog abstraction to access catalogs via Spark. This module contains the abstract interfaces only.
batch-catalog-dataservice: provides the implementation of the batch-catalog abstractions for the HERE Data API.
pipeline-runner: provides the PipelineRunner class that constitutes the entry point of a batch pipeline. This module depends on
batch-core-java: Java bindings for the
batch-catalog-java: Java bindings for the
pipeline-runner-java: Java bindings for the
batch-validation: provides a set of classes and DeltaSet transformations to implement data validation pipelines.
batch-validation-scalatest: provides ScalaTest bindings to implement data validation suites using the ScalaTest domain-specific language (DSL).
To use the Data Processing Library in your Scala applications, it is sufficient to include
batch-catalog-dataservice as dependencies.
For Java applications, you also need to include pipeline-runner-java as a dependency.
For more information on how to manage dependencies, see Dependency Management.
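As a rough illustration of the dependency setup described above, the following sbt fragment declares the modules that this section names explicitly. It is a sketch under stated assumptions, not the library's documented build configuration: the organization ID and version are placeholders, the value names dplOrganization and dplVersion are hypothetical, and the use of %% assumes the artifacts are published per Scala binary version. Take the real coordinates from Dependency Management.

```scala
// build.sbt (sketch only)
// ASSUMPTION: both values below are placeholders, not the real coordinates.
val dplOrganization = "com.example.placeholder"
val dplVersion      = "x.y.z"

libraryDependencies ++= Seq(
  // Named in this section for Scala applications.
  // ASSUMPTION: %% (Scala-version suffix) matches how the artifact is published.
  dplOrganization %% "batch-catalog-dataservice" % dplVersion,

  // Additionally required for Java applications.
  dplOrganization %% "pipeline-runner-java" % dplVersion
)
```

Only the module names come from this section; everything else in the fragment is illustrative and must be replaced with the values for your environment.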