Create Batch Pipelines

For more information on your options for developing batch pipelines, see SDK Workflows.

Note: Maven

Batch archetypes are supported by the Maven build system only.

The HERE Data SDK offers the following archetypes.

  • batch-direct1ton-java-archetype and batch-direct1ton-scala-archetype for Direct1toN compilation for Java and Scala
  • batch-directmton-java-archetype and batch-directmton-scala-archetype for DirectMtoN compilation for Java and Scala
  • batch-reftree-java-archetype and batch-reftree-scala-archetype for RefTree compilation for Java and Scala
  • batch-mapgroup-java-archetype and batch-mapgroup-scala-archetype for MapGroup compilation for Java and Scala

For more information on batch pipeline design patterns, see the Data Processing Library.

Generate New Batch Pipeline Projects

You can use Maven Archetypes from the command line or in an IDE such as Eclipse or IntelliJ IDEA. The examples below use the command line.

Follow the steps below to generate a new batch pipeline project.

  1. Create a wrapper project.
  2. Add a schema definition module.
  3. Add one or more batch pipelines.
  4. Build your project.

After you build your project, you can start development.

Create a Wrapper Project

To create a wrapper for your project from the public Maven pom-root archetype, run mvn archetype:generate with the following parameters.

Linux

mvn archetype:generate -DarchetypeGroupId=org.codehaus.mojo.archetypes \
                       -DarchetypeArtifactId=pom-root \
                       -DarchetypeVersion=1.1 \
                       -DgroupId=com.example \
                       -DartifactId=myproject \
                       -Dversion=1.0-SNAPSHOT \
                       -Dpackage=com.example.myproject

Windows

mvn archetype:generate -DarchetypeGroupId=org.codehaus.mojo.archetypes ^
                       -DarchetypeArtifactId=pom-root ^
                       -DarchetypeVersion=1.1 ^
                       -DgroupId=com.example ^
                       -DartifactId=myproject ^
                       -Dversion=1.0-SNAPSHOT ^
                       -Dpackage=com.example.myproject

The above example uses the following values.

  • archetypeGroupId org.codehaus.mojo.archetypes - do not change
  • archetypeArtifactId pom-root - do not change
  • archetypeVersion 1.1 - do not change
  • groupId com.example
  • artifactId myproject
  • version 1.0-SNAPSHOT
  • package com.example.myproject

Set your own values for the relevant parameters for your project.
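
As a minimal sketch of setting your own values, the command can be assembled from shell variables. The function name generate_wrapper and the variables GROUP_ID, ARTIFACT_ID, VERSION, and DRY_RUN are hypothetical names chosen for this illustration. The -DinteractiveMode=false flag is a standard option of the Maven Archetype plugin that suppresses prompts; it is not part of the original example. The sketch only prints the command unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Project coordinates (hypothetical variable names); override them in the environment.
GROUP_ID="${GROUP_ID:-com.example}"
ARTIFACT_ID="${ARTIFACT_ID:-myproject}"
VERSION="${VERSION:-1.0-SNAPSHOT}"
# DRY_RUN=1 (default) prints the command; DRY_RUN=0 runs Maven.
DRY_RUN="${DRY_RUN:-1}"

generate_wrapper() {
  # Build the argument list; the archetype coordinates stay fixed.
  set -- archetype:generate \
    -DarchetypeGroupId=org.codehaus.mojo.archetypes \
    -DarchetypeArtifactId=pom-root \
    -DarchetypeVersion=1.1 \
    "-DgroupId=${GROUP_ID}" \
    "-DartifactId=${ARTIFACT_ID}" \
    "-Dversion=${VERSION}" \
    "-Dpackage=${GROUP_ID}.${ARTIFACT_ID}" \
    -DinteractiveMode=false
  if [ "$DRY_RUN" = "1" ]; then
    echo "mvn $*"
  else
    mvn "$@"
  fi
}

generate_wrapper
```

Deriving the package from the group and artifact IDs mirrors the example values above; pass a different package if your project layout requires it.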

Add a Data Schema Definition Module

Batch pipelines require one or more input layers and a single output layer. The data in each layer is defined by a data schema. For input data, use the schema defined in the layer configuration for your input catalog. You can either use a schema from the platform or create your own. If you use an existing schema for your output layer, you can include the schema artifact in your pipeline project.

When you create a new layer, you can specify a schema. This schema defines the structure of the data in the layer. You can select an existing schema or define your own.

The Data SDK provides a Maven Archetype for creating new schemas and extending existing data schemas. For more information, see Create and Extend Schemas.

To include an existing schema artifact in your pipeline project, follow the steps below.

  1. Go to the HERE platform portal.
  2. Open the Data tab and search for the target layer.
  3. Go to the Schema tab for the layer.
  4. Copy the Maven dependencies and paste them into your pipeline project POM file.

For more information, see the Artifact Service section in Dependency Management.

Create Your Own Schema

The Data SDK provides a Maven Archetype for creating new schemas and extending existing data schemas. For more information, see Create and Extend Schemas.

To add a data schema definition model to your project, enter the following command in your project folder.

Linux

mvn archetype:generate \
                       -DarchetypeGroupId=com.here.platform.schema \
                       -DarchetypeArtifactId=project_archetype \
                       -DarchetypeVersion=1.0.35 \
                       -DgroupId=com.example.myproject \
                       -DartifactId=model1 \
                       -Dversion=1.0.0 \
                       -Dpackage=com.example.myproject.model1 \
                       -DmajorVersion=0

Windows

mvn archetype:generate -DarchetypeGroupId=com.here.platform.schema ^
                       -DarchetypeArtifactId=project_archetype ^
                       -DarchetypeVersion=1.0.35 ^
                       -DgroupId=com.example.myproject ^
                       -DartifactId=model1 ^
                       -Dversion=1.0.0 ^
                       -Dpackage=com.example.myproject.model1 ^
                       -DmajorVersion=0

The above example uses the following values.

  • archetypeGroupId com.here.platform.schema - do not change
  • archetypeArtifactId project_archetype - do not change
  • archetypeVersion 1.0.35 - do not change
  • groupId com.example.myproject
  • artifactId model1
  • version 1.0.0
  • package com.example.myproject.model1
  • majorVersion 0

The last parameter, majorVersion, defines the major version of your data schema and is included in the package name. This allows you to use multiple major schema versions simultaneously.

Set your own values for these parameters for your project.

You may create one or more data schema definition models for your project.

To use the schema with your batch pipeline, publish the schema project and add the resulting schema artifacts as a dependency. For Java bindings, use the artifact from the project with _java in the title. For Scala bindings, use the artifact from the project with _scala in the title.

To publish schema artifacts to the local repository, enter the following command in your schema project folder.

mvn install

For more information about publishing schemas, see Create and Extend Schemas.
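
To confirm that the install step published the schema, you can look in the local Maven repository. The sketch below relies on the standard Maven repository layout (group ID with dots replaced by slashes, then artifact ID, then version) and on the default repository location ~/.m2/repository. The function name local_repo_dir and the M2_REPO override variable are hypothetical, and the coordinates are the example values used above; your settings.xml may point to a different repository.

```shell
#!/bin/sh
# Print the local-repository directory for the given groupId, artifactId, and version.
local_repo_dir() {
  # Maven maps dots in the groupId to directory separators.
  group_path=$(printf '%s' "$1" | tr '.' '/')
  printf '%s/%s/%s/%s\n' "${M2_REPO:-$HOME/.m2/repository}" "$group_path" "$2" "$3"
}

dir=$(local_repo_dir com.example.myproject model1 1.0.0)
echo "Expected location: $dir"
if [ -d "$dir" ]; then
  ls "$dir"
else
  echo "Not found; run 'mvn install' in the schema project folder first."
fi
```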

Add a Java Batch Pipeline

If your project uses Java, add a Java batch pipeline to your project by instantiating the Java batch pipeline archetype.

To add a Java batch pipeline to your project, enter the following command in your project folder.

Linux

mvn archetype:generate -DarchetypeGroupId=com.here.platform \
                       -DarchetypeArtifactId=batch-directmton-java-archetype \
                       -DarchetypeVersion=1.0.609 \
                       -DgroupId=com.example.myproject \
                       -DartifactId=batch1 \
                       -Dversion=1.0-SNAPSHOT \
                       -Dpackage=com.example.myproject.batch1

Windows

mvn archetype:generate -DarchetypeGroupId=com.here.platform ^
                       -DarchetypeArtifactId=batch-directmton-java-archetype ^
                       -DarchetypeVersion=1.0.609 ^
                       -DgroupId=com.example.myproject ^
                       -DartifactId=batch1 ^
                       -Dversion=1.0-SNAPSHOT ^
                       -Dpackage=com.example.myproject.batch1

The above example uses the following values.

  • archetypeGroupId com.here.platform - do not change
  • archetypeArtifactId batch-directmton-java-archetype
  • archetypeVersion 1.0.609 - do not change
  • groupId com.example.myproject
  • artifactId batch1
  • version 1.0-SNAPSHOT
  • package com.example.myproject.batch1

The archetypeArtifactId parameter (passed as -DarchetypeArtifactId) defines the compiler mode of the batch pipeline. In the example, the selected compiler is the default DirectMtoN compiler module.

To switch to a different pipeline archetype, specify one of the following options.

  • batch-direct1ton-java-archetype for a Direct1toN compiler
  • batch-directmton-java-archetype for a DirectMtoN compiler
  • batch-reftree-java-archetype for a RefTree compiler
  • batch-mapgroup-java-archetype for a MapGroup compiler

For more information on batch pipeline design patterns, see the Data Processing Library.
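
The archetype names listed above follow a regular pattern, batch-<compiler>-<language>-archetype. As a small illustration, the hypothetical helper batch_archetype below builds the archetypeArtifactId value from a compiler name and a language, and rejects anything outside the four compilers and two languages named in this guide.

```shell
#!/bin/sh
# Print the archetype artifact ID for a compiler (e.g. RefTree) and a language (java or scala).
batch_archetype() {
  pattern=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  lang=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  case "$pattern" in
    direct1ton|directmton|reftree|mapgroup) ;;
    *) echo "unknown compiler: $1" >&2; return 1 ;;
  esac
  case "$lang" in
    java|scala) ;;
    *) echo "unknown language: $2" >&2; return 1 ;;
  esac
  printf 'batch-%s-%s-archetype\n' "$pattern" "$lang"
}

batch_archetype RefTree java       # prints batch-reftree-java-archetype
batch_archetype Direct1toN scala   # prints batch-direct1ton-scala-archetype
```

You can pass the printed value to -DarchetypeArtifactId in the mvn archetype:generate command.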

To use your schema in a pipeline project, you have to include the dependency in the project POM file. For Java bindings, use the artifact from the schema project with _java in the title.
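
The dependency entry has the usual Maven form. The sketch below uses a hypothetical helper, schema_dependency, to print an entry that you can paste into the dependencies section of the pipeline POM. The artifact ID model1_v0_java is an assumption based on the example values in this guide; check the POM of the generated schema module with _java in its name for the exact artifact ID and version.

```shell
#!/bin/sh
# Print a Maven <dependency> element for the given groupId, artifactId, and version.
schema_dependency() {
  cat <<EOF
<dependency>
  <groupId>$1</groupId>
  <artifactId>$2</artifactId>
  <version>$3</version>
</dependency>
EOF
}

# Assumed coordinates; replace them with the values from your schema project.
schema_dependency com.example.myproject model1_v0_java 1.0.0
```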

You may create one or more batch pipelines in your project.

Add a Scala Batch Pipeline

If your project uses Scala, add a Scala batch pipeline to your project by instantiating the Scala batch pipeline archetype.

To add a Scala batch pipeline to your project, enter the following command in your project folder.

Linux

mvn archetype:generate -DarchetypeGroupId=com.here.platform \
                       -DarchetypeArtifactId=batch-directmton-scala-archetype \
                       -DarchetypeVersion=1.0.609 \
                       -DgroupId=com.example.myproject \
                       -DartifactId=batch2 \
                       -Dversion=1.0-SNAPSHOT \
                       -Dpackage=com.example.myproject.batch2

Windows

mvn archetype:generate -DarchetypeGroupId=com.here.platform ^
                       -DarchetypeArtifactId=batch-directmton-scala-archetype ^
                       -DarchetypeVersion=1.0.609 ^
                       -DgroupId=com.example.myproject ^
                       -DartifactId=batch2 ^
                       -Dversion=1.0-SNAPSHOT ^
                       -Dpackage=com.example.myproject.batch2

The above example uses the following values.

  • archetypeGroupId com.here.platform - do not change
  • archetypeArtifactId batch-directmton-scala-archetype
  • archetypeVersion 1.0.609 - do not change
  • groupId com.example.myproject
  • artifactId batch2
  • version 1.0-SNAPSHOT
  • package com.example.myproject.batch2

The archetypeArtifactId parameter (passed as -DarchetypeArtifactId) defines the compiler mode of the batch pipeline. In the example, the selected compiler is the default DirectMtoN compiler module.

To switch to a different pipeline archetype, specify one of the following options.

  • batch-direct1ton-scala-archetype for a Direct1toN compiler
  • batch-directmton-scala-archetype for a DirectMtoN compiler
  • batch-reftree-scala-archetype for a RefTree compiler
  • batch-mapgroup-scala-archetype for a MapGroup compiler

For more information on batch pipeline design patterns, see the Data Processing Library.

To use your schema in a pipeline project, you have to include the dependency in the project POM file. For Scala bindings, use the artifact from the schema project with _scala in the title.

You may create one or more batch pipelines in your project.

Build Your Project to Run Locally

To build your project, enter the following command in your project folder.

mvn install

Build Your Project to Run on the Platform

To run your pipeline on the platform, you need to build a fat jar first. This can be done by running the following command.

mvn install -Pplatform

For more information on building a fat jar, see the Data Processing Library.
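
Maven writes build output to the target folder of each module. The exact file name of the fat jar depends on the archetype's platform profile and is not stated in this guide, so the sketch below simply lists every jar found under target folders. The function name list_built_jars is hypothetical.

```shell
#!/bin/sh
# List all jar files located under any target folder below the given root (default: current folder).
list_built_jars() {
  find "${1:-.}" -type f -path '*/target/*.jar' | sort
}

list_built_jars .
```

Run it in your project folder after the build; the fat jar is typically the largest file in the list.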
