HERE platform pipelines can be simple or complex, depending on the data processing workflow they implement. The basic structure of pipeline code is well established, and a new build project is typically started from a Maven Pipeline Template for either stream or batch processing. Separate templates exist for each so that the pipeline service instantiates the correct type of pipeline.
The structure of a typical HERE platform pipeline is shown in Figure 1.
This is a simplified structural view, however. If you only intend to run existing pipelines, continue with the next section.
To use a pipeline, you need to know what data processing task it was designed to perform. The important points include the following:
- The filename (and version) of the pipeline JAR file
- The type of pipeline it is: batch or stream
- The data processing task it implements
- The data catalog or catalogs it reads input data from
- The output catalog where it stores processed data
- The recommended cluster configuration
- The Group or the Project it is assigned to
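One way to keep these facts together is a small record per pipeline JAR. The following is only a minimal sketch in Python, not part of the HERE platform tooling: the class name `PipelineRecord`, its field names, and the `missing_fields` helper are all assumptions made for illustration, and any catalog identifiers you store in it are your own.

```python
from dataclasses import dataclass, fields
from enum import Enum
from typing import List, Tuple


class PipelineType(Enum):
    BATCH = "batch"
    STREAM = "stream"


@dataclass(frozen=True)
class PipelineRecord:
    """Hypothetical record of what an operator needs to know about one pipeline JAR."""
    jar_filename: str
    jar_version: str
    pipeline_type: PipelineType
    task_description: str
    input_catalogs: Tuple[str, ...]   # one or more input catalogs
    output_catalog: str
    cluster_configuration: str        # free-text note on the recommended setup
    owner: str                        # the Group or Project the pipeline is assigned to

    def missing_fields(self) -> List[str]:
        """Return the names of fields that were left empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A record with every field filled in returns an empty list from `missing_fields()`, which makes it easy to spot pipelines whose documentation is incomplete before anyone tries to run them.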
You will likely work with many different pipeline JAR files, so it is important to keep them organized. A standard naming and versioning system is highly recommended.
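As an illustration of what such a system could look like, the sketch below checks JAR filenames against one possible convention, `<name>-<batch|stream>-<major>.<minor>.<patch>.jar`. This convention is an assumption chosen for the example and is not prescribed by the HERE platform; the names `JAR_NAME_PATTERN`, `ParsedJarName`, and `parse_jar_name` are likewise hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed convention: lowercase hyphenated name, pipeline type, semantic version.
JAR_NAME_PATTERN = re.compile(
    r"^(?P<name>[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*)"
    r"-(?P<type>batch|stream)"
    r"-(?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)\.jar$"
)


class ParsedJarName(NamedTuple):
    name: str
    pipeline_type: str
    version: Tuple[int, int, int]


def parse_jar_name(filename: str) -> Optional[ParsedJarName]:
    """Return the parts of a conforming filename, or None if it does not conform."""
    match = JAR_NAME_PATTERN.match(filename)
    if match is None:
        return None
    version = (int(match["major"]), int(match["minor"]), int(match["patch"]))
    return ParsedJarName(match["name"], match["type"], version)
```

Because the version is parsed into a tuple of integers, versions compare numerically, so `1.10.0` sorts after `1.9.0`, which a plain string comparison would get wrong.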
While a JAR file is executable in a software sense, the pipeline it contains cannot run until the JAR file is loaded into the HERE platform pipeline service and configured to run in the correct framework (that is, batch or stream). In the HERE platform, this task is referred to as Deployment. One pipeline JAR file can support multiple running pipelines, each with its own run-time configuration. The only limit on how many pipelines can run at the same time is the availability of computing resources. For more information about the deployment process, see the Pipeline Lifecycle article.
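The relationship between one JAR file, several run-time configurations, and a finite pool of resources can be pictured with a toy model. The sketch below is purely conceptual and does not call or imitate any HERE platform API: `RuntimeConfig`, `Cluster`, the `workers` unit, and the `deploy` method are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class RuntimeConfig:
    """Hypothetical run-time settings for one deployment of a JAR."""
    input_catalog: str
    output_catalog: str
    workers: int  # made-up unit of compute this deployment needs


@dataclass
class Cluster:
    """Toy stand-in for the platform's finite computing resources."""
    total_workers: int
    running: List[Tuple[str, RuntimeConfig]] = field(default_factory=list)

    def free_workers(self) -> int:
        return self.total_workers - sum(cfg.workers for _, cfg in self.running)

    def deploy(self, jar: str, config: RuntimeConfig) -> bool:
        """Start a pipeline from the JAR if enough resources remain."""
        if config.workers > self.free_workers():
            return False  # resource availability is the only limit modeled here
        self.running.append((jar, config))
        return True
```

In this model the same JAR name can appear in `running` any number of times, each entry paired with a different `RuntimeConfig`; a deployment is refused only when the remaining workers cannot cover its request.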