HERE platform pipelines generate certain standard metrics that can be used to track their status over time. The standard metrics are listed in the Logs, Monitoring, and Alerts User Guide. Custom metrics can also be inserted into the pipeline's code. These metrics are all displayed in a Pipeline Status dashboard and are used to generate alerts for specific events associated with a pipeline job, like a failed job.
To monitor the status of a pipeline, a Pipeline Status dashboard is available in Grafana. From the platform portal, open the Launcher menu and select Primary monitoring and alerts (No. 1 in Figure 1). This takes you to the Grafana home page.
Launcher menu item 2 is also Grafana, but only for high availability catalogs.
Menu item 3 is the link to Splunk for reviewing event and error logs.
The home page looks something like this:
The home page provides access to several dashboards. Default dashboards are listed on the left side of the page, and the available User Defined Alerts are listed on the right side.
From the list of default dashboards, locate the Pipeline Status dashboard. Click on the dashboard name to open it.
The Pipeline Status dashboard displays the following status of Pipeline jobs:
Each Pipeline Status is color-coded to allow quick identification. The dashboard can also be filtered by Pipeline Status and Pipeline Type (Flink or Spark). For more details, see Pipeline Status definitions.
Note: Default Dashboard Settings
Default Time Period: Last 24 hours
Default Refresh Interval: 30 minutes
Click on the Grafana logo in the top left corner of the screen. This opens the side menu bar.
Select the Alerting item on the menu bar. Then, locate the Notification Channels item on the submenu as shown here.
Click on “Notification Channels” and the screen will change to show something like the image below.
Locate the Notification Channel named "Pipeline Failure Notification." Click on the channel's Edit button to change the configuration of the notification channel.
Specify the list of email addresses that will receive failure alerts as shown here.
To test your alert changes, click the Send Test button. This will send a test message to each email on the alert list.
Click the Save button to save your notification changes.
With these changes, the listed email addresses will start receiving alerts when Pipeline Jobs fail. You can also see the alerts on the Pipeline Status dashboard.
Note: Default Alert Settings
Default Alert Interval: Last 1 minute
Default Alert frequency for Failed jobs: Every 60 seconds
Caution: Dashboard Sampling
When you select a larger time period in Grafana, the dashboard uses a sampling mechanism that shows fewer data points than the underlying data contains. This allows for quicker responses, but to see more accurate data, shorten the time period under investigation.
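The effect of sampling can be illustrated with a minimal sketch. This is not Grafana's actual algorithm; it is a generic averaging downsampler, and the function name `downsample` and the sample values are hypothetical. It shows how a brief spike becomes harder to see once several samples are merged into one displayed point.

```python
def downsample(values, bucket_size):
    """Average consecutive samples into buckets of `bucket_size` samples.

    Generic illustration only; Grafana's own sampling may differ.
    """
    buckets = []
    for start in range(0, len(values), bucket_size):
        chunk = values[start:start + bucket_size]
        buckets.append(sum(chunk) / len(chunk))
    return buckets


# Hypothetical per-minute counts of failed jobs with one brief spike.
raw = [0, 0, 0, 5, 0, 0, 0, 0]

fine = downsample(raw, 1)    # short time period: every sample is shown
coarse = downsample(raw, 4)  # long time period: four samples per point

print(max(fine))   # 5.0
print(coarse)      # [1.25, 0.0]
```

With the coarse view, the spike of 5 appears as 1.25, which is why shortening the time period gives a more accurate picture.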
Failure Emails are only sent when the alert's state changes. For example, if a pipeline job fails, the alert goes to Alerting state and a failure email is sent to the specified recipients. If another pipeline job fails within the default alert interval of 1 minute, a second email cannot be sent. The first alert state must transition to the “No Data” state, at the end of the 1 minute interval, before any subsequent failures can trigger alert emails. This behavior results in the following two emails being sent:
- [Alerting] - For Pipeline Jobs that failed within the last 1 minute period, including details about the failed pipeline jobs. Sent when the alert is first reported.
- [No Data] - For Pipeline Jobs that failed within the last 1 minute period, with an empty email body. Sent at the end of the 1 minute interval.
This is an inherent behavior of Grafana and not a limitation of the HERE platform. Figure 9 illustrates what is happening and how Fault 2 is not processed.
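The email behavior described above can be approximated with a small simulation. This is a simplified model based only on the description in this section, not Grafana's implementation; the function name `simulate_alert_emails` and the timestamps are hypothetical, and times are given in seconds.

```python
def simulate_alert_emails(failure_times, interval=60):
    """Return the (time, state) emails sent for a list of failure times.

    Simplified model: a failure starts an Alerting state that lasts
    `interval` seconds. Failures inside that window cause no state
    change, so no email. The state then changes to No Data.
    """
    emails = []
    alert_start = None  # start time of the most recent Alerting state
    for t in sorted(failure_times):
        if alert_start is not None and t < alert_start + interval:
            continue  # still Alerting: this failure sends no email
        if alert_start is not None:
            # The previous Alerting state ended before this failure.
            emails.append((alert_start + interval, "No Data"))
        emails.append((t, "Alerting"))
        alert_start = t
    if alert_start is not None:
        emails.append((alert_start + interval, "No Data"))
    return emails


# Failures at 0 s and 30 s fall in the same interval; 200 s is later.
print(simulate_alert_emails([0, 30, 200]))
# [(0, 'Alerting'), (60, 'No Data'), (200, 'Alerting'), (260, 'No Data')]
```

In this model the failure at 30 seconds produces no email of its own, which corresponds to the unprocessed second fault described above.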
Note: Splunk Dashboard
Click on the Logs menu item to get to the Splunk Dashboard. This will not focus on any one specific job; see the Error Logs section below for how to access the logs for a specific job.
There are four levels of logging available for platform pipelines: Debug, Info, Warn, and Error. The logging level can be set using the platform portal, the CLI, or the API. Alternatively, you can use the default logging level of Warn.
To examine the logs for running pipeline jobs, click on View Jobs for a Pipeline Version to display the jobs history.
Then, click on the Logging URL button for the job you wish to troubleshoot. This will open the Splunk dashboard where the logs for the selected Pipeline Version can be viewed.
For more information, see Pipeline Logging.
Different levels of logging are available for different purposes. HERE platform pipelines support the following levels of logging:
- Debug: Includes fine-grained informational events that are most useful to troubleshoot a pipeline.
- Info: Includes informational messages that highlight the progress of the pipeline at a coarse-grained level.
- Warn: Includes information on potentially harmful situations, including other run-time situations that are undesirable or unexpected, but not necessarily "wrong". This is the default logging level.
- Error: Includes other run-time errors or unexpected conditions, such as error events that might still allow the pipeline to continue running.
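As a rough sketch of how these levels relate, the snippet below assumes the conventional threshold behavior of Log4j-style loggers, where a configured level also admits every more severe level. That ordering is an assumption and is not stated in this guide; the names `LEVELS` and `visible_levels` are hypothetical.

```python
# Levels from least to most severe, as listed above.
LEVELS = ["Debug", "Info", "Warn", "Error"]


def visible_levels(configured="Warn"):
    """Return the levels that would be logged at the configured level.

    Assumes threshold semantics: the configured level and everything
    more severe is logged, and anything less severe is dropped.
    """
    return LEVELS[LEVELS.index(configured):]


print(visible_levels())         # ['Warn', 'Error']
print(visible_levels("Debug"))  # ['Debug', 'Info', 'Warn', 'Error']
```

Under this assumption, the default level of Warn hides Debug and Info messages, so a lower level is typically needed while troubleshooting.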
Setting the logging level from the platform portal can be done from the Pipeline Version Details page. An example is shown in Figure 5.
The Logging Configuration panel is outlined in red here. The current logging level for this pipeline is displayed. To change the level, click the Edit button. This displays the dialog box shown in Figure 6.
Info: Loggers and Levels
A logger applies a logging level to a specific Pipeline Version and all of the Jobs it executes. The default logger is set at the root level for the entire pipeline, but a logger can also be set for a specific pipeline class. Because you can have multiple loggers, different loggers can be set to different logging levels. This allows different parts of the executing pipeline code to be monitored at different logging levels, if set up correctly.
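The relationship between a root logger and class-level loggers can be sketched with Python's standard `logging` module, which follows a similar hierarchical model. This is an analogy only; platform pipelines are configured through the portal, CLI, or API as described in this guide, and the logger names below are hypothetical.

```python
import logging

# Root logger: applies to everything without a more specific logger.
root = logging.getLogger()
root.setLevel(logging.WARNING)

# A logger named after a (hypothetical) pipeline class, set to a
# more verbose level than the root.
class_logger = logging.getLogger("com.example.pipeline.MyProcessor")
class_logger.setLevel(logging.DEBUG)

# Another (hypothetical) class with no logger of its own; it falls
# back to the root level.
other_logger = logging.getLogger("com.example.pipeline.OtherStep")

print(class_logger.isEnabledFor(logging.DEBUG))  # True
print(other_logger.isEnabledFor(logging.DEBUG))  # False
print(other_logger.isEnabledFor(logging.ERROR))  # True
```

Here only the class with its own logger produces Debug output, while the rest of the code continues to log at the root level.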
To change the root logging level, use the drop-down list at the top of the dialog box to select the new logging level. Additional loggers can be added or deleted using the controls shown in Figure 7. To change the logging level of one of these loggers, click the indicated control and select the new level from the drop-down list.
If adding a new logger, the dialog box will change to provide a place to enter the information for the new logger as shown in Figure 8. The logger name is normally the class name in the pipeline code to which it should be linked. The logging level can be set as needed and does not have to match the root logging level.
Click Add to close the add function. Then, click Save to save the addition.
When adding a new logger, if you choose a logger name that already exists, you will get an error message like that shown in Figure 9.
Figure 10 shows the results of adding a new logger and how it is displayed on the Pipeline Version Detail page.
If you create a logger that cannot be linked to a class in the pipeline code, there will be no logging entries from that logger.