Implement User Defined Functions

The Data Archiving Library provides several APIs for indexing data which you should implement:

  • SimpleUDF.getKeys(metadata, message) or MultiKeysUDF.getMultipleKeys(metadata, message) or SplittedUDF.getSplittedKeys(metadata, message)
  • aggregate(keys, messages)
  • merge(keys, files) (Deprecated. Use Index Compaction Library instead.)

You can implement this API or reuse the default implementation:

  • transformForDeadLetter(message)

You can use this API to setup your function, including the registration of user defined metrics:

  • open(parameters, runtimeContext)

This topic describes these methods. For complete details, see: JavaDoc

SimpleUDF.getKeys(metadata, message)

For each message, use this method to provide the index attributes to the Data Archiving Library when you have a single value in each attribute. The index attributes can also be extracted from the message.

  • The metadata argument provides this information:
    • The time when data was ingested into the HERE platform (INGESTION_TIME)
    • The time when data was processed by the Data Archiving Library (PROCESSING_TIME)
    • The output catalog and index layer ID (SINK_CATALOG, SINK_LAYER)
    • The input catalog and stream layer ID (SOURCE_CATALOG, SOURCE_LAYER)
    • The message ID of the input data (MESSAGE_ID)
  • Note that the attribute names need to match the indexDefinitions of the index layer.
  • The timewindow type is required and only one timewindow should be returned from this method.
  • If an index definition of type heretile is present, then the zoom level of tile value should match the zoom level defined in index layer.
  • If there is an error or exception in this implemented method, then that message will not be archived.

MultiKeysUDF.getMultipleKeys(metadata, message)

For each message, use this method to provide the index attributes to the Data Archiving Library when you have multiple values in each attribute. The index attributes can also be extracted from the message.

  • With MultiKeysUDF, you must return a list of values for each attribute. If you do not return a list of values for each attribute, you will get validation exceptions.
  • All input parameters have the same content as the SimpleUDF.getKeys method.
  • All validation rules for each value in the values list of each attribute is the same as the SimpleUDF.getKey method.
  • If there is an error or exception in this implemented method, that message will not be archived.
  • All list of values of each attribute will be combined together. Here are some examples:

    Example 1:
       Input
        attribute1: {value1_1, value1_2}, attribute2: {value2_1, value2_2}
       Output
        {value1_1, value2_1}, {value1_1, value2_2}, {value1_2, value2_1}, {value1_2, value2_2}
    
    Example 2:
       Input
        attribute1: {value1}, attribute2: {value2_1, value2_2}, attribute3: {value3}
       Output
        {value1, value2_1, value3}, {value1, value2_2, value3}
    

SplittedUDF.getSplittedKeys(metadata, message)

Use this method to provide the index attributes to the Data Archiving Library when you have the same value that maps to multiple attributes and you want to split the message into multiple messages. The index attributes can also be extracted from the message.

  • With SplittedUDF, you can split a message into smaller messages with different indexes.
  • The indexing attributes for each smaller message will be the key of returned Map<Map<String, Object>, byte[]> with corresponding smaller message value as byte[]. Each attribute is represented as Map<String, Object>.
  • All input parameters have the same content as the SimpleUDF.getKeys method.
  • All validation rules for each value of Map<String, Object> are the same as the SimpleUDF.getKey method.
  • If there is an error or exception in this implemented method, that message will not be archived.

aggregate(keys, messages)

For each group of messages (based on indexing attributes in an aggregation window), this method correlates the group to the attributes (keys) specified in the function parameter keys.

merge(keys, files)

Deprecated. Use Index Compaction Library instead.

As archived data grows over time, the Data Archiving Library optimizes the archived data so it can be queried more efficiently. For this optimization to be successful, use this method to specify how to merge your previously-aggregated files together. Each group of files is correlated to the attributes (keys) provided in the function parameter keys.

transformForDeadLetter(message)

For the error strategy 'deadletter', the user may define a function to transform a message before archiving in the deadletter layer. See deadletter strategy for more information.

open(parameters, runtimeContext)

Initialization method which is called before the actual working methods (like getKeys/aggregate/merge) and thus suitable for one time setup work. By default, this method is a no-op.

results matching ""

    No results matching ""