Configure index layer settings
An index layer has a user-defined structure and it can contain up to four attributes, one of which must be a time attribute. An index layer contains the metadata (values of the index attributes) plus some additional information such as the data handle for the data blob, the size of the data, timestamps, and checksums.
Data in index layers is encrypted and stored for the amount of time specified in the layer's retention setting, also known as Time-To-Live (TTL).
The TTL value defines the number of days that records are kept in the layer and remain available for query. After the TTL expires, a record is eligible for removal (actual deletion may take 24 hours after the expiration). The TTL value applies to all records published to the layer.
Setting a TTL for your index layer enables you to automate your data management practices so that you can more easily maintain your data retention and cost. Selecting limitless retention requires you to manually manage these variables as your data will continue to accumulate as long as it is ingested and processed by your pipeline. Alternatively, a TTL setting of seven days will start deleting data records when they are seven days old. All newer data records are retained until they also reach seven days in age.
The minimum TTL is 7 days. This is also the default TTL setting.
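The retention rule above can be illustrated with a small date calculation. This is a minimal sketch, not platform code: the function names are hypothetical, and it only models when a record becomes eligible for removal, not when the service actually deletes it.

```python
from datetime import datetime, timedelta, timezone

MIN_TTL_DAYS = 7  # minimum (and default) TTL stated in the text


def removal_eligible_at(inserted_at: datetime, ttl_days: int = MIN_TTL_DAYS) -> datetime:
    """Return the moment a record becomes eligible for removal."""
    if ttl_days < MIN_TTL_DAYS:
        raise ValueError(f"TTL must be at least {MIN_TTL_DAYS} days")
    return inserted_at + timedelta(days=ttl_days)


def is_eligible_for_removal(inserted_at: datetime, now: datetime,
                            ttl_days: int = MIN_TTL_DAYS) -> bool:
    """True once the record is at least ttl_days old."""
    return now >= removal_eligible_at(inserted_at, ttl_days)


inserted = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(removal_eligible_at(inserted))  # seven days after insertion
```

Because actual deletion may lag the expiration, a consumer should treat a record past this moment as possibly, not certainly, gone.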
Once an index layer is created, you can update the retention settings programmatically via the API, DCL, or CLI. Updating the TTL of an existing layer is not yet possible via the portal. See Reconfigure a Layer.
Extending the retention period can cause issues with the synchronization of records between the index layer and the blob store. If your index layer already contains records that are within 24 hours of expiring, the retention-value update is not guaranteed to be applied instantaneously by the blob store. This could result in data expiring in the blob store while the corresponding entry persists in the index layer. For this reason, be cautious when extending the retention period: there could be data loss and inconsistency for entries with timestamps near the original expiration date. Contact your index service administrator for guidance.
Meta attributes are the implicit index attributes which are defined for all the index layers by default. Currently defined meta attributes:
id - unique id of the index record used also as the data handle for accessing payload from the Blob API. The uniqueness must be guaranteed within a layer and it must follow the UUID format.
size - size of the corresponding data payload (via id/data handle) in bytes.
checksum - checksum of the data payload calculated with a digest specified in the layer definition. Checksum can NOT be used in the Query API.
metadata - collection of the user defined key/value pairs in the JSON format. Metadata can NOT be used in the Query API.
timestamp - insertion time (UTC) of the index record which is automatically generated by index layer.
Index attributes (custom keys)
Index attributes define the keys by which you can query data in the index layer. For example, you can define an index layer to index sensor data from automobiles using the attributes time, tile ID, and event type. You can then develop a pipeline using the Data Archiving Library to aggregate data based on these attributes. After messages are indexed into the index layer, you can query the data based on the time, tile ID and event type.
The index layer requires time (either ingestion time or event time) as an attribute to facilitate the archival and querying of stream data where time is typically an important factor. While time is required, the remaining three optional attributes can be used to index and query by other aspects of the data such as location.
All index attributes, also known as keys, have two properties:
The name attribute is used primarily in the query API to express the query predicate. The type attribute defines the data type stored in the attribute. The supported types are:
- Standard types:
bool - a boolean value.
int - a signed integer, up to 64 bits. This type is deprecated. Use long instead.
long - a signed integer, up to 64 bits.
string - a string of Unicode characters with a maximum length of 40.
- Platform types:
heretile - represents the tile id in the HERE tile map tiling scheme. The heretile type has an attribute zoomLevel which represents the size of the tile. It is not mutable.
timewindow - represents the finest time granularity at which the data will be indexed and later queried. The timewindow is a time slice, not just a point in time. For example, if you specify a time window of one hour, all records with an event time in a given 60-minute window will have the same index value for the timewindow attribute. You can specify the duration for a time window, which represents the time slice length and is not mutable. Both the timewindow value and the timewindow duration are expressed in milliseconds, and the time value is milliseconds since Epoch. Note that the timewindow value is represented as the timestamp of the beginning of the window.
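The mapping from an event time to a timewindow value can be sketched as rounding down to the start of the window. This is an illustration only: the function name is hypothetical, and it assumes that windows are aligned to the Epoch, which the text does not state explicitly.

```python
def timewindow_value(event_time_ms: int, duration_ms: int) -> int:
    """Return the start of the window containing event_time_ms.

    Both arguments are in milliseconds; event_time_ms is counted since Epoch.
    Assumes windows are aligned to the Epoch (an assumption of this sketch).
    """
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return event_time_ms - (event_time_ms % duration_ms)


ONE_HOUR_MS = 3_600_000
start = timewindow_value(1_700_000_000_000, ONE_HOUR_MS)
# Every event inside the same hour maps to the same window start.
same_window = timewindow_value(start + ONE_HOUR_MS - 1, ONE_HOUR_MS) == start
print(start, same_window)
```

Under this assumption, two records fall into the same index bucket exactly when their event times share a window start.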
Once the layer is created, you cannot update the indexing attributes.
Attribute validation rules
Index attributes must conform to the following rules:
- Minimum number of index attributes is 1.
- Maximum number of index attributes is 4.
- The maximum attribute name length is 64 characters.
- Each index attribute must have a unique name. The index attribute name cannot be any of the reserved meta attribute names defined above.
- Attribute names must begin with a Unicode letter. Subsequent characters can be letters, underscores (_), and digits (0-9).
- There can be at most one heretile attribute (optional). The zoomLevel attribute for the heretile type must not be null and must be from 0 to 14.
- There must be one and only one timewindow attribute. The duration attribute of the timewindow type must not be null and must be from 600000 (10 minutes) to 86400000 (24 hours).
To define an index, use the config API, defining the index in the payload. Here is an example payload:
It is highly recommended to not specify a zoomLevel higher than 10.
This example is for illustration purposes only. For complete information about the config API, see API Reference.
The content type specifies the media type to use to identify the kind of data in the layer.
The content encoding setting determines whether to use compression to reduce the size of data stored in the layer. To enable compression, specify gzip.
Compressing data optimizes storage size, transfer I/O, and read costs. However, compressing data results in extra CPU cycles for the actual compression. Consider both the benefits and costs of compression when deciding whether to enable compression for a layer.
Some formats, especially textual formats such as text, XML, JSON, and GeoJSON, have very good compression rates. Other data formats are already compressed, such as JPEG or PNG images, so compressing them again with gzip will not result in reduced size. Often, it will even increase the size of the payload. For general-purpose binary formats like Protobuf, compression rates depend on the actual content and message size. You are advised to test the compression rate on real data to verify whether compression is beneficial.
Compression should not be used for Parquet. Compression breaks random access to blob data, which is necessary to efficiently read data in Parquet.
If the layer contains SDII data, note that the /layers/<layerID>/sdiimessagelist endpoint does not support compression. So if you enable compression for a layer containing SDII data, you must use the ingest API's generic endpoint (/layers/<layerID>) and all compression and decompression must be handled by your application.
If you are using the Data Client Library to read or write data from a compressed layer, compression and decompression are handled automatically.
If you are using the Data API to read or write data from a compressed layer, you must compress data before writing it to the layer. When reading data, the data you receive is in gzip format and you are responsible for decompressing it.
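When your application handles compression itself, the gzip step can be sketched with the Python standard library. This is a minimal illustration, not client code for the Data API: the helper names are hypothetical, and the upload and download calls are deliberately left out.

```python
import gzip
import json


def encode_payload(data: bytes) -> bytes:
    """Compress a payload with gzip before writing it to a compressed layer."""
    return gzip.compress(data)


def decode_payload(blob: bytes) -> bytes:
    """Decompress a gzip payload read from a compressed layer."""
    return gzip.decompress(blob)


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size; below 1.0 means savings."""
    return len(encode_payload(data)) / len(data)


# Textual formats such as JSON usually compress well.
records = [{"eventType": "brake", "tile": i} for i in range(200)]
payload = json.dumps(records).encode("utf-8")
print(f"ratio: {compression_ratio(payload):.2f}")
```

Running compression_ratio on a sample of real payloads is one way to follow the advice above about testing whether compression is beneficial.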
Specifying a schema enables you to share data with others by defining for others how to consume the data. For more information, see Schemas.
The digest property specifies the algorithm used by the data publisher to generate a hash for each partition in the layer. By specifying a digest algorithm for the layer, you communicate to data consumers the algorithm to use to verify the integrity of the data they retrieve from the layer.
You can specify a digest algorithm when creating or updating a layer. If you specify "undefined", you can specify another digest algorithm after the layer is created. If you specify a digest algorithm, you cannot change it later.
When choosing a digest algorithm, consider the following:
- SHA-256 is recommended for applications where strong data security is required.
- MD5 and SHA-1 are acceptable when the purpose of applying a hash is to verify data integrity during transit.
Including a hash is optional, but if you intend to provide hashes for partitions in this layer you should specify the algorithm you will use.
The HERE platform does not verify that the algorithm you specify here is the one used to generate the actual hashes, so it is up to the data publisher to ensure that the algorithm specified here is the one used in the publishing process.
For more information about common algorithms, see Secure Hash Algorithms.
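Because the platform does not verify hashes, publishers and consumers compute them on their own side. The sketch below uses Python's hashlib for that. The helper names and the keys of the DIGESTS mapping are assumptions made for this illustration and may not match the exact strings used in a layer configuration.

```python
import hashlib
import hmac

# Assumed labels for the layer's digest setting, mapped to hashlib names.
DIGESTS = {"MD5": "md5", "SHA-1": "sha1", "SHA-256": "sha256"}


def compute_digest(data: bytes, algorithm: str) -> str:
    """Return the lowercase hex digest of data for the given algorithm label."""
    return hashlib.new(DIGESTS[algorithm], data).hexdigest()


def verify_digest(data: bytes, algorithm: str, expected_hex: str) -> bool:
    """Check a retrieved payload against the hash supplied by the publisher."""
    actual = compute_digest(data, algorithm)
    return hmac.compare_digest(actual, expected_hex.lower())


payload = b"example partition payload"
published = compute_digest(payload, "SHA-256")       # publisher side
print(verify_digest(payload, "SHA-256", published))  # consumer side
```

The consumer must use the same algorithm the publisher used, which is why the layer setting has to match the publishing process.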
Digest and CRC are two different fields. Digest is used for security to prevent human tampering. CRC is used for safety to prevent bit flips by computer hardware or network transportation. You can use both fields.
The crc property specifies the CRC algorithm used by the data publisher to generate a checksum for each partition in the layer. When you specify a CRC algorithm for the layer, you tell data consumers which algorithm to use so they can verify the integrity of the data they retrieve from the layer.
You can specify a CRC algorithm when creating or updating a layer. If you specify "undefined", you can specify another CRC algorithm after the layer is created. If you specify a CRC algorithm, you cannot change it later.
This CRC has the following properties:
- Padded with zeros to a fixed length of 8 characters
- Stored as a string
For example, if your calculated CRC is the uint32 value of 0x1234a, then the CRC that is actually stored for the partition is the string 0001234a.
Currently only one CRC algorithm is supported:
For more information about common algorithms, see Cyclic redundancy check.
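The string form of a CRC described above can be produced with a one-line format. This is a sketch with a hypothetical helper name; it assumes the stored string is lowercase hexadecimal without a 0x prefix, and it covers only the formatting step, not the CRC calculation itself.

```python
def format_crc(value: int) -> str:
    """Format a uint32 CRC as a zero-padded, 8-character hex string.

    Lowercase hex without a prefix is an assumption of this sketch.
    """
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("CRC must fit in an unsigned 32-bit integer")
    return f"{value:08x}"


print(format_crc(0x1234A))
```

Keeping the width fixed means stored checksums can be compared as plain strings.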
Including a checksum is optional but if you intend to provide checksums for partitions in this layer, you should specify the algorithm you will use.
The HERE Workspace does not verify that the algorithm you specify here is the one used to generate the actual checksums, so it is up to the data publisher to ensure that the algorithm specified here is the one used in the publishing process.
Digest and CRC are two different fields. Digest is used for security reasons to prevent human tampering. CRC is used for safety reasons to prevent bit flips caused by computer hardware or network transportation. You can use both fields.
The geographic area that this layer covers. This setting controls which areas of the world are highlighted in the layer's coverage map in the platform portal.
Specify a list of countries and regions using the two-character ISO 3166-1 alpha-2 code. Optionally, you can add a two-character country subdivision code using the ISO 3166-2 codes for country subdivisions. For example, you can specify 'DE' for Germany, 'BR' for Brazil, or 'CN-HK' for Hong Kong.
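A coverage list can be given a quick shape check before it is submitted. The sketch below is purely syntactic and its names are hypothetical: it does not confirm that a code is actually assigned, and it accepts subdivision parts of one to three alphanumeric characters, following the general ISO 3166-2 format rather than anything stated by the platform.

```python
import re

# Two uppercase letters, optionally followed by "-" and a subdivision part.
COVERAGE_CODE = re.compile(r"[A-Z]{2}(-[A-Z0-9]{1,3})?")


def is_well_formed_coverage_code(code: str) -> bool:
    """True if code looks like 'DE' or 'CN-HK'; does not check assignment."""
    return COVERAGE_CODE.fullmatch(code) is not None


codes = ["DE", "BR", "CN-HK", "de", "DEU"]
print([c for c in codes if is_well_formed_coverage_code(c)])
```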