Configure stream layer settings
You can configure stream layer settings when creating a layer in a catalog.
Data in stream layers is encrypted and stored for the amount of time specified in the layer's retention setting, also known as Time-To-Live (TTL).
You can specify the maximum throughput for data going into the layer and, separately, the maximum throughput for data going out of the layer.
Throughput is configured in increments of 100 KBps (kilobytes per second), with a minimum of 100 KBps inbound and 100 KBps outbound.
The default inbound throughput is 1000 KBps and the default outbound throughput is 4000 KBps. You can specify up to 32800 KBps inbound and 65500 KBps outbound.
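As a rough illustration, the per-direction limits above can be expressed as a small pre-flight check. `check_throughput` and the constant names are hypothetical and not part of any HERE SDK; the platform applies its own validation, which this sketch does not reproduce.

```python
from typing import List

# Limits taken from the text above (all values in KBps).
MIN_KBPS = 100
STEP_KBPS = 100
MAX_INBOUND_KBPS = 32800
MAX_OUTBOUND_KBPS = 65500


def check_throughput(inbound_kbps: int, outbound_kbps: int) -> List[str]:
    """Return a list of problems; an empty list means both values look valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name, value, maximum in (
        ("inbound", inbound_kbps, MAX_INBOUND_KBPS),
        ("outbound", outbound_kbps, MAX_OUTBOUND_KBPS),
    ):
        # Throughput is set in 100 KBps increments.
        if value % STEP_KBPS != 0:
            problems.append(f"{name} must be a multiple of {STEP_KBPS} KBps")
        # Each direction has its own minimum and maximum.
        if not MIN_KBPS <= value <= maximum:
            problems.append(f"{name} must be between {MIN_KBPS} and {maximum} KBps")
    return problems


print(check_throughput(1000, 4000))  # → []
print(check_throughput(150, 70000))
```

The second call reports two problems: 150 KBps is not a multiple of 100, and 70000 KBps is above the outbound maximum.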
The HERE platform starts throttling inbound messages when the inbound rate exceeds the inbound throughput, and starts throttling outbound messages when the total outbound rate to all consumers exceeds the outbound throughput. When throttling occurs, the service response is delayed, but no messages are dropped. No special HTTP response indicates that throttling is occurring; the only indication is a service slowdown.
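Because throttling shows up only as a slowdown, a producer that wants predictable latency may prefer to pace itself below the configured inbound throughput. The sketch below uses a standard token-bucket limiter for that purpose. This is a client-side technique swapped in for illustration, not a description of how the platform itself throttles. `BytePacer` is a hypothetical name, and the sketch assumes 1 KB = 1024 bytes. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time


class BytePacer:
    """Client-side token bucket limiting sends to `rate_kbps` KBps.

    Sketch only: assumes 1 KB = 1024 bytes and allows a burst of at most
    one second's worth of data.
    """

    def __init__(self, rate_kbps: int, clock=time.monotonic, sleep=time.sleep):
        self.rate = rate_kbps * 1024      # refill rate in bytes per second
        self.capacity = float(self.rate)  # bucket size: one second of data
        self.tokens = self.capacity       # start with a full bucket
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self, nbytes: int) -> float:
        """Block until `nbytes` may be sent; return the seconds spent waiting."""
        if nbytes > self.capacity:
            raise ValueError("message is larger than one second of throughput")
        now = self.clock()
        # Refill tokens for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        waited = 0.0
        if self.tokens < nbytes:
            # Sleep just long enough for the missing tokens to accumulate.
            waited = (nbytes - self.tokens) / self.rate
            self.sleep(waited)
            self.tokens = float(nbytes)
            self.last += waited
        self.tokens -= nbytes
        return waited
```

A producer would call `pacer.acquire(len(message))` immediately before each send, with the pacer's rate set somewhat below the layer's inbound throughput.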
Catalogs in HERE Marketplace have a maximum outbound throughput of 2000 KBps (Kilobytes per second).
We recommend setting the outbound throughput to at least the expected number of consumers (users and pipelines) multiplied by the inbound throughput. The outbound rate can be even higher if some consumers "replay" recent data. The inbound throughput must not exceed the outbound throughput; otherwise, consumers cannot read all the data that producers provide.
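The sizing recommendation above reduces to a multiplication rounded up to the 100 KBps configuration step. In this minimal sketch, `recommended_outbound_kbps` is a hypothetical helper. It adds no headroom for consumers that replay data, so treat its result as a lower bound.

```python
def recommended_outbound_kbps(inbound_kbps: int, consumers: int,
                              max_outbound_kbps: int = 65500) -> int:
    """Lower bound for outbound throughput: consumers x inbound, rounded up.

    Because consumers >= 1, the result is never below inbound, which also
    satisfies the rule that inbound must not exceed outbound.
    """
    if consumers < 1:
        raise ValueError("expect at least one consumer")
    wanted = consumers * inbound_kbps
    # Round up to the next multiple of 100 KBps (the configuration step).
    outbound = -(-wanted // 100) * 100
    if outbound > max_outbound_kbps:
        raise ValueError(
            f"{outbound} KBps exceeds the {max_outbound_kbps} KBps outbound limit; "
            "reduce inbound throughput or the number of consumers"
        )
    return outbound


print(recommended_outbound_kbps(1000, 3))  # → 3000
print(recommended_outbound_kbps(250, 3))   # → 800
```

For a catalog in HERE Marketplace, pass `max_outbound_kbps=2000` so that the helper flags plans that cannot fit under that cap.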
Stream partition parallelization
Stream partition parallelization determines how many consumers in the same consumer group can consume data from the same layer in parallel. The value you specify determines the number of internal stream partitions that are created. The minimum is 1 stream partition and the maximum is 32 stream partitions.
The best practice is to supply Throughput in, Throughput out, and Parallelization. If only Throughput in and Throughput out are supplied, parallelization is calculated by rounding Throughput in up to the nearest whole MBps value. For example, if Throughput in is about 500 KBps (kilobytes per second), parallelization is 1. If you specify only Parallelization, the following formulas are used to compute the throughput values:
Throughput in = min(round(parallelization*1024/100)*100,32800) KBps
Throughput out = min(round(parallelization*2048/100)*100,65500) KBps
If you do not specify values for Throughput in, Throughput out, and Parallelization, these default values are used:
- Throughput in: 1000 KBps (Kilobytes per second)
- Throughput out: 4000 KBps (Kilobytes per second)
- Parallelization: 4
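The rules for filling in missing values can be sketched in Python as follows. The function names are hypothetical. The sketch takes 1 MBps as 1024 KBps to match the formulas, and it caps inbound at the 32800 KBps maximum stated earlier in this section. Combinations the text does not describe, such as supplying only Throughput in, raise an error rather than guessing.

```python
import math
from typing import Optional, Tuple

MAX_IN_KBPS = 32800
MAX_OUT_KBPS = 65500
MAX_PARALLELIZATION = 32
DEFAULTS = (1000, 4000, 4)  # (inbound KBps, outbound KBps, parallelization)


def throughput_from_parallelization(p: int) -> Tuple[int, int]:
    """Apply the two throughput formulas for a given parallelization."""
    if not 1 <= p <= MAX_PARALLELIZATION:
        raise ValueError("parallelization must be between 1 and 32")
    # p*1024/100 and p*2048/100 never land exactly on .5 for integer p, so
    # Python's round-half-to-even behaves like ordinary rounding here.
    inbound = min(round(p * 1024 / 100) * 100, MAX_IN_KBPS)
    outbound = min(round(p * 2048 / 100) * 100, MAX_OUT_KBPS)
    return inbound, outbound


def parallelization_from_inbound(inbound_kbps: int) -> int:
    """Round inbound up to whole MBps, clamped to the 1..32 range."""
    return max(1, min(math.ceil(inbound_kbps / 1024), MAX_PARALLELIZATION))


def resolve(inbound: Optional[int] = None, outbound: Optional[int] = None,
            parallelization: Optional[int] = None) -> Tuple[int, int, int]:
    """Return (inbound, outbound, parallelization) per the rules above."""
    if inbound is None and outbound is None and parallelization is None:
        return DEFAULTS
    if inbound is not None and outbound is not None:
        if parallelization is None:
            parallelization = parallelization_from_inbound(inbound)
        return inbound, outbound, parallelization
    if parallelization is not None and inbound is None and outbound is None:
        return (*throughput_from_parallelization(parallelization), parallelization)
    raise ValueError("combination not covered by the rules above")


print(resolve())                              # → (1000, 4000, 4)
print(resolve(parallelization=4))             # → (4100, 8200, 4)
print(resolve(inbound=500, outbound=2000))    # → (500, 2000, 1)
```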
A stream layer can be configured with a retention time, also known as Time-To-Live (TTL). The TTL value defines the minimum length of time that a message remains available for consumption. Messages are removed from the layer after the TTL has elapsed, though not necessarily at the exact moment it elapses; they may remain in the system for some time afterward. You are not charged for data storage beyond the TTL setting.
The TTL value is applied to all messages published to a layer.
Specifying a larger TTL value is useful when the data producer expects consumers to replay (re-consume) recent data. For example, suppose a layer stores vehicle sensor data and the consumer wants to compare the results of different algorithms that identify a particular road or vehicle condition. The consumer needs to consume the three most recent hours of data several times. In this case, a TTL of four or more hours would be appropriate. Another example is consuming all the data received within the last hour, every hour. In this case, the TTL should be set to more than one hour. Finally, a larger TTL lets consumers re-process published messages if processing resulted in an error.
The valid range of TTL values is from 10 minutes to 3 days. The default value is 1 hour. Specify the TTL in milliseconds.
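Because the TTL is specified in milliseconds, converting from a human-readable duration is an easy place to slip. This sketch uses Python's `datetime.timedelta` to convert and range-check a TTL. `ttl_ms` and `replay_ttl` are hypothetical helpers, and the one-hour default margin in `replay_ttl` is an assumption modeled on the three-hour replay example above.

```python
from datetime import timedelta

MIN_TTL = timedelta(minutes=10)
MAX_TTL = timedelta(days=3)
DEFAULT_TTL = timedelta(hours=1)


def ttl_ms(ttl: timedelta = DEFAULT_TTL) -> int:
    """Convert a TTL to milliseconds after checking the valid range."""
    if not MIN_TTL <= ttl <= MAX_TTL:
        raise ValueError(f"TTL {ttl} is outside the {MIN_TTL} to {MAX_TTL} range")
    # timedelta // timedelta yields an int count of whole milliseconds.
    return ttl // timedelta(milliseconds=1)


def replay_ttl(replay_window: timedelta,
               margin: timedelta = timedelta(hours=1)) -> timedelta:
    """TTL covering a replay window plus an assumed safety margin."""
    return replay_window + margin


print(ttl_ms())                               # → 3600000
print(ttl_ms(replay_ttl(timedelta(hours=3)))) # → 14400000
```

The second call reproduces the vehicle sensor example: replaying three hours with a one-hour margin gives a four-hour TTL, or 14400000 ms.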
The content type specifies the media type that identifies the kind of data in the layer.
The content encoding setting determines whether to use compression to reduce the size of data stored in the layer. To enable compression, specify gzip.
Compressing data optimizes storage size, transfer I/O, and read costs. However, compression consumes extra CPU cycles. Consider both the benefits and costs of compression when deciding whether to enable it for a layer.
Some formats, especially textual formats such as text, XML, JSON, and GeoJSON, have very good compression rates. Other formats, such as JPEG or PNG images, are already compressed, so compressing them again with gzip does not reduce their size and can even increase the size of the payload. For general-purpose binary formats like Protobuf, compression rates depend on the actual content and message size. We recommend testing the compression rate on real data to verify whether compression is beneficial.
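One way to test the compression rate before enabling gzip is to compress representative messages and compare sizes. This sketch uses Python's standard `gzip` module at its default compression level, which may differ from the level your producer uses. The JSON sample and its field names are invented for illustration, and random bytes stand in for already-compressed content such as JPEG or PNG. Substitute real messages from your layer before drawing conclusions.

```python
import gzip
import json
import os


def gzip_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means gzip helps)."""
    if not payload:
        raise ValueError("payload must not be empty")
    return len(gzip.compress(payload)) / len(payload)


# Illustrative samples only; measure with real messages from your own layer.
json_sample = json.dumps(
    [{"vehicleId": i % 10, "speedKmh": 50, "roadType": "urban"} for i in range(200)]
).encode("utf-8")
already_compressed_like = os.urandom(4096)  # stands in for JPEG/PNG content

print(f"JSON:   {gzip_ratio(json_sample):.2f}")
print(f"random: {gzip_ratio(already_compressed_like):.2f}")
```

The repetitive JSON shrinks to a small fraction of its size, while the random bytes come out slightly larger than the input because gzip adds header and framing overhead.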
Do not use compression for Parquet data. Compression breaks random access to blob data, which is required to read Parquet data efficiently.
If the layer contains SDII data, note that the /layers/<layerID>/sdiimessagelist endpoint does not support compression. If you enable compression for a layer containing SDII data, you must use the ingest API's generic endpoint (/layers/<layerID>), and your application must handle all compression and decompression.
If you are using the Data Client Library to read or write data from a compressed layer, compression and decompression are handled automatically.
If you are using the Data API to read or write data from a compressed layer, you must compress data before writing it to the layer. When reading data, the data you receive is in gzip format and you are responsible for decompressing it.
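When you use the Data API directly, or the generic ingest endpoint for SDII data, compression is your application's job. A minimal sketch of both sides using Python's `gzip` module follows. The function names are hypothetical, and the HTTP calls that would send or fetch the bytes are deliberately omitted rather than guessed.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def prepare_for_compressed_layer(payload: bytes) -> bytes:
    """Gzip a message before writing it to a layer whose encoding is gzip."""
    return gzip.compress(payload)


def read_from_compressed_layer(data: bytes) -> bytes:
    """Decompress a message read from a gzip-encoded layer."""
    if not data.startswith(GZIP_MAGIC):
        raise ValueError("expected gzip data (missing 1f 8b header)")
    return gzip.decompress(data)


body = prepare_for_compressed_layer(b'{"speedKmh": 50}')
print(read_from_compressed_layer(body))  # → b'{"speedKmh": 50}'
```

The magic-byte check makes a missing compression step fail loudly instead of surfacing later as a confusing decode error.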
Specifying a schema enables you to share data with others by defining for others how to consume the data. For more information, see Schemas.
Coverage is the geographic area that the layer covers. This setting controls which areas of the world are highlighted in the layer's coverage map in the platform portal.
Specify a list of countries and regions using two-character ISO 3166-1 alpha-2 codes. Optionally, you can add a two-character country subdivision code using the ISO 3166-2 codes for country subdivisions. For example, you can specify 'DE' for Germany, 'BR' for Brazil, or 'CN-HK' for Hong Kong.
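A quick format check can catch typos in coverage codes before you submit them. This sketch only checks the shape described above: two uppercase letters, optionally followed by a hyphen and a two-character subdivision code. It does not confirm that a code actually exists in ISO 3166. Whether the platform accepts lowercase codes is not covered here, so the sketch simply rejects them. `invalid_coverage_codes` is a hypothetical helper.

```python
import re

# ISO 3166-1 alpha-2 country code, optionally followed by a hyphen and a
# two-character subdivision code, as described above.
COVERAGE_CODE = re.compile(r"[A-Z]{2}(?:-[A-Z0-9]{2})?")


def invalid_coverage_codes(codes):
    """Return the codes whose shape is wrong (no lookup against ISO lists)."""
    return [c for c in codes if not COVERAGE_CODE.fullmatch(c)]


print(invalid_coverage_codes(["DE", "BR", "CN-HK", "de", "Germany"]))
# → ['de', 'Germany']
```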