The HERE platform supports five types of data layers: versioned, volatile, index, stream and object store.
A versioned, volatile, index, or stream layer is a set of partitions of a specific data type, functional property, and structure. You can use layers to segment data based on semantics. For example, in a catalog you can have one layer for road signs and another layer for road topology. You can also use layers to segment data based on schema. Layers can be overlaid spatially to construct a complete digital map.
A layer of type object store is a generic key/value store which does not enforce data structure.
A versioned layer stores slowly-changing data that must remain logically consistent with other layers in the catalog. When you want to update a catalog of versioned layers, all the layers related to the update (and partitions within a layer) must be updated in one publication so that they can be versioned together. For example, the HERE Map Content catalog contains several versioned layers, including Topology (road topology), Road (road attributes), and Place (points of interest). In each version of the catalog, these layers represent a consistent view of the world at that point in time. If a new road is built and there are new buildings containing new businesses along the new road, all three layers would need to be updated together in one publication so that in the new version of the catalog the Topology layer contains the new road, the Road layer contains the attributes for the new road, and the Place layer contains the names of the new businesses. If only the Place layer were to be updated with the new businesses, the layers would no longer represent a consistent view of the world because the Topology and Road layers would be missing the new road.
To achieve consistency between layers, any update that affects multiple layers must be published together in a publication. Updating multiple partitions of a versioned layer also happens in a publication to preserve the consistency and integrity of intra-layer and inter-layer references. A new catalog version is available only when all layers have been updated and the publication has been finalized.
It is the data publisher's responsibility to ensure that a publication results in a consistent set of layers that accurately represent the world. Extensive support is provided to produce consistent content for versioned layers in the HERE platform Data Validation Library. You must have the HERE Workspace plan to use the Data Validation Library.
You can access data as it existed at different points in time by referencing the version you want. Once a version has been published, the data in that version cannot be changed and can be removed only by removing the whole catalog version. Data within a version is immutable and consistent.
The initial version of a catalog, before any data has been published to it, is -1. When data in a versioned layer is updated:
It is important to note that only those layers and partitions that are updated have their version updated to the catalog's new version number. So, the version of a layer or partition represents the catalog version in which the layer or partition was last updated.
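The versioning rule above can be sketched with a small in-memory model. This is an illustration only, not a HERE API: the `ToyCatalog` class, its method names, and the layer and partition names are hypothetical, and the sketch assumes that each finalized publication increments the catalog version by one.

```python
class ToyCatalog:
    """In-memory stand-in for a catalog of versioned layers (not a HERE API)."""

    def __init__(self):
        self.version = -1             # initial version, before any publication
        self.layer_versions = {}      # layer -> catalog version of its last update
        self.partition_versions = {}  # (layer, partition) -> version of last update

    def publish(self, updates):
        """Apply one publication given as {layer: [partition ids]}."""
        # Assumption: one finalized publication yields one new catalog version.
        new_version = self.version + 1
        for layer, partitions in updates.items():
            self.layer_versions[layer] = new_version
            for partition in partitions:
                self.partition_versions[(layer, partition)] = new_version
        self.version = new_version
        return new_version


catalog = ToyCatalog()
# First publication touches three layers together, producing version 0.
catalog.publish({"topology": ["p1", "p2"], "road": ["p1"], "place": ["p1"]})
# Second publication touches only one partition of one layer, producing version 1.
catalog.publish({"place": ["p1"]})
```

After the second publication the catalog is at version 1, but the `topology` and `road` layers still report version 0 because they were last updated in that catalog version.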
When you request a particular version of data from a versioned layer, the partition that gets returned may have a lower version number than you requested. The following example illustrates this concept:
The red arrows show requests for data from particular versions. The dots along the line represent changes to a partition over time. The partition has been updated at catalog versions 1, 12, 24, 27, 35, 42, and 48. The current catalog version is 56.
The table below shows which partition version is returned for requests to different catalog versions.
|Requested Version|Partition Version|
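
The lookup rule can be sketched as follows: a request for catalog version N returns the most recent partition version that is less than or equal to N. The update history is the one given in the example above; the function name and the requested versions in the usage lines are hypothetical and are not taken from the original figure.

```python
from bisect import bisect_right

# Catalog versions at which the example partition was updated.
PARTITION_UPDATES = [1, 12, 24, 27, 35, 42, 48]


def resolve_partition_version(requested, updates=PARTITION_UPDATES):
    """Return the latest partition version <= requested, or None if none exists."""
    index = bisect_right(updates, requested)  # count of updates <= requested
    return updates[index - 1] if index else None


# Hypothetical requests against the history above.
latest = resolve_partition_version(56)    # current catalog version
between = resolve_partition_version(30)   # falls between updates 27 and 35
exact = resolve_partition_version(12)     # matches an update exactly
too_early = resolve_partition_version(0)  # before the first update
```

Here `latest` is 48, `between` is 27, `exact` is 12, and `too_early` is `None` because the partition did not yet exist at catalog version 0.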
A volatile layer is a key/value store where values for a given key can change and only the latest value is retrievable. As new data is published, old data is overwritten.
Volatile layers use in-memory storage. Storing data in memory helps reduce data access latency and provides applications with consistently high throughput.
Consider using a volatile layer when you don't need older versions of the data. For example, if you want to make the latest weather information available, you could use a volatile layer to store the latest observations. When new weather observations are written to the layer, the old ones are overwritten so that only the latest observations are available to data consumers.
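The overwrite behavior can be illustrated with a minimal in-memory model. The `ToyVolatileLayer` class, the partition key, and the observation values below are hypothetical; this is a sketch of the semantics, not of the platform's API or storage.

```python
class ToyVolatileLayer:
    """Keeps only the latest value per key (illustrative, not a HERE API)."""

    def __init__(self):
        self._latest = {}

    def put(self, key, value):
        # Publishing to an existing key replaces the previous value.
        self._latest[key] = value

    def get(self, key):
        # Only the most recent value is retrievable; unknown keys yield None.
        return self._latest.get(key)


weather = ToyVolatileLayer()
weather.put("tile-1", {"temp_c": 11.5})
weather.put("tile-1", {"temp_c": 12.0})  # overwrites the earlier observation
current = weather.get("tile-1")
```

After the second `put`, the first observation can no longer be read.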
Another use for a volatile layer is as a cache for applications requiring fast response times and consistently high throughput. When running complex, time-consuming computations, it is valuable to cache the computation results for future use. Correctly caching values that are accessed frequently not only reduces the load on the rest of the components in the cloud but also helps speed up responses to other clients requesting the same data. For example, say a client application performs a complex query that requires fetching data from five different versioned layers, sorting the data, matching the data, and running statistical analysis on the data to compute an optimized parameter. If it is likely that another client will request the same query, then the result of this complex query is a good candidate for caching in a volatile layer. This way, millions of clients can benefit from fast response times when submitting the same request. At the same time, the load on versioned layers and pipelines is significantly reduced.
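A minimal cache-aside sketch of this idea follows. The dictionary stands in for a volatile layer, and `optimized_parameter` is a placeholder for the expensive multi-layer query; both, along with the request key, are assumptions made for illustration.

```python
class CachedQuery:
    """Cache-aside wrapper; the dict stands in for a volatile layer."""

    def __init__(self, compute):
        self._compute = compute
        self._cache = {}
        self.compute_calls = 0  # counts how often the expensive path runs

    def get(self, request_key):
        if request_key not in self._cache:
            # Cache miss: run the expensive computation once and store the result.
            self.compute_calls += 1
            self._cache[request_key] = self._compute(request_key)
        return self._cache[request_key]


def optimized_parameter(request_key):
    # Placeholder for fetching, sorting, matching, and analyzing layer data.
    return len(request_key) * 2


query = CachedQuery(optimized_parameter)
first = query.get("tile-377894440")   # computed
second = query.get("tile-377894440")  # served from the cache
```

Both calls return the same result, but the expensive function runs only once. A real cache would also need a policy for refreshing results when the underlying versioned layers change.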
An index layer is part of an overall solution that enables you to index and store metadata and data in a way that is optimized for batch processing. The index layer itself is, as its name suggests, an index of the catalog’s data by attributes that you can later query. For example, if you want to run a batch process daily to find all pothole detection events recorded that day in the area surrounding a given city, you can use an index layer to index the pothole detection events by event time, event type, and location. You can then query the data every 24 hours for pothole events in the area of the city as part of your batch process. Index layers provide the flexible solution you need to easily store attributes of pothole events along with the location/time the event took place.
Like versioned layers, index layers are useful when you want to access historical data. The difference is that an index layer can be used when you do not need to maintain logical consistency across layer versions in the way versioned layers do. The other difference is that you can define your own attributes by which you want to index and query the data, whereas you cannot define your own attributes in versioned layers.
You can use an index layer in combination with a pipeline to append and get late events by event time. This ability to properly handle late events is important when your end user devices are online and offline at different times and where batches of data sent can include events with varying timestamps that need to be indexed appropriately with other events already received. Note that an index layer can only be used as input to a batch pipeline, not a stream pipeline. An index layer can be written to by either a batch or stream pipeline.
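The indexing and late-event behavior described above can be sketched with a toy model. The `Event` fields, the `ToyIndexLayer` class, the one-hour time bucket, and the tile IDs are all hypothetical; the sketch does not use the Data Archiving Library or any platform API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_time: int  # seconds since some epoch
    event_type: str
    tile_id: int
    payload: str


class ToyIndexLayer:
    """Indexes events by (time bucket, event type, tile); illustrative only."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self._index = {}

    def append(self, event):
        # The key uses the event time, not the arrival time, so a late
        # event lands in the same bucket as its contemporaries.
        key = (event.event_time // self.bucket_seconds,
               event.event_type, event.tile_id)
        self._index.setdefault(key, []).append(event)

    def query(self, start, end, event_type, tile_ids):
        """Return matching events with start <= event_time < end, oldest first."""
        tiles = set(tile_ids)
        matches = [
            event
            for (_, etype, tile), events in self._index.items()
            if etype == event_type and tile in tiles
            for event in events
            if start <= event.event_time < end
        ]
        return sorted(matches, key=lambda event: event.event_time)


layer = ToyIndexLayer()
layer.append(Event(100, "pothole", 1, "a"))
layer.append(Event(7300, "pothole", 1, "b"))
layer.append(Event(200, "pothole", 1, "late"))  # arrives last, occurred early
layer.append(Event(300, "debris", 1, "c"))
first_hour = layer.query(0, 3600, "pothole", [1])
```

The query for the first hour returns the events with payloads `a` and `late`, in event-time order, even though the late event arrived after an event from a later hour.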
Index layers work in combination with the Data Archiving Library, which is available in the SDK. For more information about the Data Archiving Library, see:
A stream layer is a queue that streams data to data consumers in real time. Consumers read the data in the order it is added to the queue. Once a consumer reads the data, the data is no longer available to that consumer, but the data remains available to other consumers.
Stream layers can be configured with a retention time, or time-to-live (TTL), after which unconsumed data is removed.
An example use of a stream layer is to handle data from vehicle sensors.
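The consumption and retention semantics of a stream layer can be modeled in a few lines. This is a simplified sketch: the `ToyStreamLayer` class, the consumer names, and the explicit `now` argument (used instead of a real clock so the example is deterministic) are assumptions and do not reflect the platform's actual interface.

```python
class ToyStreamLayer:
    """Ordered queue with per-consumer offsets and a TTL; illustrative only."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        self._messages = []  # (sequence number, publish time, payload)
        self._next_seq = 0
        self._offsets = {}   # consumer -> next sequence number to read

    def publish(self, payload, now):
        self._messages.append((self._next_seq, now, payload))
        self._next_seq += 1

    def poll(self, consumer, now):
        """Return unread, unexpired payloads for this consumer, in publish order."""
        # Drop messages whose retention time has elapsed.
        self._messages = [m for m in self._messages
                          if now - m[1] < self.ttl_seconds]
        start = self._offsets.get(consumer, 0)
        batch = [payload for seq, _, payload in self._messages if seq >= start]
        # Advance only this consumer's offset; other consumers are unaffected.
        self._offsets[consumer] = self._next_seq
        return batch


stream = ToyStreamLayer(ttl_seconds=60)
stream.publish("speed=50", now=0)
stream.publish("speed=55", now=10)
first_read = stream.poll("consumer-a", now=20)
second_read = stream.poll("consumer-a", now=20)  # nothing new for consumer-a
other_read = stream.poll("consumer-b", now=65)   # first message has expired
```

Here `consumer-a` receives both messages once and nothing on the second poll, while `consumer-b`, polling after the first message's TTL has elapsed, receives only the second message.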
An object store layer is a distributed and highly durable key/value store with the additional capability of listing the keys. You can upload data by referencing a key. You can overwrite that data by uploading new data to the same key. You can also delete the data using the key. This layer type lets you access the data stored in the platform through an interface on which a file system can be implemented.
- The data is mutable and parallel writes to the same key are allowed. For parallel writes, the last upload to be persisted on the server wins, so your client-side code should expect and handle this situation.
- Object store is not a true file system. A file system expects operations such as delete and rename to be atomic. In the object store, those operations complete eventually. A file system also expects that, while a file is being read or written, its content is not changed and the file is not deleted. The object store does not provide these guarantees. You need to guard against this behavioral difference yourself.
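
The following sketch illustrates the key/value operations and one way a client can guard against an object disappearing between listing and reading. The `ToyObjectStore` class, its method names, and the keys are hypothetical, and the single-process dictionary does not reproduce the eventual completion of operations in a distributed store.

```python
class ToyObjectStore:
    """Key/value store with key listing; illustrative, not a HERE API."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        # The last write to be persisted wins; earlier data is replaced.
        self._objects[key] = data

    def get(self, key):
        # Returns None if the key does not exist (for example, it was deleted).
        return self._objects.get(key)

    def delete(self, key):
        self._objects.pop(key, None)

    def list_keys(self, prefix=""):
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("reports/a.json", b"v1")
store.put("reports/a.json", b"v2")  # overwrite: the last write wins
store.put("reports/b.json", b"x")

keys = store.list_keys("reports/")
store.delete("reports/b.json")      # simulates a concurrent delete

# Guard: tolerate keys that vanished after they were listed.
contents = {}
for key in keys:
    data = store.get(key)
    if data is not None:
        contents[key] = data
```

After the loop, `contents` holds only `reports/a.json` with the value `b"v2"`. The reader skipped the key that was deleted after listing instead of failing.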