
Visualizing Service Usage from Splunk Logs on a Map

By Jayson DeLancey | 06 February 2019

Like other organizations, HERE Technologies uses Splunk for log management.  These logs capture a variety of output, the details of which can be useful to keep the operations of web services running smoothly.  One such service is the HERE XYZ Hub API.  The team behind HERE XYZ decided to eat their own dog food and used the service itself to visualize on a map which map tiles were being requested during a beta phase.

I spent some time with software engineers Marvin Gilbert and Max Chrzan to understand their project and roles related to this internal tool.

[Image: AutoPlay-Small]

Max explains what we're looking at:

This is a company-owned dataset that is continuously generated by any users of the HERE XYZ tools, and to be more precise XYZ Studio.  It is highly dynamic which makes it interesting for analysis.  The UI shows the base tiles of maps used in XYZ Studio that were requested during an internal testing phase to visualize hotspots.

You can learn more about HERE XYZ at https://explore.xyz.here.com, but let's dig a little deeper into the project to understand what the data represents (Vector Map Tiles), the source of the data (Splunk Logs), how data can be populated into HERE XYZ (Hub API), and finally the user interface (XYZ Splunk Viewer) as demonstrated in the image above.

Vector Map Tiles

When storing data in HERE XYZ, the Hub API is a convenient way to fetch geospatial features within a discrete area. A feature is composed of geometry and properties. Geometry can be things like points, lines, and polygons associated with geo-coordinate arrays. Properties are represented as a dictionary of key/value pairs that help make sense of what the feature describes. For example, a label, name, resolution, author, etc. can all be properties.

The XYZ Hub stores geospatial data in a space to optimize retrieval of the features within it, addressed by an identifying space id, level (z), tile column (x), and tile row (y). The column and row are tile indices that correspond to ranges of longitude and latitude, not to the coordinates themselves. You can see this by looking at the format of a request. A call to the endpoint might look like:

 GET https://xyz.api.here.com/hub/spaces/{spaceid}/tile/web/{z}_{x}_{y}
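To make the {z}_{x}_{y} addressing concrete, here is a minimal sketch of how a longitude/latitude pair maps to a tile id. It assumes the standard Web Mercator tiling scheme; the function names `lonLatToTile` and `tileId` are illustrative and not part of the Hub API.

```javascript
// Convert a longitude/latitude pair (degrees) to Web Mercator tile indices.
// Assumption: standard "slippy map" tiling; no clamping at the poles or at lon = 180.
function lonLatToTile(lon, lat, z) {
    const n = 2 ** z; // number of tiles per axis at this level
    const latRad = (lat * Math.PI) / 180;
    const x = Math.floor(((lon + 180) / 360) * n);
    const y = Math.floor(
        ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
    );
    return { z, x, y };
}

// Build the {z}_{x}_{y} tile id used in the request path.
function tileId(lon, lat, z) {
    const t = lonLatToTile(lon, lat, z);
    return `${t.z}_${t.x}_${t.y}`;
}

console.log(tileId(-2.960847, 53.430828, 2)); // prints "2_1_1"
```

The point used here is the one from the sample feature in this article; at level 2 it falls into column 1, row 1.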

If headers including the token are sent properly, the response would include a feature that could look like the following.

{
    "type": "Feature",
    "geometry": {
        "type": "Point",
        "coordinates": [
            -2.960847,
            53.430828
        ]
    },
    "properties": {
        "@ns:com:here:xyz": {
            "tags": [
                "date:2019-02-06",
                "hour:09:00"
            ]
        },
        "level": 2,
        "resolution": 10
    }
}

At the risk of recursing indefinitely, this feature is itself a description of a service request for a feature. The XYZ Hub API supports Tagging for creating indexes on important data for quicker retrieval. In this case that’s the date and time, with additional properties like the zoom level and resolution of the request.

Marvin explains:

Currently we are only tagging the date and hour of the log entry – but the tags could be also used for all kind of information which is available in the logs. Having XYZ Tags is useful for filtering the dataset or for visualizing them in the UI.
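As a rough illustration of the date and hour tags, the sketch below derives them from a log timestamp. The helper name `buildTags` is hypothetical, and the use of UTC is an assumption based on the +0000 offsets in the log samples.

```javascript
// Derive the "date:" and "hour:" tags from a timestamp.
// Assumption: tags are computed in UTC and the hour is truncated to HH:00.
function buildTags(timestamp) {
    const iso = new Date(timestamp).toISOString(); // e.g. 2019-02-06T09:15:00.000Z
    return [`date:${iso.slice(0, 10)}`, `hour:${iso.slice(11, 13)}:00`];
}

// Wrap the tags in the XYZ namespace object used by the sample feature.
function xyzNamespace(timestamp) {
    return { '@ns:com:here:xyz': { tags: buildTags(timestamp) } };
}

console.log(buildTags('2019-02-06T09:15:00Z')); // [ 'date:2019-02-06', 'hour:09:00' ]
```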

 

Splunk Logs

When you serve as many as 8000+ concurrent Lambda functions and 100M+ API requests, as we do with HERE XYZ, it can be tough to understand what is going on in the AWS environment. Splunk can help with that and is the source of the data set for this application.

Max explains:

You can collect with Splunk any kind of log data. Applications and services are forwarding their log data to the Splunk server as part of our infrastructure. The Splunk server is instantly indexing the incoming dataset. It is then possible to access the data near to real-time. Splunk provides a huge toolchain to analyze the datasets. Furthermore, you can create statistics including diagrams – in total a very cool piece of software.

Splunk provides a powerful search engine that Max and Marvin used to narrow down their Splunk request. Not every service request is for the Hub API, for example, since other APIs, static sites, etc. are also monitored with Splunk.

The request to the Splunk endpoint looks similar to the following:

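var request = require('request');
// `waiter` is a project helper: `w` is passed as the callback and `w.wait()` returns the result.
var waiter = require('node_common/waiter');

const type = 'raw';
const token = 'authorization-token-for-splunk';

// The search string is already form-encoded: '+' stands for a space.
let search = 'search index=cpaws+role=vector-service+sourcetype=gunicorn+".xyz"';
let from = '-15m@m'; // 15 minutes ago, snapped to the minute
let to = 'now';

var options = {
    'method': 'POST',
    'url': 'api.splunk/services/search/jobs/export?output_mode=' + type,
    'headers': {
        // Note the space between the scheme and the token.
        'Authorization': 'Basic ' + token,
    },
    // Form parameters are separated by '&'.
    'body': 'search=' + search
        + '&earliest_time=' + from
        + '&latest_time=' + to
};

var w = waiter();
request(options, w);
var response = w.wait();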
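Hand-encoding the form body is easy to get wrong, so here is a small sketch that lets `URLSearchParams` do the encoding instead. The helper name `buildExportBody` is hypothetical; `search`, `earliest_time`, and `latest_time` are the parameters used in the request above.

```javascript
// Build an application/x-www-form-urlencoded body for the export request.
// URLSearchParams encodes spaces as '+' and escapes characters such as '=' and '@'.
function buildExportBody(search, earliest, latest) {
    const params = new URLSearchParams({
        search: search,
        earliest_time: earliest,
        latest_time: latest,
    });
    return params.toString();
}

const body = buildExportBody(
    'search index=cpaws role=vector-service sourcetype=gunicorn ".xyz"',
    '-15m@m',
    'now'
);
console.log(body);
```

With this approach the search string can be written with plain spaces, and the resulting string can be passed as the `body` of the request.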

The log results retrieved from Splunk are actually pretty straightforward. The format is highly specific to our logging setup, so it would vary depending on your project needs, but each path tells us the resolution, level, x, and y coordinates of a requested tile.

[2019-02-04 08:34:50 +0000] GET /256/all/13/4256/2992.xyz
[2019-02-04 08:34:49 +0000] GET /256/all/13/4254/2989.xyz
[2019-02-04 08:34:49 +0000] GET /256/all/13/4254/2990.xyz
[2019-02-04 08:34:50 +0000] GET /256/all/13/4258/2992.xyz
[2019-02-04 08:34:49 +0000] GET /256/all/13/4254/2992.xyz
[2019-02-04 08:24:03 +0000] GET /512/all/11/1076/675.xyz
[2019-02-04 08:24:03 +0000] GET /512/all/11/1075/676.xyz
[2019-02-04 08:34:50 +0000] GET /256/all/13/4255/2992.xyz
[2019-02-04 08:34:49 +0000] GET /256/all/13/4257/2989.xyz
[2019-02-04 08:34:50 +0000] GET /256/all/13/4257/2992.xyz
[2019-02-04 08:24:03 +0000] GET /512/all/11/1077/676.xyz
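A regular expression along the following lines could pull the fields out of each line. This is only a sketch, not the expression used in the project: it assumes the path layout is /{resolution}/{layer}/{level}/{x}/{y}.xyz, and the names `LOG_LINE` and `parseLogLine` are illustrative.

```javascript
// Matches lines such as: [2019-02-04 08:34:50 +0000] GET /256/all/13/4256/2992.xyz
// Assumed path layout: /{resolution}/{layer}/{level}/{x}/{y}.xyz
const LOG_LINE =
    /^\[(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2} [+-]\d{4}\] GET \/(\d+)\/[^/]+\/(\d+)\/(\d+)\/(\d+)\.xyz$/;

// Parse one log line into its fields; returns null for lines that do not match.
function parseLogLine(line) {
    const m = LOG_LINE.exec(line.trim());
    if (!m) return null;
    return {
        date: m[1],
        hour: m[2],
        resolution: Number(m[3]),
        level: Number(m[4]),
        x: Number(m[5]),
        y: Number(m[6]),
    };
}

console.log(parseLogLine('[2019-02-04 08:34:50 +0000] GET /256/all/13/4256/2992.xyz'));
```

Returning null for non-matching lines makes it easy to skip log entries that are not tile requests.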

Marvin adds:

It was our first time using the Splunk REST API. The API offers nice possibilities for automated log analysis which could be interesting for a lot of developers.

You can learn more about how to export search results from Splunk from the Splunk docs.

 

Hub API

Marvin recalls:

Finding the correct Regular Expression to parse the log-data took the most time as usual :D

Splunk includes some mapping functionality known as Geostats, but it didn’t quite meet their needs. It couldn’t handle the volume of transactions being generated by HERE XYZ, and it is also limited to latitude and longitude. For the log data, Marvin noted the need “to support every kind of geo-references (level, row, col, quadkey, morton code, …).”
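To give a feel for one of these geo-references, the sketch below converts a level/column/row tile address into a quadkey. It assumes the Bing Maps style of quadkey, which interleaves the bits of the column and row; whether the project uses exactly this encoding is not stated, and `tileToQuadkey` is an illustrative name.

```javascript
// Convert a tile address (column x, row y, level) into a Bing Maps style quadkey.
// Each digit combines one bit of x (value 1) and one bit of y (value 2).
function tileToQuadkey(x, y, level) {
    let quadkey = '';
    for (let i = level; i > 0; i--) {
        const mask = 1 << (i - 1);
        let digit = 0;
        if (x & mask) digit += 1;
        if (y & mask) digit += 2;
        quadkey += digit;
    }
    return quadkey;
}

console.log(tileToQuadkey(3, 5, 3)); // prints "213"
```

The quadkey has one digit per level, so a tile and all of its ancestors share a common prefix.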

Marvin described how they built this data loader pipeline:

To provide and store the dynamic content to XYZ Hub, a periodically running pipeline was generated and triggered every 5 minutes by AWS CloudWatch Events. This way it was kept fresh with a live dataset in sync with XYZ Hub. To transform the log events… a small node script was deployed as an AWS Lambda.
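The transformation step might look roughly like the following sketch, which turns parsed log entries into a GeoJSON FeatureCollection. It is not the project's script: placing a point at the tile center is an assumption, as are the shape of the `entry` objects and the helper names.

```javascript
// Longitude/latitude (degrees) of the center of a Web Mercator tile.
function tileCenter(x, y, level) {
    const n = 2 ** level;
    const lon = ((x + 0.5) / n) * 360 - 180;
    const latRad = Math.atan(Math.sinh(Math.PI * (1 - (2 * (y + 0.5)) / n)));
    return [lon, (latRad * 180) / Math.PI];
}

// Turn one parsed log entry into a feature shaped like the sample shown earlier.
// Assumed entry shape: { date, hour, resolution, level, x, y }.
function toFeature(entry) {
    return {
        type: 'Feature',
        geometry: { type: 'Point', coordinates: tileCenter(entry.x, entry.y, entry.level) },
        properties: {
            '@ns:com:here:xyz': { tags: [`date:${entry.date}`, `hour:${entry.hour}:00`] },
            level: entry.level,
            resolution: entry.resolution,
        },
    };
}

// Collect all entries into one FeatureCollection.
function toFeatureCollection(entries) {
    return { type: 'FeatureCollection', features: entries.map(toFeature) };
}

const collection = toFeatureCollection([
    { date: '2019-02-04', hour: '08', resolution: 512, level: 11, x: 1076, y: 675 },
]);
console.log(JSON.stringify(collection, null, 2));
```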

The key is using the Spaces API to add feature collections like the Feature described earlier:

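// spaceId, token, and featureCollection are defined elsewhere in the script.
var options = {
    'method': 'PUT',
    'url': 'https://xyz.api.here.com/hub/spaces/' + spaceId + '/features',
    'headers': {
        'Authorization': 'Bearer ' + token,
        'Content-Type': 'application/geo+json'
    },
    // The body is a GeoJSON FeatureCollection serialized to a string.
    'body': JSON.stringify(featureCollection)
};

var w = waiter();
request(options, w);
var response = w.wait();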

You can learn a bit more about the Edit Features endpoint, which can be used to create or replace features, add tags, remove tags, etc.

 

XYZ Splunk Viewer

Putting it all together, the final component is the user interface itself. Since vector rendering is Max’s specialty, he points out:

The XYZ-Splunk Viewer, loads the data from XYZ Hub and visualizes it to the user. It provides the ability to select search parameters as date and time to select certain timeframes.

For the actual UI, the viewer uses Tangram and Leaflet to visualize the XYZ spaces with the tagged data. To make the site available to everyone it is simply hosted on AWS S3.

[Image: Screen Shot 2019-01-31 at 7.46.20 PM]

Max also noted:

We used the datepicker library from jquery-ui (https://jqueryui.com/datepicker/) which is how we quickly implemented a simple but nice calendar UI.
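Once a date and hour have been picked, they can be matched against the tags on each feature. The sketch below does this on the client side for features that are already loaded; it is an assumption about how such a filter could work rather than the viewer's actual code, and both helper names are hypothetical.

```javascript
// Turn a picked date (YYYY-MM-DD) and optional hour (0-23) into tag strings.
function selectionToTags(date, hour) {
    const tags = [`date:${date}`];
    if (hour !== undefined) {
        tags.push(`hour:${String(hour).padStart(2, '0')}:00`);
    }
    return tags;
}

// Keep only the features that carry every wanted tag.
function filterByTags(features, wanted) {
    return features.filter((feature) => {
        const ns = feature.properties['@ns:com:here:xyz'] || {};
        const tags = ns.tags || [];
        return wanted.every((tag) => tags.includes(tag));
    });
}

const features = [
    { properties: { '@ns:com:here:xyz': { tags: ['date:2019-02-06', 'hour:09:00'] } } },
    { properties: { '@ns:com:here:xyz': { tags: ['date:2019-02-06', 'hour:10:00'] } } },
];
console.log(filterByTags(features, selectionToTags('2019-02-06', 9)).length); // prints 1
```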

For a more detailed example of building a UI with Tangram and HERE XYZ you might want to check out the Green Amsterdam tutorial over at https://codelabs.here.xyz.  If you want to look at the complete source listings for this project you can find more detail in the repo - https://github.com/MarvGilb/xyz-hackweeek.

 

Max and Marvin are both back-end developers and colleagues at the Frankfurt (Germany) office and have known each other for over 15 years.  Marvin works with microservices deployed in the AWS environment including AWS EC2, Lambda, CFN, MongoDB, Amazon Aurora, and Splunk.  Max primarily works on services for rendering that are community and customer focused, including the AWS portfolio EC2, ECS, SQS, SNS, RDS, CloudFront, CloudFormation, S3, etc. and OGC compliant open-source projects.