Enriching SageMaker Preprocessing With Location Data

By Michael Palermo | 10 May 2019

Today we are happy to announce our latest tutorial - Fetching Location Data With SageMaker. SageMaker is Amazon's fully managed machine learning service. Data scientists and developers can use it to build and train machine learning models and integrate them into applications. The following diagram shows a typical machine learning workflow:


The first step in generating example data is labeled "Fetch", where the initial gathering of data occurs. Whether the data is in-house or publicly available, it is typically merged into a single repository before moving on to the "Clean" and/or "Prepare" step(s). In our new tutorial, we guide you through getting started with SageMaker and factoring location data into the generated example data.

NOTE: The tutorial is not a complete walk-through of the machine learning workflow - it covers only the work done in "Fetch".

Scope and Requirements

The following are prerequisites for getting started:

Overview of Tasks

The following topics are covered or referenced in the tutorial:

Setup. Creating a SageMaker Notebook Instance

Architecture. Understanding architecture and review of initial "incidents" dataset

Development. Enriching the "incidents" dataset to include nearby points of interest using the HERE Places API
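
To give a rough idea of what the Development step involves, here is a minimal sketch of the enrichment pattern in plain Python. It is not the tutorial's code: the field names (id, latitude, longitude, nearby_places) and the function names are assumptions made for illustration, and stub_lookup is a hypothetical stand-in for a real request to the HERE Places API, which is not shown here.

```python
from typing import Callable, Dict, List

# Hypothetical lookup signature: (latitude, longitude) -> names of nearby places.
PoiLookup = Callable[[float, float], List[str]]


def enrich_incidents(incidents: List[Dict], lookup: PoiLookup) -> List[Dict]:
    """Return copies of the incident rows with a 'nearby_places' field added."""
    enriched = []
    for row in incidents:
        # Field names are assumed; the real dataset may use different columns.
        places = lookup(row["latitude"], row["longitude"])
        enriched.append({**row, "nearby_places": places})
    return enriched


def stub_lookup(latitude: float, longitude: float) -> List[str]:
    """Stand-in for a real places request; returns a fixed, made-up result."""
    return [f"place near {latitude:.2f},{longitude:.2f}"]


# Made-up sample rows standing in for the "incidents" dataset.
incidents = [
    {"id": 1, "latitude": 41.88, "longitude": -87.63},
    {"id": 2, "latitude": 47.61, "longitude": -122.33},
]
enriched = enrich_incidents(incidents, stub_lookup)
```

Passing the lookup in as a function keeps the merge logic separate from the network call, so the same loop can be run against a stub in a notebook before it is pointed at a live service.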

The simple output of the tutorial is a unified dataset ready for the next step in the machine learning data pipeline. The following screen capture demonstrates the output within the Jupyter environment:



If you need to factor location data into your machine learning example data, we invite you to explore the Fetching Location Data With SageMaker tutorial. For more information about HERE, please visit our developer portal at

If you want to see HERE & SageMaker work together in real time, join us on May 15th for our AWS Connections Livestream on Twitch.