HERE Data SDK for Python with Spark

Data SDK for Python with Spark is a tool for data scientists who want to use Spark for large-scale analysis of HERE platform data, or who plan to implement their solution as a pipeline and prefer to use the same language and framework for analysis, simplifying deployment to production.

The SDK uses the Sparkmagic extension for Jupyter to run Spark jobs through a Livy server, either in local mode or in cluster mode:

Figure 1. Sparkmagic architecture, taken from the Sparkmagic documentation
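Under this architecture, Sparkmagic does not talk to Spark directly; it submits code to the Livy REST API, which runs it in a Spark session. As a minimal sketch, the helpers below build the two core Livy requests (create a session, run a statement). The endpoint and payload shapes follow the Livy REST API; the local URL and port are assumptions matching a default local Livy deployment.

```python
import json

# Assumption: Livy listening locally on its default port.
LIVY_URL = "http://localhost:8998"

def create_session_request(kind="pyspark"):
    """Build the request Sparkmagic sends to start a Spark session via Livy.

    Livy's POST /sessions accepts a JSON body whose "kind" selects the
    interpreter (e.g. "pyspark" or "spark").
    """
    return ("POST", f"{LIVY_URL}/sessions", {"kind": kind})

def run_statement_request(session_id, code):
    """Build the request that runs a code snippet in an existing session.

    Livy's POST /sessions/{id}/statements accepts {"code": "..."}.
    """
    return ("POST", f"{LIVY_URL}/sessions/{session_id}/statements", {"code": code})

# Example: the payloads behind a notebook cell evaluating a simple expression.
method, path, payload = create_session_request()
print(method, path, json.dumps(payload))
method, path, payload = run_statement_request(0, "1 + 1")
print(method, path, json.dumps(payload))
```

Sparkmagic wraps these calls for you; the sketch only illustrates what crosses the wire between the notebook and the Spark deployment.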

The steps to install and configure the Data SDK for Python with Spark are described below.

Prerequisites

Installation

The Data SDK for Python with local Spark works only on Linux/macOS. The EMR Spark Cluster option is available on all platforms. Follow these steps to configure the SDK:

  1. Install and configure the Sparkmagic Extension.
  2. Choose the Livy Server Deployment that you need:
    • Local Spark (Linux/macOS only): see how to install and deploy Hadoop, Spark, and Livy locally. This is the simplest and default option.
    • EMR Spark Cluster (optional): see how to deploy and connect to a remote EMR cluster if you want to run your jobs there.
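Whichever Livy deployment you choose, Sparkmagic needs to know where to reach it. As a sketch, a minimal `~/.sparkmagic/config.json` for the local option might look like the fragment below; the field names follow the Sparkmagic example configuration, and the URL assumes Livy listening on its default port 8998.

```json
{
  "kernel_python_credentials": {
    "username": "",
    "password": "",
    "url": "http://localhost:8998",
    "auth": "None"
  }
}
```

For the EMR option, point `url` at the Livy endpoint of the remote cluster instead.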
