The shapefile is a proprietary but common geospatial file format developed by ESRI. It is frequently used by governments to store geospatial data.
Many shapefiles can be easily uploaded into a Data Hub Space. Some shapefiles require extra processing steps before you can bring them into the Data Hub.
In this tutorial, we’ll cover what you need to do to import shapefiles successfully, including the extra steps and open source tools needed for the trickier ones.
Unlike a GeoJSON file, a shapefile is made up of a number of separate files. Shapefiles on the internet are usually zipped, but once uncompressed you will see a number of files with the same name but different extensions. Some of the more important ones are:
.shp - contains the geometries of the features (points, lines, polygons)
.dbf - contains the attributes for the features
.prj - contains information about the projection and coordinate reference system (CRS)
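To see which of these files came with a download, a small shell function along the following lines may help. This is a minimal sketch: the function name is made up for illustration, and it also checks for .shx (the shape index file that is part of the shapefile format), which is not in the list above.

```shell
# List which component files exist for a shapefile.
# Usage: check_shapefile_parts path/to/my_shapefile.shp
check_shapefile_parts() {
  base="${1%.shp}"            # accept the name with or without .shp
  for ext in shp shx dbf prj; do
    if [ -f "$base.$ext" ]; then
      echo "$ext: found"
    else
      echo "$ext: MISSING"
    fi
  done
}
```

Running check_shapefile_parts my_shapefile.shp prints one line per extension, so a missing .dbf or .prj is easy to spot before you upload.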
If the shapefile uses lat/lon coordinates in the WGS84 coordinate reference system, and is under 200MB, you should be able to upload it using the Data Hub CLI.
In the terminal, cd to the shapefile directory, and type:
here xyz upload space_id -f my_shapefile.shp
The CLI will look for my_shapefile.dbf and other files in the specified directory. (If the .dbf file is missing, no attributes of the geometries will be imported.)
Note that you can use -a to select attributes of features to convert into tags, which will let you filter features server-side when you access the Data Hub API.
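The two conditions for a direct upload (lat/lon WGS84 coordinates and a size under 200MB) can be checked roughly before you run the CLI. The sketch below is a heuristic, not an official check: the function name is hypothetical, it measures only the .shp file, and it guesses the CRS by looking for a projected-CRS keyword and a WGS 84 name in the .prj text.

```shell
# Rough pre-flight check before a direct shapefile upload.
# Usage: shapefile_preflight my_shapefile.shp
shapefile_preflight() {
  shp="$1"
  prj="${shp%.shp}.prj"
  limit=$((200 * 1024 * 1024))          # the 200MB guideline, in bytes
  size=$(wc -c < "$shp" | tr -d ' ')    # portable file size in bytes

  if [ "$size" -ge "$limit" ]; then
    echo "too large for direct upload: $size bytes"
    return 1
  fi
  if [ ! -f "$prj" ]; then
    echo "no .prj file: CRS unknown"
    return 1
  fi
  # A projected CRS appears as PROJCS[...] (or PROJCRS[...]) in the .prj;
  # plain lat/lon WGS84 has no such block and names WGS 84 / WGS_1984.
  if grep -Eq 'PROJCR?S' "$prj" || ! grep -Eq 'WGS[_ ]?(19)?84' "$prj"; then
    echo "not lat/lon WGS84: reproject first"
    return 1
  fi
  echo "ok for direct upload"
}
```

If the function reports a problem, the later sections on very large shapefiles and on projections describe what to do next.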
Advanced shapefile upload
Duration is 5 min
Shapefiles are an infinitely variable format, and there will be cases where you may need to manipulate or modify the data in order to import it into your Data Hub space. You can do this with other open-source geospatial tools, specifically mapshaper and QGIS.
mapshaper
mapshaper is a command-line tool for editing and manipulating geospatial data in a variety of common formats.
Note that mapshaper can modify shapefiles directly, or convert shapefiles into GeoJSON. Converting to GeoJSON will give you more options and faster uploads when bringing the data into HERE Studio. The mapshaper documentation provides a wide variety of options, but a simple conversion command is:
mapshaper my_geodata.shp -o my_geodata.geojson
(Note that you can also specify -o format=geojson, but mapshaper will also attempt to use the extension of the output filename to determine the format.)
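If you have a folder of shapefiles, the same conversion command can be wrapped in a loop. This is a sketch under the assumption that mapshaper is installed and on your PATH; the function name is made up for illustration.

```shell
# Convert every shapefile in a directory to GeoJSON with mapshaper.
# Usage: convert_all_to_geojson path/to/dir
convert_all_to_geojson() {
  for shp in "$1"/*.shp; do
    [ -e "$shp" ] || continue              # the glob matched no .shp files
    # mapshaper picks the GeoJSON format from the .geojson extension
    mapshaper "$shp" -o "${shp%.shp}.geojson"
  done
}
```

Each output file lands next to its source shapefile, with the same base name and a .geojson extension.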
Data Hub QGIS plugin provided by HERE
Duration is 10 min
QGIS is an open source desktop GIS tool that lets you edit, visualize, manage, analyse and convert geospatial data. You can upload and download data from your Data Hub spaces using the Data Hub QGIS plugin. (The plugin is also available on GitHub.)
You can install the Data Hub QGIS plugin from within the QGIS plugin search tool if you have the “show experimental plugins” option checked in the plugin console settings.
You can easily open almost any shapefile in QGIS, at which point you can save it to your Data Hub spaces using the Data Hub QGIS plugin, or export it as GeoJSON to the desktop to use the Data Hub CLI streaming upload options.
Large individual features
Duration is 10 min
Some shapefiles may contain very large and extremely detailed individual lines or polygons. If a single feature is greater than 10-20MB, you may see 400 or 413 HTTP errors when you try to upload the shapefile. In many cases, this level of detail is unnecessary for web mapping. If so, you can try to simplify the feature using mapshaper or QGIS. You may also want to adjust the Data Hub CLI upload parameters so that less data is sent in each API request.
Adjusting ‘chunk’ parameters
In order to optimize upload speed, the CLI “chunks” features together and then sends each chunk to the API. There are typically 200-400 features per chunk. While a large feature may be small enough to be uploaded on its own, when combined with other features in a chunk, the request may be too large for the API.
You can adjust the chunk size using -c – in this example, the CLI will upload 100 features per API request:
here xyz upload spaceID -f large_features.shapefile -c 100
Depending on the size of the feature, you may want to try -c 10 (ten per request) or -c 1 (one at a time).
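Trying smaller chunk sizes in turn can be scripted. The sketch below assumes that the here CLI exits with a non-zero status when an upload fails, which is not confirmed here; the function name is hypothetical. A failed attempt may already have written some features, so check the space afterwards.

```shell
# Retry an upload with progressively smaller chunk sizes.
# Assumption: the here CLI returns a non-zero exit status on failure.
# Usage: upload_with_smaller_chunks spaceID large_features.shp
upload_with_smaller_chunks() {
  for chunk in 100 10 1; do
    echo "trying -c $chunk"
    if here xyz upload "$1" -f "$2" -c "$chunk"; then
      return 0                              # stop at the first success
    fi
  done
  echo "upload failed even at one feature per request" >&2
  return 1
}
```

If the upload still fails at one feature per request, the feature itself is probably too large and needs simplifying.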
mapshaper
You can simplify lines and polygons in shapefiles using -simplify.
If you send the output to stdout rather than to a file, you must specify the output format as format=geojson, as there is no filename extension for mapshaper to reference. Passing - as the output filename enables stdout.
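One form such a command might take is sketched below. The 10% value is only an example (in mapshaper, a percentage tells -simplify how many removable vertices to keep), the function name is made up, and the sketch assumes that here xyz upload reads GeoJSON from stdin when -f is omitted.

```shell
# Simplify a shapefile and pipe the result to the upload command.
# Usage: simplify_and_upload large_features.shp spaceID 10%
simplify_and_upload() {
  # "-o format=geojson -" writes GeoJSON to stdout instead of a file.
  # Assumption: the upload command reads GeoJSON from stdin without -f.
  mapshaper "$1" -simplify "$3" -o format=geojson - |
    here xyz upload "$2"
}
```

If piping does not work in your setup, writing the simplified GeoJSON to a file with -o and uploading that file with -f is the safer route.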
QGIS
Open the shapefile in QGIS
Choose Vector -> Geometry Tools -> Simplify
Save the simplified data to a new Data Hub space using the Data Hub plugin
Note that the Simplify tool works in decimal degrees, and the default is 1 degree, which is probably not what you want. Useful values depend on the extent and zoom levels of your map, but 0.01, 0.001 and 0.0001 are interesting values.
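To get a feel for these tolerances, it can help to convert them to distances on the ground. The sketch below uses the approximation that one degree is about 111,320 meters, which holds for latitude and for longitude near the equator; the function name is made up for illustration.

```shell
# Convert a tolerance in decimal degrees to approximate meters.
# Usage: degrees_to_meters 0.001
degrees_to_meters() {
  # 111320 m per degree is an approximation (exact at the equator)
  awk -v d="$1" 'BEGIN { printf "%.0f\n", d * 111320 }'
}
```

With this approximation, 0.01 degrees is roughly 1.1 km, 0.001 is roughly 111 m, and 0.0001 is roughly 11 m.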
Very large shapefiles (> 200MB)
Duration is 10 min
The Data Hub CLI will attempt to load the entire shapefile into memory before uploading it to the API. This will generally work for shapefiles up to 200-300MB, but you will start to see Node.js memory errors for shapefiles larger than that.
While GeoJSON and CSVs can be streamed via the upload -s option, this option is not yet available for shapefiles. You will have the most success converting the shapefile to GeoJSON and then uploading to HERE Studio.
Note that -a is not available when -s is used, but you can still specify properties to convert into tags using -p.
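Putting the two steps together might look like the sketch below. The function name and the property name in the usage line are hypothetical, and the exact argument form accepted by -p should be checked against the CLI help.

```shell
# Convert a large shapefile to GeoJSON, then stream it to a space.
# Usage: convert_and_stream very_large.shp spaceID county
convert_and_stream() {
  geojson="${1%.shp}.geojson"
  # Upload only if the conversion succeeded.
  # -s streams the file; -p names a property to turn into tags.
  mapshaper "$1" -o "$geojson" &&
    here xyz upload "$2" -f "$geojson" -s -p "$3"
}
```

The intermediate GeoJSON file is kept on disk, so you can inspect it or retry the upload without converting again.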
You can also open the very large shapefile in QGIS and save directly to a Data Hub space using the Data Hub QGIS plugin, though this will be slower than using the CLI streaming feature.
Projections and CRS (Coordinate Reference Systems)
Duration is 10 min
Just like standards, the beauty of projections is that there are so many to choose from. GeoJSON expects coordinates to be WGS84 longitude/latitude (EPSG:4326). Many shapefiles are in different projections, or use local projections without lat/lon coordinates (e.g. State Plane). Fortunately, it is easy to get mapshaper to convert into GeoJSON-friendly coordinates.
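A minimal sketch of that conversion uses mapshaper's -proj command with the wgs84 alias. It assumes the shapefile ships with a .prj file, which mapshaper needs in order to know the source projection; the function name is made up for illustration.

```shell
# Reproject to WGS84 lon/lat while converting to GeoJSON.
# Usage: reproject_to_geojson state_plane.shp
reproject_to_geojson() {
  # -proj wgs84 reprojects from the CRS described in the .prj file
  mapshaper "$1" -proj wgs84 -o "${1%.shp}.geojson"
}
```

The resulting GeoJSON file can then be uploaded with the Data Hub CLI like any other GeoJSON.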