Running Spatial drivers in AWS EMR - Spectrum_Location_Intelligence_for_Big_Data - 5.2.1

Location Intelligence SDK for Big Data Guide

Product type
Software
Portfolio
Locate
Product family
Spectrum
Product
Spatial Big Data > Location Intelligence SDK for Big Data
Version
5.2.1
Language
English
Product name
Location Intelligence for Big Data
Title
Location Intelligence SDK for Big Data Guide
Copyright
2024
First publish date
2015
Last updated
2024-10-16
Published on
2024-10-16T13:55:01.634374

Step 1: Create Cluster

Create a new cluster according to your requirements. EMR versions emr-6.2.0 and later are supported; you can choose any of the supported versions. We recommend a cluster with a minimum of 32 GB of memory and 4 cores.
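For reference, a cluster matching these recommendations can also be created from the AWS CLI. This is a minimal sketch: the cluster name, instance type, instance count, key pair, and release label below are placeholders and assumptions, not values prescribed by the SDK; adjust them for your environment.

```shell
# Sketch only: names, sizes, and key pair are placeholders.
# m5.2xlarge (8 vCPU / 32 GB) meets the recommended minimum of 32 GB and 4 cores.
aws emr create-cluster \
  --name "li-bigdata-cluster" \
  --release-label emr-6.2.0 \
  --applications Name=Spark Name=Hadoop \
  --instance-type m5.2xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=<your-key-pair>
```

The command prints the new cluster ID; the cluster is ready for the next step once its state reaches WAITING.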

Step 2: Install SDK jar

First, upload the distribution zip to an AWS S3 bucket. After your cluster has started and is in the waiting state, copy the distribution zip from S3 to the primary node of the cluster and extract it. Then copy all folders inside the zip to an HDFS path of your choice. This path is needed later to configure the Spark session.
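Run on the primary node (for example, over SSH), the copy-and-extract step might look like the following sketch. The bucket name, zip filename, and HDFS path are placeholders, not actual values from the distribution; substitute your own.

```shell
# Placeholders: <your-bucket>, the zip filename, and the HDFS target path.
# Copy the distribution zip from S3 to the primary node and extract it.
aws s3 cp s3://<your-bucket>/<li-sdk-distribution>.zip .
unzip <li-sdk-distribution>.zip -d li-sdk

# Copy the extracted folders to an HDFS path of your choice.
hdfs dfs -mkdir -p /precisely/li-sdk
hdfs dfs -put li-sdk/* /precisely/li-sdk/
```

Note the HDFS path you choose here; it is the path referenced by the spark-submit command in the next step.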

Step 3: Run the spark-submit command with the driver options your use case requires

Note: If the input data is not already present on your cluster, you can push it (for example, CSV or TAB files) to an S3 bucket and use the Download Manager utility, which downloads the data to the worker nodes when the operation runs.

Example:

spark-submit \
--class com.precisely.bigdata.li.spark.app.pointinpolygon.PointInPolygonDriver \
--master local \
hdfs:///<hdfs-path-to-extracted-zip>/spark3/driver/location-intelligence-bigdata-spark3drivers_2.12-0-SNAPSHOT-all.jar \
--input s3a://<s3-uri-for-input-data> \
--input-format=csv \
--csv header=true delimiter=',' \
--table-file-type TAB \
--table-file-path s3a://<s3-uri-of-polygon-data> \
--table-file-name USZIPBDY.TAB \
--latitude-column-name lat \
--longitude-column-name lon \
--output /home/hadoop/testfolder/output \
--output-fields ZIP,Name \
--include-empty-search-results \
--overwrite \
--download-location /home/hadoop/testfolder/downloadLocation

This example uses the --download-location parameter to specify the location to which the reference data is downloaded. Make sure that the user has write permission on the download location directory.