Hexagon Generator - Spectrum Location Intelligence for Big Data - 5.2.1

Location Intelligence SDK for Big Data Guide

Product type: Software
Portfolio: Locate
Product family: Spectrum
Product: Spatial Big Data > Location Intelligence SDK for Big Data
Version: 5.2.1
Language: English
Product name: Location Intelligence for Big Data
Title: Location Intelligence SDK for Big Data Guide
Copyright: 2024
First publish date: 2015
Last updated: 2024-10-16

This Spark job generates the hexagons that cover a given bounding box (for example, the bounding box of the continental USA). The hexagon output can be used for map display.

To create hexagons for a given bounding box:
  • Modify the command options for the hexagons you want to generate: change the bounding box coordinates and the hexagon level to suit your needs. See Hexagons to learn about hexagon levels.
  • Deploy the jar and configuration to the Hadoop cluster.
  • Start the Spark job using the following command:
    spark-submit
    --class com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver
    --master local <dir-on-server>/location-intelligence-bigdata-spark3drivers_2.12-0-SNAPSHOT-all.jar
    --output <output-path> 
    --min-longitude -73.728200 
    --min-latitude 40.979800 
    --max-longitude -71.787480 
    --max-latitude 42.050496 
    --hex-level 3 
    --container-level 2 
    --number-of-partitions 1 
    --max-number-of-rows 5 
    --csv header=true 
    --overwrite

The output of the hexagon generator is a list of WKT (well-known text) geometries, one per hexagon. See Consuming Results for how to use the output.
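A minimal Scala sketch, assuming the example output path and the header=true option used above, shows one way to read the generated CSV back into a Spark DataFrame for quick inspection:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("InspectHexagons").getOrCreate()
// Read the hexagons written by HexGenDriver; header=true matches the
// --csv header=true option passed to the job
val hexagons = spark.read
  .option("header", "true")
  .csv("/user/sdkuser/hexgen_output")
hexagons.show(5, truncate = false)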

Sample Output
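With --csv header=true, each row of the output carries one hexagon as a WKT polygon. The excerpt below is hypothetical; the column name and coordinates are illustrative only:

WKT
"POLYGON ((-73.125 41.0, -73.0 41.07, -72.875 41.0, -72.875 40.86, -73.0 40.79, -73.125 40.86, -73.125 41.0))"
"POLYGON ((-72.875 41.0, -72.75 41.07, -72.625 41.0, -72.625 40.86, -72.75 40.79, -72.875 40.86, -72.875 41.0))"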

Executing the Job

To run the Spark job, you must use the spark-submit script in Spark’s bin directory. Make sure to use the appropriate Spark3 jar for your installed distribution of Spark and Scala.

Driver class:

com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver

Scala 2.12 jar:

/precisely/li/software/spark3/sdk/lib/location-intelligence-bigdata-spark3drivers_2.12-sdk_version-all.jar

For example:

spark-submit
--class com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver
--master local
C:\python\jars\location-intelligence-bigdata-spark3drivers_2.12-sdk_version-all.jar
--output /user/sdkuser/hexgen_output
--min-longitude -73.728200
--max-longitude -71.787480
--min-latitude 40.979800
--max-latitude 42.050496
--hex-level 3
--container-level 2
--number-of-partitions 5
--max-number-of-rows 100
--csv header=true
--overwrite
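
The example above runs in local mode. When submitting to a Hadoop cluster, pass the master appropriate to your distribution; a hypothetical YARN submission using the jar path listed above might look like:

spark-submit
--class com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver
--master yarn
--deploy-mode cluster
/precisely/li/software/spark3/sdk/lib/location-intelligence-bigdata-spark3drivers_2.12-sdk_version-all.jar
--output /user/sdkuser/hexgen_output
--min-longitude -73.728200
--max-longitude -71.787480
--min-latitude 40.979800
--max-latitude 42.050496
--hex-level 3
--container-level 2
--number-of-partitions 5
--max-number-of-rows 100
--csv header=true
--overwrite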

Job Parameters

All parameters are declared with a double dash.

--min-longitude
    Minimum longitude value of the bounding box for which you want to generate hexagons.
    Example: --min-longitude -73.728200

--max-longitude
    Maximum longitude value of the bounding box for which you want to generate hexagons.
    Example: --max-longitude -71.787480

--min-latitude
    Minimum latitude value of the bounding box for which you want to generate hexagons.
    Example: --min-latitude 40.979800

--max-latitude
    Maximum latitude value of the bounding box for which you want to generate hexagons.
    Example: --max-latitude 42.050496

--hex-level
    The level at which to generate hexagons. Must be between 1 and 11.
    Example: --hex-level 3

--container-level
    A hint that enables some parallelism in hexagon generation. Must be less than the hex-level value.
    Example: --container-level 2

--number-of-partitions
    The number of output partitions.
    Example: --number-of-partitions 5

--max-number-of-rows
    The maximum number of rows per partition.
    Example: --max-number-of-rows 100

--output
    The location of the directory for the output.
    Example: --output /user/sdkuser/hexgen_output

--output-format
    The output format. Valid values: csv or parquet. If not specified, the default is csv.
    Example: --output-format=csv

--csv
    The options to be used when reading and writing CSV input and output files. Common options and their default values:
      • delimiter: ,
      • quote: "
      • escape: \
      • header: false
    Specify individual options:
      --csv header=true
      --csv delimiter='\t'
    Specify multiple options:
      --csv header=true delimiter='\t'

--parquet
    The options to be used when reading and writing parquet input and output files.
    Example: --parquet compression=gzip

--overwrite
    Tells the job to overwrite the output directory; without it, the job fails if the directory already has content. This parameter does not take a value.
    Example: --overwrite
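
For example, to write gzip-compressed parquet instead of CSV, the earlier example command can be adjusted by replacing the CSV option with the parquet ones; this sketch combines only the flags documented above:

spark-submit
--class com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver
--master local
C:\python\jars\location-intelligence-bigdata-spark3drivers_2.12-sdk_version-all.jar
--output /user/sdkuser/hexgen_output
--min-longitude -73.728200
--max-longitude -71.787480
--min-latitude 40.979800
--max-latitude 42.050496
--hex-level 3
--container-level 2
--number-of-partitions 5
--max-number-of-rows 100
--output-format=parquet
--parquet compression=gzip
--overwrite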