Running Spatial Drivers in Databricks Runtime

Location Intelligence SDK for Big Data Guide

Product type: Software
Portfolio: Locate
Product family: Spectrum
Product: Spatial Big Data > Location Intelligence SDK for Big Data
Version: 5.2.1
Language: English
Product name: Location Intelligence for Big Data
Title: Location Intelligence SDK for Big Data Guide
Copyright: 2024
First publish date: 2015
Last updated: 2024-10-16

Step 1: Create Cluster

Use an existing cluster or create a new one according to your requirements. Spark versions 3.0 through 3.5.2 are supported, so you can choose any version in that range. We recommend a cluster with a minimum of 32 GB of memory and 4 cores.
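
If you prefer to script cluster creation, the sketch below shows a minimal request body for the Databricks Clusters API (POST /api/2.0/clusters/create). The runtime and node type are illustrative assumptions, not values mandated by the SDK: Databricks Runtime 14.3.x-scala2.12 ships Spark 3.5.0, which is within the supported range, and the AWS r5.xlarge node type provides 4 vCPUs and 32 GB of memory. Substitute values appropriate to your cloud and workspace.

    {
      "cluster_name": "li-sdk-cluster",
      "spark_version": "14.3.x-scala2.12",
      "node_type_id": "r5.xlarge",
      "num_workers": 2
    }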

Add the following configurations in the Spark config field under the Advanced options section of the cluster configuration:

Spark Configurations
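
The exact properties required by the SDK are listed in the product documentation for your release. As a sketch of the expected format (one space-separated key and value per line in the Spark config field), assuming the commonly recommended Kryo serializer and the Spark 3 untyped-UDF compatibility flag:

    spark.serializer org.apache.spark.serializer.KryoSerializer
    spark.sql.legacy.allowUntypedScalaUDF true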

If you are using the Download Manager utility and your data is located in a remote location such as Amazon S3, you must also provide credential information as environment variables.
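
As a sketch, assuming the utility resolves credentials from the standard AWS environment variables, you would add entries such as the following in the Environment variables field under Advanced options; the values are placeholders you must supply:

    AWS_ACCESS_KEY_ID=<your-access-key-id>
    AWS_SECRET_ACCESS_KEY=<your-secret-access-key>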

Step 2: Install SDK jar

Install the following JAR as a library on the cluster:

location_intelligence_bigdata_li_sdk_spark3_2_12_<version>.jar

For more information about how to install libraries, see Libraries.
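
If you script cluster setup with the legacy Databricks CLI (0.x), the JAR can also be installed from DBFS; the command below is a sketch in which the cluster ID and DBFS path are placeholders you must supply:

    databricks libraries install --cluster-id <cluster-id> --jar dbfs:/FileStore/jars/location_intelligence_bigdata_li_sdk_spark3_2_12_<version>.jar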

Step 3: Create Job

  • Create a new job under the Workflows section in the Databricks UI.
  • Under Type, select the Spark Submit option.
  • Select the cluster you just created from the Cluster dropdown.
  • Add location_intelligence_bigdata_li_sdk_spark3_2_12_<version>.jar under Dependent libraries.
  • The main class values for the different drivers are as follows:
    • Hexgen Generation: com.precisely.bigdata.li.spark.app.hexgen.HexGenDriver
    • Join By Distance: com.precisely.bigdata.li.spark.app.joinbydistance.JoinByDistanceDriver
    • Point In Polygon: com.precisely.bigdata.li.spark.app.pointinpolygon.PointInPolygonDriver
    • Search Nearest: com.precisely.bigdata.li.spark.app.searchnearest.SearchNearestDriver
  • Provide the parameters with which you want to run the job.

    Example:

    ["--input","/FileStore/lisdk/perf/address-fabric/address_fabric_usa.txt",
    "--input-format","csv",
    "--csv","header=true",
    "--csv","delimiter=\\t",
    "--table-file-type","TAB",
    "--table-file-path","/dbfs/FileStore/lisdk/perf/buildings/tab/data",
    "--table-file-name","buildings_usa.tab",
    "--latitude-column-name","LAT",
    "--longitude-column-name","LON",
    "--output","/dbfs/FileStore/lisdk/perfoutput4",
    "--output-fields","ELEVATION",
    "--output-fields","ELEVHIGH",
    "--overwrite"]
  • The job is now created. You can run it once your cluster is up and running.
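
Once created, the job can also be triggered outside the UI. As a sketch, assuming the legacy Databricks CLI (0.x) and a placeholder job ID:

    databricks jobs run-now --job-id <job-id>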