Hive Setup - Spectrum Routing for Big Data 5.1

Spectrum Routing Installation: Cloudera

Product type: Software
Portfolio: Locate
Product family: Spectrum
Product: Spatial Big Data > Routing for Big Data
Version: 5.1
Language: English
Product name: Spectrum Routing for Big Data
Title: Spectrum Routing Installation: Cloudera
Copyright: 2024
First publish date: 2017
Last updated: 2024-10-18
Published on: 2024-10-18T09:54:26.541418

To set up routing for Hive, perform the following steps from the same node used in Installing the SDK:

  1. Copy the Hive routing jar to the HiveServer node:
    /precisely/routing/software/hive/lib/spectrum-bigdata-routing-hive-version.jar
  2. In Cloudera Manager, navigate to the Hive Configuration page and search for the Hive Auxiliary JARs Directory setting. If the value is already set, move the Hive routing jar into that folder. If it is not set, set it to the parent folder of the Hive routing jar:
    /precisely/routing/software/hive/lib/
  3. Restart all Hive services.
  4. Launch Beeline, or some other Hive client, for the remaining steps:
    beeline -u jdbc:hive2://localhost:10000/default -n sdkuser
  5. Create the Routing functions and set the Hive variables. To create temporary functions instead, add the temporary keyword after create (for example, create temporary function IsoDistance as ...). Temporary functions must be re-created in every new Hive session. For more information, see Hive Variables.
    create function PointToPointRoute as 'com.pb.bigdata.spatial.routing.hive.PointToPointRoute';
    create function IsoChrone as 'com.pb.bigdata.spatial.routing.hive.IsoChrone';
    create function IsoDistance as 'com.pb.bigdata.spatial.routing.hive.IsoDistance';
    set pb.routing.config.location=hdfs:///precisely/routing/software/resources/config/dbList.json;
    set pb.routing.allowFallback=true;
    set pb.download.location=/precisely/downloads;
    set pb.download.group=dm_users;
  6. Test the Routing function.
    SELECT IsoDistance(-77.088217, 38.937072, 3, map('distanceUnit', 'km'));
    This query should return a polygon geometry enclosing the area reachable within 3 km of the starting point.
    Note:
    • The first run may take a while if the reference data must be downloaded from a remote location such as HDFS or S3. When a large number of datasets are stored in such remote locations, the query may also time out. If you are using Hive with the MapReduce engine, you can increase the mapreduce.task.timeout property (specified in milliseconds).
    • Some types of queries cause Hive to evaluate UDFs in the HiveServer2 process instead of on a data node. The Routing UDFs use a significant amount of memory and can bring down HiveServer2 if it runs out of memory. To process these queries, we recommend increasing the memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
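Steps 1 and 2 can be scripted on the HiveServer node. The following is a minimal shell sketch and is not part of the product. The names aux_dir_for_jar and install_routing_jar are hypothetical helpers. The script assumes you pass in the current Hive Auxiliary JARs Directory value as shown in Cloudera Manager, or an empty string if the value is not set. The script only moves the file or reports which folder to configure. You still have to change the Cloudera Manager setting itself in the UI.

```shell
# Hypothetical helpers sketching steps 1-2 (not shipped with the product).
# Argument 1: current value of "Hive Auxiliary JARs Directory" from
#             Cloudera Manager ("" if it is not set).
# Argument 2: path of the Hive routing jar on this node.

# Print the directory the jar should end up in. If an auxiliary directory is
# configured, print it (without a trailing slash). Otherwise print the jar's
# own parent folder, which is then the value to enter in Cloudera Manager.
aux_dir_for_jar() {
  local current_aux_dir="$1" jar_path="$2"
  if [ -n "$current_aux_dir" ]; then
    printf '%s\n' "${current_aux_dir%/}"
  else
    dirname "$jar_path"
  fi
}

# Move the jar into the configured auxiliary directory when one is set.
# When none is set, leave the jar in place and report the folder to configure.
install_routing_jar() {
  local current_aux_dir="$1" jar_path="$2" target
  target="$(aux_dir_for_jar "$current_aux_dir" "$jar_path")"
  if [ -n "$current_aux_dir" ]; then
    mv "$jar_path" "$target/" || return 1
    echo "Moved $(basename "$jar_path") to $target"
  else
    echo "Set Hive Auxiliary JARs Directory to: $target"
  fi
}
```

Consider calling `install_routing_jar "" /precisely/routing/software/hive/lib/spectrum-bigdata-routing-hive-version.jar`. It leaves the jar in place and prints /precisely/routing/software/hive/lib as the folder to enter in Cloudera Manager. Restart the Hive services afterwards, as in step 3.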
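If you use temporary functions, you must repeat both the create temporary function statements and the set statements in every new session. This is because Hive set commands apply only to the current session. One way to handle this is to keep the step 5 statements in a file and pass it to Beeline with its -i (initialization file) option. The sketch below rests on a few assumptions:
- write_routing_init is a hypothetical helper.
- The paths and the dm_users group are the example values from step 5.
- The optional second argument appends a mapreduce.task.timeout override in milliseconds. It matters only with the MapReduce engine, and only if your cluster allows the property to be set per session.

```shell
# Hypothetical helper: write the step 5 statements, using temporary
# functions, to an init file for Beeline. Paths and group are the
# example values from step 5; adjust them for your cluster.
# Argument 1: output file. Argument 2 (optional): task timeout in ms.
write_routing_init() {
  local out_file="$1" timeout_ms="${2:-}"
  local pkg='com.pb.bigdata.spatial.routing.hive'
  cat > "$out_file" <<EOF
create temporary function PointToPointRoute as '${pkg}.PointToPointRoute';
create temporary function IsoChrone as '${pkg}.IsoChrone';
create temporary function IsoDistance as '${pkg}.IsoDistance';
set pb.routing.config.location=hdfs:///precisely/routing/software/resources/config/dbList.json;
set pb.routing.allowFallback=true;
set pb.download.location=/precisely/downloads;
set pb.download.group=dm_users;
EOF
  # Optional longer task timeout for download-heavy first runs
  # (MapReduce engine only; value in milliseconds).
  if [ -n "$timeout_ms" ]; then
    printf 'set mapreduce.task.timeout=%s;\n' "$timeout_ms" >> "$out_file"
  fi
}
```

For example, run `write_routing_init routing-init.hql 1800000` to set a 30-minute timeout. Then start each session with `beeline -u jdbc:hive2://localhost:10000/default -n sdkuser -i routing-init.hql`. The init file is read by the Beeline client, so it must exist on the machine where you run Beeline.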
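The HADOOP_HEAPSIZE suggestion in the second note could look like the hive-env.sh fragment below. HADOOP_HEAPSIZE is given in megabytes. The value 4096 is illustrative only, not a sizing recommendation for the Routing UDFs. The setting typically applies to every Hive process started with that file, not only HiveServer2. On a cluster managed by Cloudera Manager, hive-env.sh is generated for you, so direct edits may be overwritten. In that case, look for the HiveServer2 heap size setting on the Hive Configuration page instead.

```shell
# hive-env.sh fragment (sketch): raise the JVM heap for Hive processes.
# HADOOP_HEAPSIZE is in MB; 4096 is an illustrative value only.
export HADOOP_HEAPSIZE=4096
```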