To set up routing for Hive, perform the following steps from the same node used in Installing the SDK:
1. On the HiveServer2 node, create the Hive auxlib folder if one does not already exist:
   sudo mkdir /usr/hdp/current/hive-server2/auxlib/
2. Copy the Hive routing jar to the folder created in step 1 on the HiveServer2 node:
   sudo cp /precisely/routing/software/hive/lib/spectrum-bigdata-routing-hive-version.jar /usr/hdp/current/hive-server2/auxlib/
3. Restart all Hive services.
4. Launch Beeline, or another Hive client, for the remaining steps:
   beeline -u jdbc:hive2://localhost:10000/default -n sdkuser
5. Create the Routing functions and set the Hive variables. Add the temporary keyword after create if you want a temporary function; a temporary function must be re-created for every new Hive session (see the sketch after the notes below). For more information, see Hive Variables.
   create function PointToPointRoute as 'com.pb.bigdata.spatial.routing.hive.PointToPointRoute';
   create function IsoChrone as 'com.pb.bigdata.spatial.routing.hive.IsoChrone';
   create function IsoDistance as 'com.pb.bigdata.spatial.routing.hive.IsoDistance';
   set pb.routing.config.location=hdfs:///precisely/routing/software/resources/config/dbList.json;
   set pb.routing.allowFallback=true;
   set pb.download.location=/precisely/downloads;
   set pb.download.group=dm_users;
6. Test the Routing function:
   SELECT IsoDistance(-77.088217, 38.937072, 3, map('distanceUnit', 'km'));
   This query should return a polygon geometry comprising all the points that lie at the specified distance from the starting point.
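The same call pattern can be tried with the other functions registered in step 5. The following is a hypothetical sketch only, assuming IsoChrone mirrors IsoDistance's argument pattern with the third argument interpreted as a travel-time cost; check the SDK reference for the exact signature and supported options:
   -- hypothetical: x, y, a travel-time cost, and an (empty) options map; argument pattern assumed, not confirmed by this guide
   SELECT IsoChrone(-77.088217, 38.937072, 3, map());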
Note:
- The first time you run the job, it may take a while if the reference data has to be downloaded remotely from HDFS or S3. It may also time out when you use a large number of datasets that are stored in remote locations such as HDFS or S3. If you are using Hive with the MapReduce engine, you can adjust the value of the mapreduce.task.timeout property (see the timeout sketch at the end of this section).
- Some types of queries will cause Hive to evaluate UDFs in the HiveServer2 process space instead of on a data node. The Routing UDFs in particular use a significant amount of memory and can shut down the Hive server due to memory constraints. To process these queries, we recommend increasing the amount of memory available to the HiveServer2 process (for example, by setting HADOOP_HEAPSIZE in hive-env.sh).
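As mentioned in step 5, the functions can instead be registered per session by adding the temporary keyword after create. A minimal sketch of the temporary variant, which must be rerun in every new Hive session:
   create temporary function IsoDistance as 'com.pb.bigdata.spatial.routing.hive.IsoDistance';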
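If first-run downloads time out under the MapReduce engine, the mapreduce.task.timeout property mentioned in the notes above can be raised for the session. A sketch; the value below (20 minutes, expressed in milliseconds) is an arbitrary example, not a recommendation from this guide:
   set mapreduce.task.timeout=1200000;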