Consuming Results - 5.2.1

Note: These examples assume the output was written to HDFS; however, S3 is also an option.

There are multiple ways to consume the data produced by a Spark job, MapReduce job, or Hive query.

To verify that the output exists, list the files in the output directory with the following command:

hadoop fs -ls /dir/on/hdfs/output
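
If you only need a yes/no check, for example in a script, the -test option returns an exit status instead of a listing. The check below also assumes the job wrote the standard _SUCCESS marker, which the default Hadoop output committer creates on successful completion:

# Exit status 0 if the marker exists, non-zero otherwise
hadoop fs -test -e /dir/on/hdfs/output/_SUCCESS && echo "output exists"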

To display the sizes of the files and directories in the output directory, use the following command:

hadoop fs -du /dir/on/hdfs/output
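
The sizes reported by -du are in bytes; on Hadoop 2.x and later the -h flag (an assumption about your Hadoop release) prints them in a human-readable form:

hadoop fs -du -h /dir/on/hdfs/output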

To view the contents of an output file, use the following command:

hadoop fs -cat /dir/on/hdfs/output/part-r-00000 | more
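
If the job was configured to compress its output or to write SequenceFiles, -cat prints raw bytes; in that case the -text option, which decompresses supported formats before printing, may be the better choice:

hadoop fs -text /dir/on/hdfs/output/part-r-00000 | more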

To display the first few lines of the file, use the following command:

hadoop fs -cat /dir/on/hdfs/output/part-r-00000 | head

To display the last kilobyte of the file, use the following command:

hadoop fs -tail /dir/on/hdfs/output/part-r-00000

The -tail command also supports the Unix-style -f option, which causes the specified file to be monitored: as another process appends new lines to the file, the display is updated.
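
For example, to follow a part file while another process is still appending to it:

hadoop fs -tail -f /dir/on/hdfs/output/part-r-00000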

To copy the output from HDFS to the local Linux file system, use the following commands:

mkdir /pb/spectrum-bigdata-geocoding/out
hadoop fs -copyToLocal /dir/on/hdfs/output/* /pb/spectrum-bigdata-geocoding/out

To concatenate the output files in an HDFS directory into a single file on the Linux file system, use the following command:

hadoop fs -getmerge /dir/on/hdfs/output/* /pb/spectrum-bigdata-geocoding/out/merged_output.txt addnl
Note: addnl is optional; when specified, a newline character is appended at the end of each concatenated file.
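
On more recent Hadoop releases the same behavior is available through the -nl flag instead of the trailing addnl argument; a hedged equivalent of the command above:

hadoop fs -getmerge -nl /dir/on/hdfs/output /pb/spectrum-bigdata-geocoding/out/merged_output.txt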

To copy the output to another location in the same HDFS, use the following command:

hadoop fs -cp /dir/on/hdfs/output/* /dir/on/hdfs/copy_of_output

To copy the output recursively from one HDFS cluster to another, use DistCp:

hadoop distcp <path_of_output_dir_on_hdfs> <dir_path_on_other_hdfs>
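
For example, assuming the source and destination NameNodes are reachable as nn1 and nn2 on the default RPC port (both host names and the port are placeholders):

hadoop distcp hdfs://nn1:8020/dir/on/hdfs/output hdfs://nn2:8020/dir/on/hdfs/copy_of_output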

To access the output in HDFS through Hive, create an external table over the output directory:

hive> CREATE EXTERNAL TABLE hexbin(id string, wkt string,
long double, lat double) ROW FORMAT DELIMITED FIELDS
TERMINATED BY "\t" LINES TERMINATED BY "\n" STORED AS
TEXTFILE LOCATION '/dir/on/hdfs/hive_output';
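
Once the external table exists, the output can be queried in place without copying it; for example, from the command line (the query below is illustrative only):

hive -e 'SELECT id, wkt FROM hexbin LIMIT 10;'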

Once copied to the Linux file system, the output can be used for further processing.

For example, if the output contains WKT, it can be imported into a database or product that supports importing spatial objects in WKT format, such as PostGIS, FME, or SAP HANA.

Alternatively, a plug-in for MapInfo Professional can be used to pull data directly from HDFS into a native table. Refer to WKT2MapInfo for more information.