- Connectivity to Hive from Spectrum on Windows
- Support for and connectivity to Hive version 2.1.1 from Spectrum, with high availability
- Support for reading from and writing to a Hive database (JDBC) via a Model Store connection
Also see Best Practices for connecting to HDFS 3.x and Hive 2.1.1.
To use the Read from DB stage, you need to create a connection to the respective database using the Connection Manager of Spectrum Management Console. For details, see Database Connection Manager in Write to DB.
General Tab
Field Name | Description |
---|---|
Connection | Select the database connection you want to use. Your choices vary depending on what connections are defined in the Connection Manager of Spectrum Management Console. If you need to make a new database connection, or modify or delete an existing database connection, click Manage Connections. If you are adding or modifying a database connection, complete the connection fields; for details, see Database Connection Manager in Write to DB. |
SQL | Enter the SQL query that specifies the records to be read from the data source when the dataflow runs. You can type the SQL query manually in this field, or construct it in the Visual Query Builder by clicking Build SQL. The SQL query can include variables instead of actual column names, which allows you to customize the query at runtime. For more information, see Query Variables. A sample query for exposing BranchID and BranchType from the public.Branch table can be a simple SELECT of those two columns; a sketch is shown after this table. Note: Values for Integer type fields can be entered without quotes, but values for String type fields must be enclosed in single quotes. |
Build SQL | Create a complex query by selecting multiple columns and creating joins and nested queries by clicking Build SQL. The Visual Query Builder opens. For more information, see Visual Query Builder. Note: A query created using the Visual Query Builder is displayed with the fully qualified names of columns and tables in the SQL field. |
Regenerate Fields | To see the schema of the data to be fetched by the query, click Regenerate Fields. If you edit an existing query, click Regenerate Fields to fetch the modified schema. Note: On clicking Regenerate Fields, the entity names in the SQL query are retained and not replaced with their fully qualified names. |
Preview | To see a sample of the records fetched by the SQL query, click Preview. |
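As a concrete illustration of the note in the SQL field above, the sketch below retrieves BranchID and BranchType from the public.Branch table with a filter on both columns. It assumes, for illustration only, that BranchID is an Integer column and BranchType is a String column; the literal values are hypothetical.

```sql
-- Illustrative sketch: assumes BranchID is an Integer column and BranchType is
-- a String column in the public.Branch table (hypothetical values shown).
-- Integer values are entered without quotes; String values go in single quotes.
SELECT BranchID, BranchType
FROM public.Branch
WHERE BranchID = 105
  AND BranchType = 'Regional';
```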
Note: When connecting to an MS SQL Server database, the stage reads date datatype values as String values. This is the behavior of the jTDS driver, which is the default driver used by Spectrum. To handle all date datatype values as is, use Microsoft's JDBC driver.

Runtime Tab
Field Name | Description |
---|---|
Fetch size | Select this option to specify the number of records to read from the database table at a time. For example, if the Fetch size value is 100 and the total number of records to be read is 1000, it takes 10 trips to the database to read all the records. Setting an optimum fetch size can improve performance significantly. Note: You can calculate an optimum fetch size for your environment by testing the execution times between a Read from DB stage and a Write to Null stage. For more information, see Determining an Optimum Fetch Size. |
Stage Options | This section lists the dataflow options used in the SQL query of this stage and allows you to provide a default value for each of them; a sketch of such a query is shown after this table. The Name column lists the options, and you can enter the default values in the corresponding Value column. Note: The default value provided here is also displayed in the Map dataflow options to stages section of the Dataflow Options dialog box, which also allows you to change the default value. If different default values are provided for an option through Stage Options, Dataflow Options, and Job Executor, the order of precedence is: value provided through Job Executor > value defined through the Dataflow Options dialog box > value entered through Stage Options. |
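To connect these runtime options back to the SQL field on the General tab, the sketch below uses a variable in place of a literal value. The ${...} placeholder format is an assumption made for illustration; see Query Variables for the exact syntax your version supports. The variable would then be listed under Stage Options, where you can enter its default value and override it through the Dataflow Options dialog box or Job Executor, following the precedence described above.

```sql
-- Illustrative sketch: the ${BranchID} placeholder format is assumed here;
-- consult the Query Variables topic for the exact variable syntax.
-- BranchID would appear under Stage Options, where a default value can be
-- entered (unquoted for Integer fields, in single quotes for String fields).
SELECT BranchID, BranchType
FROM public.Branch
WHERE BranchID = ${BranchID};
```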