This example illustrates how to connect to an external database that uses drivers which are not shipped with Data360 Analyze.
See also Acquiring data from MS Access for a simple example of how to connect to a database that uses a driver which is shipped with the application.
- Prior to working with the JDBC Query node, you first need to downloaded all required JDBC drivers from the database vendor.
- Install the third party drivers that you downloaded in step 1. We recommend that you install the driver files in the following location, ensuring that drivers for different databases are stored in separate sub-directories:
<Data360Analyze site configuration directory>/site-<port>/lib/java/db/<driverName>
For example:
<Data360Analyze site configuration directory>/site-<port>/lib/java/Cloudera/Cloudera_HiveJDBC41
Refer to the vendor's driver installation guide for details of the required JAR files for your use case.
- OpenData360 Analyze, then from the Directory select Create > Data Flow.
- In the Nodes panel, search for the JDBC Query node and drag it onto the canvas.
- Select the JDBC Query node, then from the Properties panel, enter a SQL statement in the SqlQuery property to query your database. For example:
SHOW DATABASES
- Expand the Connection property group, then enter a username and password in the DbUser and DbPassword properties. These values may be required depending on the cluster's configuration.
- Expand the Advanced property group and configure the following properties (please refer to the Cloudera Hive driver installation guide for further details of the configuration options):
- DbUrl - Enter the database connection URL. This is the connection details for the host in the cluster providing the hive2 service e.g.
jdbc:hive2://hadoop5:10000
- DbDriver - Enter the class name of the JDBC driver (case sensitive) e.g.
com.cloudera.hive.jdbc41.HS2Driver
- DbOptions - If you want to specify any additional JDBC connection properties, you can list them as key-value pairs in the following format:
key=val&key=val2
e.g.AuthMech=3;transportMode=binary
. - DbDriverClasspath - Specify the full path to the directory that contains your JDBC driver. If you have installed your driver into the recommended
<Data360Analyze site configuration directory>/site-<port>/lib/java/<driverName>
location, it is recommended to reference the site-<port> path using the substitution syntax:%ls.appDataDir%/lib/java/<driverName>
for example:%ls.appDataDir%\lib\java\Data3SixtyAnalyze_External
If you want to reference multiple directories, list the directories using a semi colon to separate each entry. For example:
%ls.appDataDir%\lib\java\Hive_drivers1;C:\ProgramData\Data360Analyze\site\lib\java\Hive_drivers2
- DbUrl - Enter the database connection URL. This is the connection details for the host in the cluster providing the hive2 service e.g.
- Run the JDBC Query node to import data from your Cloudera Hadoop cluster.
Creating a JDBC Query library node
If you will be connecting to a particular database on a regular basis, you can create a library node to save you having to re-enter the connection details each time you want to connect.
- From the Directory, select Create > Library Node.
- From the Create Library Node dialog, select JDBC Query as the node to inherit from, then click Create.The Library Node Editor opens.
- On the Configure tab, enter the database connection details in the DbUrl, DbDriver and DbOptions properties as described in step 7 above.CAUTION:From a security perspective, it is recommended that you do not include any user credentials in the library node, as doing so would enable any user who has access to the node to use the credentials.Note: Given that the driver installation directory may be different on different machines, it is not recommended to enter an absolute path in the DbDriverClasspath property of the library node. If the drivers are installed in the same relative location on multiple machines, one option is to parameterize the classpath, for example, on a desktop installation you could point to a directory within the
%localappdata%\Data360Analyze
folder structure. - Save your new custom node. When you want to use an instance of this library node, you can search for it from the Nodes panel in the Designer, for example:
You will need to configure the SqlQuery property each time you use the node, but you will not need to reconfigure any of the connection properties that you have pre-configured in your library node.