Connect to a Cloudera Hadoop cluster - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

This example illustrates how to connect to an external database that uses drivers which are not shipped with Data360 Analyze.

See also Acquiring data from MS Access for a simple example of how to connect to a database that uses a driver which is shipped with the application.

  1. Prior to working with the JDBC Query node, you first need to downloaded all required JDBC drivers from the database vendor.
  2. Install the third party drivers that you downloaded in step 1. We recommend that you install the driver files in the following location, ensuring that drivers for different databases are stored in separate sub-directories:

    <Data360Analyze site configuration directory>/site-<port>/lib/java/db/<driverName>

    For example: <Data360Analyze site configuration directory>/site-<port>/lib/java/Cloudera/Cloudera_HiveJDBC41

    Refer to the vendor's driver installation guide for details of the required JAR files for your use case.

  3. OpenData360 Analyze, then from the Directory select Create > Data Flow.
  4. In the Nodes panel, search for the JDBC Query node and drag it onto the canvas.
  5. Select the JDBC Query node, then from the Properties panel, enter a SQL statement in the SqlQuery property to query your database. For example:

    SHOW DATABASES

  6. Expand the Connection property group, then enter a username and password in the DbUser and DbPassword properties. These values may be required depending on the cluster's configuration.
  7. Expand the Advanced property group and configure the following properties (please refer to the Cloudera Hive driver installation guide for further details of the configuration options):
    1. DbUrl - Enter the database connection URL. This is the connection details for the host in the cluster providing the hive2 service e.g. jdbc:hive2://hadoop5:10000
    2. DbDriver - Enter the class name of the JDBC driver (case sensitive) e.g. com.cloudera.hive.jdbc41.HS2Driver
    3. DbOptions - If you want to specify any additional JDBC connection properties, you can list them as key-value pairs in the following format: key=val&key=val2 e.g. AuthMech=3;transportMode=binary.
    4. DbDriverClasspath - Specify the full path to the directory that contains your JDBC driver. If you have installed your driver into the recommended <Data360Analyze site configuration directory>/site-<port>/lib/java/<driverName> location, it is recommended to reference the site-<port> path using the substitution syntax: %ls.appDataDir%/lib/java/<driverName> for example: %ls.appDataDir%\lib\java\Data3SixtyAnalyze_External

      If you want to reference multiple directories, list the directories using a semi colon to separate each entry. For example:

      %ls.appDataDir%\lib\java\Hive_drivers1;C:\ProgramData\Data360Analyze\site\lib\java\Hive_drivers2
  8. Run the JDBC Query node to import data from your Cloudera Hadoop cluster.

Creating a JDBC Query library node

If you will be connecting to a particular database on a regular basis, you can create a library node to save you having to re-enter the connection details each time you want to connect.

  1. From the Directory, select Create > Library Node.
  2. From the Create Library Node dialog, select JDBC Query as the node to inherit from, then click Create.The Library Node Editor opens.
  3. On the Configure tab, enter the database connection details in the DbUrl, DbDriver and DbOptions properties as described in step 7 above.
    CAUTION:
    From a security perspective, it is recommended that you do not include any user credentials in the library node, as doing so would enable any user who has access to the node to use the credentials.
    Note: Given that the driver installation directory may be different on different machines, it is not recommended to enter an absolute path in the DbDriverClasspath property of the library node. If the drivers are installed in the same relative location on multiple machines, one option is to parameterize the classpath, for example, on a desktop installation you could point to a directory within the %localappdata%\Data360Analyze folder structure.
  4. Save your new custom node. When you want to use an instance of this library node, you can search for it from the Nodes panel in the Designer, for example:

    You will need to configure the SqlQuery property each time you use the node, but you will not need to reconfigure any of the connection properties that you have pre-configured in your library node.