Connect to Hive with Kerberos - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 Analyze
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last updated: 2024-11-28

This example illustrates how to connect to an external database that uses drivers which are not shipped with Data360 Analyze.

See also Acquiring data from MS Access for a simple example of how to connect to a database that uses a driver which is shipped with the application.

Note: The steps in this topic have been tested with the Database Metadata node and the JDBC Query node.
  1. Download all required JDBC drivers for your Hadoop installation from the database vendor.
  2. Install the third-party drivers that you downloaded in step 1. We recommend that you install the driver files in the following location, ensuring that drivers for different databases are stored in separate sub-directories:

    <Data360Analyze site configuration directory>/site-<port>/lib/java/db/<driverName>

    For example: <Data360Analyze site configuration directory>/site-<port>/lib/java/Cloudera/Cloudera_HiveJDBC42

  3. Obtain the keytab file and the krb5.conf file for your setup from your Kerberos administrator. Place these files on the Data360 Analyze server and note the location for the next step.
  4. Create a file named gss-jaas.conf and note its location; you will need it in step 9.

    In this file, include the following information, replacing the principal and keyTab settings with actual values:

    Hortonworks example

    com.sun.security.jgss.initiate {
     com.sun.security.auth.module.Krb5LoginModule required
     useKeyTab=true
     useTicketCache=true
     principal="yourprincipaluser@yourcompany.com"
     keyTab="/location/to/your.keytab"
     debug=false;
    };

    Cloudera example

    Client {
     com.sun.security.auth.module.Krb5LoginModule required
     useKeyTab=true
     useTicketCache=true
     principal="yourprincipaluser@yourcompany.com"
     keyTab="/location/to/your.keytab"
     debug=false;
    };
    Tip: Check with your Kerberos administrator if you are unsure what the principal account is. The keyTab value is the path where you placed the keytab file in step 3.
  5. Open Data360 Analyze, then from the Directory, select Create > Data Flow.
  6. In the Nodes panel, search for the JDBC Query or Database Metadata node and drag it onto the canvas.
  7. Select the node. From the Properties panel, expand the Advanced property group and configure the following properties:
    1. DbUrl - Enter the database connection URL. There may be specific requirements for your JDBC URL; refer to your database administrator or Kerberos administrator to configure the URL appropriately. For example: jdbc:hive2://XXXXXXXXXXX:10000/default;principal=hive/XXXXXX@COMPANY.COM
    2. DbDriver - Enter the following: org.apache.hive.jdbc.HiveDriver
    3. DbDriverClasspath - Specify the full path to the directory that contains your Hive drivers, using the %ls.appDataDir% substitution for anything installed into your Data360 Analyze site configuration directory. For example, if you installed the drivers into:

    C:\ProgramData\Data360Analyze\site\lib\java\Data3SixtyAnalyze_External

    You should reference this using: %ls.appDataDir%\lib\java\Data3SixtyAnalyze_External

    If you want to reference multiple directories, list the directories using a semicolon to separate each entry, for example:

    %ls.appDataDir%\lib\java\Hive_drivers1;%ls.appDataDir%\lib\java\Hive_drivers2

  8. Select the node, then from the menu button in the top-right corner of the Properties panel, select Show Hidden Properties.
  9. In the JvmArguments property, add the following, replacing the locations of the gss-jaas.conf and krb5.conf files with the actual locations (a standalone connection sketch that uses these same settings appears after this procedure):
    -Djava.security.auth.login.config=/location/to/gss-jaas.conf
    -Djava.security.krb5.conf=/location/to/krb5.conf
    -Djavax.security.auth.useSubjectCredsOnly=false
    Note: There are two similarly named properties; ensure that you add the above information to the JvmArguments property and not to the _JvmArguments property. Each JVM argument must be on a separate line.
  10. Run the node to import data.
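
To verify the Kerberos and driver configuration outside Data360 Analyze, you can run a short standalone Java program that uses the same settings as steps 4, 7, and 9. The sketch below is illustrative only and makes assumptions: it uses the Apache Hive driver (org.apache.hive.jdbc.HiveDriver) with the com.sun.security.jgss.initiate JAAS entry from the Hortonworks example, it expects the Hive JDBC driver and its dependencies to be on the classpath, and the host name, principal, and file paths are placeholders that you replace with your own values.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveKerberosCheck {
        public static void main(String[] args) throws Exception {
            // Same settings as the JvmArguments property (step 9); paths are placeholders.
            System.setProperty("java.security.auth.login.config", "/location/to/gss-jaas.conf");
            System.setProperty("java.security.krb5.conf", "/location/to/krb5.conf");
            System.setProperty("javax.security.auth.useSubjectCredsOnly", "false");

            // Same driver class as the DbDriver property (step 7).
            Class.forName("org.apache.hive.jdbc.HiveDriver");

            // Same form of URL as the DbUrl property; host and principal are placeholders.
            String url = "jdbc:hive2://yourhiveserver:10000/default;principal=hive/yourhiveserver@COMPANY.COM";

            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }

If this program lists tables, the keytab, gss-jaas.conf, krb5.conf, and JDBC URL are consistent with each other, and the same values should work in the node properties.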