Creating an Entity - trillium_discovery - trillium_quality - 17.1

Trillium Control Center

Product type: Software
Portfolio: Verify
Product family: Trillium
Product: Trillium > Trillium Discovery; Trillium > Trillium Quality
Version: 17.1
Language: English
Product name: Trillium Quality and Discovery
Title: Trillium Control Center
Topic type: Overview, Administration, Configuration, Installation, Reference, How Do I
First publish date: 2008

Before you can create a project, you must create an entity that contains (or links to) your data. There are several distinct tasks involved:

  • Select the data source
  • Establish the schema file settings, based on the data source
  • Preview the data
  • Customize the data
  • Set the load parameters
  • Schedule the job
All of these tasks are described in the following procedure.
Guidelines:
  • Each entity you create requires that a repository administrator first do the following:
    • Create repositories (one or more) into which you will import the data
    • Create loader connections, one for each type of data source to which you want to connect
    • Grant you permission to use the repositories

If these prerequisites have not been completed correctly, you cannot create an entity or start your work.

  • During the entity creation process, you can optionally create or modify a project that uses the new entity. The following procedure only describes the steps required to create an entity.
  • You can also add entities in the Discovery Center. These entities appear on the Control Center Entities tab. Similarly, entities you create in the Control Center are available when you access the same repository through the Discovery Center. For more information, see Working with the Discovery Center.
  • To create HDFS entities (data sources) and profile HDFS data in your Hadoop environment, use the Discovery Center application, installed with the Trillium repository server software. HDFS entities can be viewed in Control Center but are supported for profiling activities in Discovery Center only.

CyberArk Security Integration

Discovery Center and Control Center can be integrated with your company’s CyberArk account security solution to make retrieving and passing database credentials more secure.

To use the integration, supply CyberArk credentials instead of the database user ID and password when you add a data source (entity) that uses a data (loader) connection to a password-protected database. From then on, each time the data connection accesses a password-protected ODBC database, the database credentials are retrieved from an encrypted digital vault on a centralized CyberArk server. The credentials are unique for each data source you add. Even when CyberArk is configured, you can still enter your standard database user ID and password.

Note: Ensure the CyberArk password vault URL has been configured by your administrator for your software instance. If not, you will not be prompted for the CyberArk credentials.
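
The Query value requested later in the procedure (for example, Safe=AWSafe;Object=Database-MSSql-admin-cdqdb) looks like a semicolon-separated list of Key=Value criteria. The following Python sketch checks a query string for that shape before you paste it into the wizard. It is an illustration only: the function name parse_query is hypothetical, the Key=Value;Key=Value format is an assumption inferred from the single example in this topic, and the code does not call Trillium or CyberArk.

```python
def parse_query(query: str) -> dict[str, str]:
    """Split a 'Key=Value;Key=Value' string into a dictionary.

    Assumption: criteria are separated by semicolons and each criterion
    is a single Key=Value pair, as in the example from this topic.
    """
    criteria = {}
    for part in query.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate a trailing semicolon
        key, sep, value = part.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"malformed criterion: {part!r}")
        criteria[key.strip()] = value.strip()
    return criteria


example = "Safe=AWSafe;Object=Database-MSSql-admin-cdqdb"
print(parse_query(example))
# {'Safe': 'AWSafe', 'Object': 'Database-MSSql-admin-cdqdb'}
```

A criterion without an equals sign raises ValueError, which makes a typing mistake visible before the connection attempt fails. Your CyberArk administrator remains the authority on which keys and values are valid.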

To create an entity

  1. Click the Create Entity icon on the Home tab.
    Note: Alternatively, you can right-click the Entities tab in the Navigation View and select Create Entity.

    The Create Entity Wizard opens to the Connection Page.

  2. Select the loader connection that applies to the new entity. (See About Loader Connections for more information.)
    Note: To change the appearance of the connection list, right-click the text box and select Large Icons, Small Icons, List View, or Detail View.
  3. (Optional) To reduce the number of connections displayed, replace the asterisk in the "Connection list currently filtered on" field with a new filter expression, and click Change Filter. Use ANSI standard SQL to define the filter.
  4. If you select an RDBMS connection (for example, Oracle, DB2, or ODBC) and security has been configured for the connection to the database server, you may be prompted for your user credentials. In this case, the connection type drop-down list appears.

    Click the list and select the type of credentials to use when accessing the data connection, either ODBC Standard (the default, used for standard RDBMS connections) or CyberArk (for secure ODBC connections only).

    Do one of the following in the Connection validation section:
    • If you selected ODBC Standard, enter your database Username and Password.
    • If Trillium Discovery is integrated with CyberArk and you selected an ODBC connection, specify the following information (supplied by your CyberArk administrator) to be used to access the data connection:
      • Application ID. Enter the CyberArk Application ID used to access the data connection.
      • Query. Enter the unique query to be used as the filter criteria to retrieve the password credentials. For example, Safe=AWSafe;Object=Database-MSSql-admin-cdqdb
      • Certificate Path. (Optional) If your CyberArk solution is configured to authenticate calls to the vault using a client certificate, enter the certificate path.
  5. Click Next. Trillium connects to the data source directory, using the loader connection you selected in Step 2, and the Entity Selection window opens.
    Note: If the connection fails, ask your repository administrator to check the data source location and loader connection setup. After the problem is corrected, open the Create Entity Wizard and try again.
  6. (Optional) In the "Entity list currently filtered on:" field, type a filter expression to narrow the list of available data source files and tables. After defining the filter, click Change Filter.
  7. Select one or more data source file names in the list and click Next.
    Note: You can select multiple data files to create multiple entities at the same time using the same settings. However, if you select multiple files, you will not be able to preview the files.
  8. Set the schema file settings for the selected data file(s). The settings vary, depending on the type of data source. In the list below, click the appropriate data source type for details about the settings for each.
    Note: If your data source is RDBMS, there are no schema settings. If your data source is a TSQ file, you create a schema file or use an existing file. For more information, see Creating Schema Files.
  9. (Optional) To preview how the data will appear in the entity based on the selected schema settings, or to specify which data to import into the entity, click Preview.
    Note: If you selected multiple data files in step 7, you cannot preview the files.
  10. (Optional) Use the Preview window to customize the data that will be imported:
    1. Right-click a column header and select Hide to remove the column from the data load.
    2. Drag and drop a single column to a new location.
    3. Right-click anywhere in the column header and select Choose Columns. Select the attributes you want to hide or change the order of columns by dragging attribute names to the correct location.
    4. Right-click anywhere and select Filter. Build an expression that defines the criteria you want applied to rows of data that are loaded.
    5. If your source is a relational database, you can filter the data by applying a SQL filter.
  11. When you are ready to continue, click Close. Changes you made to the data in Preview mode are preserved. When you import the data from the source into your entity, hidden columns will not be imported, only selected rows will be imported, and the columns will be arranged as you configured them in the Preview mode.
  12. Click Next. The Load Parameters window opens. The load parameters you specify determine the type of entity that Trillium creates. If you choose to import to a repository, you must either select the All Rows option or specify which data rows are to be imported from your data source.
  13. To create a real entity, verify that the Load the data now option is selected. To create a dynamic entity instead, clear the option.

    When creating a real entity, a link is created to the external source allowing you to:

    • Drill down to the rows
    • Create and run business rules
    • Filter
    Note: If you are creating a dynamic entity, you will not have access to features such as key and dependency analysis until you load the data into a repository. With a dynamic entity, if the external source changes, the data displayed in Trillium also changes. A dynamic entity is ideal for quickly examining very large data sources.
  14. Use the following table to guide your selections:
    Note: The Control Center is designed to help you create and test projects based on sample data. If you are working with a large number of records, the most efficient way to use the Control Center is to use the Load Parameters options to define a subset of rows to include in the entity.
    Option                      | Description
    ----------------------------+------------------------------------------------------------
    All Rows                    | Load all of the data rows.
                                | Note: This option is not available when creating dynamic entities.
    First [number] rows         | Load a selected number of records from the beginning of the file (for example, the first 1000 records).
                                | Note: This option is not available when creating dynamic entities.
    Random [percent] % sample   | Randomly sample a percentage of records from the file. Valid values are 0.01 to 99.99.
                                | Note: The random % is the percentage chance of each row being included. Therefore, the actual number of rows loaded may differ for each load of the same file, even if you specify the same percentage.
                                | Note: This option is not available when creating dynamic entities.
    Skip first [number] rows    | Specify a starting row for data imports. For example, if your file has 300 rows and you select All Rows and Skip first 99 rows, Trillium loads 201 rows, starting with the 100th row.
    Job name                    | Name of the data load job.
  15. Click Next to continue.
    Note: If you click Create & Restart to schedule the job, you are returned to the entity selection list where you can select another entity. Use this button when you want to quickly load several files from the same source with different settings.
  16. In Confirm Settings, review the list of settings and click Finish. The scheduler notification bar displays.
  17. (Optional) Change the job name.
  18. Do one of the following:
    • To schedule the job to run immediately, click Now. The message closes and the job begins running in the background.
    • To schedule the job for a later time, click Later. The Set Date & Time scheduler window opens. Select a starting date on the calendar and the time you want the job to run on that date, and then click Submit to save the scheduled time. To run the job immediately in the background instead, click Run Now.
    • Click Cancel to cancel the task.
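
The row-selection options in the Load Parameters step can be easier to reason about with a small model. The following Python sketch mimics All Rows, First [number] rows, Random [percent] % sample, and Skip first [number] rows on an in-memory sequence. It is not Trillium code: the function select_rows and its parameters are hypothetical, and applying the skip before the First or Random option is an assumption (this topic only describes the combination of All Rows with Skip first).

```python
import random
from itertools import islice


def select_rows(rows, skip=0, first=None, percent=None, seed=None):
    """Return the rows that a load would keep under the modeled options.

    skip    -- leading rows to skip ("Skip first [number] rows")
    first   -- keep at most this many rows ("First [number] rows")
    percent -- per-row chance of inclusion, 0.01 to 99.99
               ("Random [percent] % sample")
    With neither first nor percent set, this models "All Rows".
    """
    if first is not None and percent is not None:
        raise ValueError("choose either first or percent, not both")
    if percent is not None and not 0.01 <= percent <= 99.99:
        raise ValueError("percent must be between 0.01 and 99.99")

    remaining = islice(rows, skip, None)  # drop the skipped rows
    if first is not None:
        return list(islice(remaining, first))
    if percent is not None:
        rng = random.Random(seed)
        # Each row is kept independently, so the count varies per run.
        return [row for row in remaining if rng.random() * 100 < percent]
    return list(remaining)


# The example from the table: 300 rows, All Rows, Skip first 99 rows.
loaded = select_rows(range(1, 301), skip=99)
print(len(loaded), loaded[0])  # 201 100
```

Because the random option tests each row independently, two calls with the same percentage but different seeds usually return different row counts, which matches the note about repeated loads of the same file.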