Creating an analysis - Data360_DQ+ - Latest

Data360 DQ+ Help

Product type: Software
Portfolio: Verify
Product family: Data360
Product: Data360 DQ+
Version: Latest
Language: English
Copyright: 2024
First publish date: 2016
Last published: 2024-07-09

You can use an analysis to manipulate data. The general process flow of any analysis is as follows:

(1) In the first phase of design, data store inputs are selected.

(2) In the second phase, fields from these data stores are manipulated, as they move through some combination of Enhance, Combine, Shape, Check, System, and Analytics nodes.

(3) In the third phase, manipulated fields are pushed to and stored in one or more data store outputs, which are then saved in a pipeline to be used by other data stages.
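The three phases can be pictured with a small conceptual sketch. This is not how Data360 DQ+ is implemented or scripted: plain Python lists stand in for data stores, ordinary functions stand in for nodes, and every name below (`read_input`, `enhance`, `check`, `write_output`, `valid_orders`) is hypothetical.

```python
# Conceptual model only: lists stand in for data stores and
# functions stand in for Analysis Designer nodes.

def read_input(data_store):
    """Phase 1: select a data store input."""
    return list(data_store)

def enhance(records):
    """Phase 2: an Enhance-style step that adds a derived field."""
    return [{**r, "total": r["price"] * r["qty"]} for r in records]

def check(records):
    """Phase 2: a Check-style step that keeps only records with a positive total."""
    return [r for r in records if r["total"] > 0]

def write_output(records, pipeline, name):
    """Phase 3: store the manipulated records under a name in a pipeline."""
    pipeline[name] = records

orders = [
    {"id": "001", "price": 10.0, "qty": 2},
    {"id": "002", "price": 5.0, "qty": 0},
]
pipeline = {}
write_output(check(enhance(read_input(orders))), pipeline, "valid_orders")
```

After this runs, `pipeline["valid_orders"]` holds only the first order, with its derived `total` field, ready to be read by a later stage.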

Before you begin

Before you can create an analysis, you will need the following:

  • Create Analysis and Write permissions to a pipeline.
  • Access to at least one data store.

All users can benefit from using the Analysis Designer. If you have prior knowledge of a data set, you may be able to approach design with a specific output in mind. On the other hand, the Analysis Designer can also work as a tool for experimentation and exploration, with no prior knowledge required.

Configuring analysis designer nodes

  1. Select the Pipelines menu at the top of the page.
  2. Click the menu button to the right of the path in which you want to create the analysis.
  3. Select New > Analysis.
  4. Drag and drop nodes from the left of the screen onto the canvas.
  5. Connect the nodes to build your analysis.
  6. Select each node in turn and configure the properties on the right of the screen.
  7. Select the Current and Downstream Nodes dropdown button to choose how much interactive execution will occur on the current node.
  8. Select Apply Changes to save the changes.

Copying and pasting nodes

Tip: If you need to create similar nodes to existing ones, you can copy them from the canvas and paste them elsewhere, using the Copy and Paste controls in the Analysis toolbar.

Do the following:

  1. Select the appropriate node or nodes.
  2. Click "Copy selection to the clipboard".
  3. Click "Paste from clipboard".
  4. Drag the copied node or selection to the appropriate place on the canvas.
  5. Update the properties of the selection, as appropriate.

There is no relationship or link between the original node or selection and the copied one, other than having similar properties. As a result, a change in one will not be reflected in a change in the other one.
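As an analogy only, the behavior resembles a deep copy in a programming language. The node structure below is hypothetical and is not the product's internal representation.

```python
import copy

# Hypothetical stand-in for a configured node on the canvas.
original = {"type": "ExampleNode", "properties": {"expression": "amount > 100"}}

# Pasting behaves like a deep copy: the new node starts with the same
# properties but shares nothing with the original.
pasted = copy.deepcopy(original)
pasted["properties"]["expression"] = "amount > 500"
```

Here, changing `pasted` leaves `original["properties"]["expression"]` at `"amount > 100"`.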

Note: Currently, you can only copy and paste nodes within the current browser tab.

For more information about specific nodes, see Analysis Designer Nodes.

Configuring analysis settings

  1. Select the Pipelines menu at the top of the page.
  2. From the Pipelines browser on the left of the screen, select your analysis.
  3. From the Analysis screen, click the Settings button to open the Analysis Settings dialog.
  4. Configure the following properties, as required.

Details properties

  • Description - Optionally, enter a description for the analysis. If you enter a description, this will be displayed in a tooltip when you hover over the analysis in the Pipelines view.
  • Log Limit - Set the maximum number of errors that will be displayed in the Analysis Designer's Output Log when Testing. This is also a maximum threshold for the number of errors that you want the executor to allow before exiting the Analysis run and marking the run as Failed.
  • Sampling Data for Testing - Set the sample size of data used for Testing an Analysis. Note that this setting will interact with the Sampling settings made in Data Store Inputs and Sample nodes. Analysis-level settings are applied before settings made at the node level.
  • Caching and Record Count - Specify whether caching will occur for all nodes in your Analysis, and whether to collect accurate record counts for all nodes.

    Caching the output of a node can increase the speed of an analysis when recomputes are required at points where the analysis splits.

    Caching the output of all nodes is recommended on smaller data sets, to save time. With larger data sets, however, global caching can cause a significant decrease in performance.

    The "Cache output and collect accurate record counts for all nodes" option is turned on by default.

    Tip: For larger data sets, you can use the Cache Data node to only cache the output of specific nodes. See Cache Data Node.

    When collecting accurate record counts, the system's Execution History will track record counts at each node within an analysis.
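The effect of caching at a split point can be illustrated with a minimal Python sketch. It is a conceptual model, not the product's caching mechanism: `expensive_node` and the two branch functions are hypothetical, and `functools.lru_cache` stands in for a cached node output.

```python
from functools import lru_cache

calls = {"expensive_node": 0}

def expensive_node(n):
    """Hypothetical upstream node; counts how often it is recomputed."""
    calls["expensive_node"] += 1
    return tuple(i * i for i in range(n))

@lru_cache(maxsize=None)
def cached_expensive_node(n):
    """The same node with its output cached after the first computation."""
    return expensive_node(n)

def branch_sum(rows):
    return sum(rows)

def branch_max(rows):
    return max(rows)

# Without caching, each branch after the split recomputes the upstream node.
a = branch_sum(expensive_node(5))
b = branch_max(expensive_node(5))
uncached_calls = calls["expensive_node"]

# With caching, the upstream node is computed once and both branches reuse it.
calls["expensive_node"] = 0
c = branch_sum(cached_expensive_node(5))
d = branch_max(cached_expensive_node(5))
cached_calls = calls["expensive_node"]
```

Both variants produce the same results, but the uncached version runs the upstream step twice and the cached version runs it once. The sketch does not model the cost of storing cached output, which is the trade-off to weigh for larger data sets.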

Execution properties

You can override the default execution settings for an Analysis by using one or more of the following properties:

  • Execution Property - Select this option, then click the Add button, to create new execution properties.
  • Execution Profile - Select a predefined execution profile to inherit the execution settings from a profile that has been created on your environment by an administrator.
  • Overriding Execution Sizing - Select this option to edit the execution settings for the selected Analysis. The options that are available, and the default values, will vary according to the selected Cluster Type.

You may want to use a combination of the above settings. For example, an Execution Profile may include execution sizing configuration that you want to inherit on your Analysis, but you may want to also add a new execution property for this specific Analysis. Or, you could choose to override the execution sizing settings of the Execution Profile, while inheriting the execution properties defined on the profile.
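One way to reason about such combinations is as a layered merge. The sketch below is an assumption about precedence (analysis-level values win over inherited profile values, key by key), not documented product behavior, and all setting names such as `executors`, `memory_gb`, and `example.property.a` are hypothetical.

```python
def effective_settings(profile, sizing_override=None, extra_properties=None):
    """Combine an inherited profile with analysis-level settings.

    Assumption: analysis-level values take precedence over profile values.
    """
    settings = {
        "sizing": dict(profile.get("sizing", {})),
        "properties": dict(profile.get("properties", {})),
    }
    if sizing_override:
        settings["sizing"].update(sizing_override)
    if extra_properties:
        settings["properties"].update(extra_properties)
    return settings

# Hypothetical profile created by an administrator.
profile = {
    "sizing": {"executors": 2, "memory_gb": 4},
    "properties": {"example.property.a": "1"},
}

# Inherit the profile, override one sizing value, and add one property.
result = effective_settings(
    profile,
    sizing_override={"memory_gb": 8},
    extra_properties={"example.property.b": "2"},
)
```

In this model the profile itself is never modified, so other analyses that inherit from it are unaffected.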

Note: If you promote or import an Analysis that references an Execution Profile, the system will automatically create an empty profile with the same name on the target Environment, if it does not already exist. An administrator will then need to configure the details of the Execution Profile on the target environment to enable the successful execution of the associated Analysis.

If you are an administrator and want help with creating and editing Execution Profiles, see Environment execution profile. For further assistance with these settings, please contact Infogix Support.

Runtime properties

You can create runtime properties by selecting a Data Store and mapping a field containing names to a field containing values. You can then reference the property via the name field throughout the Analysis, using the RUNTIME() function.

For example, consider the following sample data set:

id     value
001    100
002    200
003    300

Runtime data set

If you were to select id as the Property Name Field and value as the Property Name Value, you could use RUNTIME(id) in any node within your analysis to return the values for each unique name in your property.

In this case, RUNTIME(id) would return the following:

New Column
100
200
300

Results from call to RUNTIME(id)

Note that values in the Property Name Value should be unique per name. If a name has more than one value associated with it, only the first value found will be returned for all records.
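The name-to-value mapping and the first-value rule can be sketched in Python. This models the behavior described above and is not the implementation of RUNTIME(); the helper `build_runtime_properties` is hypothetical, and the sketch assumes that "first value found" means first in row order.

```python
def build_runtime_properties(rows, name_field, value_field):
    """Map each name to its value, keeping only the first value per name."""
    props = {}
    for row in rows:
        # setdefault leaves an existing entry untouched, so later
        # duplicates of the same name are ignored.
        props.setdefault(row[name_field], row[value_field])
    return props

rows = [
    {"id": "001", "value": 100},
    {"id": "002", "value": 200},
    {"id": "003", "value": 300},
    {"id": "001", "value": 999},  # duplicate name: not used
]

props = build_runtime_properties(rows, "id", "value")
```

Looking up `props["001"]` gives `100`, not `999`, which is why duplicate names in the source data store are best avoided.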

Also note that to use Runtime Properties in interactive mode - that is, within the sheets of nodes while building your Analysis - you need to use the Test button to simulate a run of the entire Analysis (rather than using Test Sheet).

Showing and working with sheets

Sheets are displayed in the grid at the bottom of the Analysis Designer screen.

When building a new analysis, it is recommended that you keep the grid icon selected to display the test data sheet.

Every time you add a new node to your analysis, a new sheet will be generated automatically. Each sheet will then display what is happening to your fields at that point in the analysis.

Adding new columns

When working with sheets, you can add new columns (that is, fields) that are built from functions that manipulate other fields.

To add a new column, click the New Column button on the sheet toolbar.

Once columns are added, they can be treated like any other field and pushed to new Analysis Designer nodes.

Unmasking secure fields

If you have permission to unmask a secure field, you may unmask records in bulk using the Unmask All button. Note that only secure fields may be unmasked. Fields that are encrypted but not secure are shown as masked in analysis sheets, and you cannot use the Unmask All button to unmask these values.