Sort - Data360_Analyze - Latest

Data360 Analyze Server Help

Product type
Software
Portfolio
Verify
Product family
Data360
Product
Data360 Analyze
Version
Latest
Language
English
Product name
Data360 Analyze
Title
Data360 Analyze Server Help
Copyright
2024
First publish date
2016
Last updated
2024-11-28
Published on
2024-11-28T15:26:57.181000

Sorts data based on the specified input fields, in either ascending or descending order.

Tip: You can also sort your data in the data viewer then apply the sort as a node to your data flow, see Adding sorted data from the data viewer to your data flow.

  1. Connect the Sort node to a node that has run successfully to ensure that the input data is available.
  2. Select the Sort node and view the Properties panel.
  3. In the SortBy property, select a Field to sort by from the drop-down menu. You can type in the drop-down to filter the list of fields to those which contain the typed text.
  4. By default, your data will be sorted in ascending order (low to high). If you want to sort in descending order (high to low), click the menu button to the right of the field name and select Sort Descending.
    Note: When ordering any data type, Data360 Analyze considers null values to be lower than any other value. Therefore, if you sort in ascending order, null values will come first. Alternatively, if you sort in descending order, null values will appear last.
  5. You can add multiple sort criteria by repeating the steps above. The menu button to the right of each item also allows you to:

Also in this topic, see:

Tips

  • SortBy property - If you have connected the Sort node to a node that has not yet run, the SortBy property will not contain a list of available fields. In this case, you can type the name of a field in the SortBy property and press Enter to commit the field name. When the node is executed, an error will be generated if this field does not exist. The SortBy property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.
  • Unique property - You can use the Sort node to remove duplicate records based on the field that you are sorting on by setting the Unique property to True. If you are sorting by multiple fields, all specified fields must match for the record to be removed.
  • If you are sorting by more than one field, you can drag and drop the fields to reorder them. The order of the sort criteria determines which field the data will be sorted by first.
  • Sorting can be time consuming when handling very large data sets. Where possible, aim to sort data only once within a data flow.

Example

Consider the following input to a Sort node:

Account_Name Account_Number Account_ID
John 1 a
John 1 a
Edward 2 a
John 3 b
Bob 5 z
John 5 a
Bob 2 f
Edward 3 b
Edward 1 c

  1. In the SortBy property, select the three fields to sort on, in the following order:
    1. Account_Name
    2. Account_Number
    3. Account_ID
  2. Accept the default sort direction of ascending (low to high), and run the node.

    The output would display the records sorted in ascending order, as follows:

    Account_Name Account_Number Account_ID
    Bob 2 f
    Bob 5 z
    Edward 1 c
    Edward 2 a
    Edward 3 b
    John 1 a
    John 1 a
    John 3 b
    John 5 a
  3. To sort this data in descending order, click the menu button to the right of each of the field names in the SortBy property, and in each case select Sort Descending.

    The output would then be:

    Account_Name Account_Number Account_ID
    John 5 a
    John 3 b
    John 1 a
    John 1 a
    Edward 3 b
    Edward 2 a
    Edward 1 c
    Bob 5 z
    Bob 2 f
  4. To remove duplicate fields, set the Unique property to True. In this case, the output would then contain only 8 records as one of the "John, 1, a" records would be removed.

Advanced example - sorting on a substring

Consider the following input to a Sort node:

Input_data:string

123-4561
456-1231
908-0341

  1. In the SortBy property, select the field name "Input_data".
  2. Click the menu button to the right of the "Input_data" field token in the SortBy property, and select Compare Substrings.

    The Compare Substrings dialog will open. From here, you can enable or disable the option, set a start position and set an optional end position (where the first character of a string is at position 0).

  3. To sort the data based on a subset of two characters, starting at the fourth character:
    1. Select Compare substrings from start position and type 4.
    2. In the end position box, type 6.
    3. Click Done.

The output would be as follows:

Input_datastring

908-0341
456-1231
123-4561

For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to sort by using the notation fields.<name> separating each field reference with a comma. To sort in descending order, use the fn.desc function.

Tip: See Pythonmodule support for more information.

Properties

SortBy

Select or type the names of the fields that you want to sort by. From the menu button to the right of the field name, you have the option to change the sort order to Sort Descending (high to low), you can select Case Insensitive sorting, or for more advanced cases you can choose to Compare Substrings. There is also an option to Delete a selected field from the list.

If you have added multiple sort criteria, you can drag and drop the fields to reorder them if needed. The order of the sort criteria determines which field the data will be sorted by first.

For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to sort by using the notation fields.<name> separating each field reference with a comma. To sort in descending order, use the fn.desc function.

Example: fields.FirstName, fn.desc(fields.DOB)

Tip: See Pythonmodule support for more information on configuring this property by using Python script.

A value is required for this property.

Unique

Optionally specify whether to exclude duplicate entries based on the field that you are sorting on. If you are sorting by multiple fields, all specified fields must match for the record to be removed.

The default value is False.

MergeOnly

If the node has multiple inputs, you can optionally specify whether to only merge the inputs — sorting records across the different inputs, rather than sorting within the inputs.

Note: To use this option, the input data must already be sorted, and the fields across all inputs must match.

BufferSize

Optionally specify the number of records to process in memory. Each batch of records will be sorted in memory and cached to file before all of the batches are finally merged at the end.

Generally, this property should not need to be set unless the records are too wide for 10,000 to be stored in memory, or if the records have a highly variable width in bytes - e.g. on most records a field is null, but occasionally is very large.

If no value is specified, the BufferSize will be dynamically calculated based on the width of the incoming records, where initially 10,000 records are loaded into memory by the node.

JvmMaxHeapSize

Optionally specify the maximum heap size of the JVM instance that runs the node.

If this property is not set, then the value defaults to the value set in the server side property ls.brain.node.java.javaMaxHeapSize.

For more information, see Increasing the heap space for Java-based nodes.

Example data flows

A number of sample Data Flows are available from the Samples workspace, found in the Analyze Directory page.

In the Directory under the /Data360 Samples/Node Examples/ folder, you will find "Sorting data", which shows examples of how to use this node.

Note: Upgrades will overwrite all data flows in the workspace. If you want to make changes to one of the sample data flows, we recommend you create a copy and save it elsewhere, using Save as...

Inputs and outputs

Inputs: in1, multiple optional.

Outputs: Sorted Data.