Sorts data based on the specified input fields, in either ascending or descending order.
- Connect the Sort node to a node that has run successfully to ensure that the input data is available.
- Select the Sort node and view the Properties panel.
- In the SortBy property, select a Field to sort by from the drop-down menu. You can type in the drop-down to filter the list of fields to those which contain the typed text.
- By default, your data will be sorted in ascending order (low to high). If you want to sort in descending order (high to low), click the menu button to the right of the field name and select Sort Descending.Note: When ordering any data type, Data360 Analyze considers null values to be lower than any other value. Therefore, if you sort in ascending order, null values will come first. Alternatively, if you sort in descending order, null values will appear last.
- You can add multiple sort criteria by repeating the steps above. The menu button to the right of each item also allows you to:
- Sort the selected field case insensitively.
- Compare Substrings, see Advanced example - sorting on a sub-string
- Delete the selected field.
Also in this topic, see:
Tips
- SortBy property - If you have connected the Sort node to a node that has not yet run, the SortBy property will not contain a list of available fields. In this case, you can type the name of a field in the SortBy property and press Enter to commit the field name. When the node is executed, an error will be generated if this field does not exist. The SortBy property is a multi-field picker, a property type which is found on a number of nodes. For more information on this property type, see Multi-field picker.
- Unique property - You can use the Sort node to remove duplicate records based on the field that you are sorting on by setting the Unique property to True. If you are sorting by multiple fields, all specified fields must match for the record to be removed.
- If you are sorting by more than one field, you can drag and drop the fields to reorder them. The order of the sort criteria determines which field the data will be sorted by first.
- Sorting can be time consuming when handling very large data sets. Where possible, aim to sort data only once within a data flow.
Example
Consider the following input to a Sort node:
Account_Name | Account_Number | Account_ID |
---|---|---|
John | 1 | a |
John | 1 | a |
Edward | 2 | a |
John | 3 | b |
Bob | 5 | z |
John | 5 | a |
Bob | 2 | f |
Edward | 3 | b |
Edward | 1 | c |
- In the SortBy property, select the three fields to sort on, in the following order:
- Account_Name
- Account_Number
- Account_ID
- Accept the default sort direction of ascending (low to high), and run the node.
The output would display the records sorted in ascending order, as follows:
Account_Name Account_Number Account_ID Bob 2 f Bob 5 z Edward 1 c Edward 2 a Edward 3 b John 1 a John 1 a John 3 b John 5 a - To sort this data in descending order, click the menu button to the right of each of the field names in the SortBy property, and in each case select Sort Descending.
The output would then be:
Account_Name Account_Number Account_ID John 5 a John 3 b John 1 a John 1 a Edward 3 b Edward 2 a Edward 1 c Bob 5 z Bob 2 f - To remove duplicate fields, set the Unique property to True. In this case, the output would then contain only 8 records as one of the "John, 1, a" records would be removed.
Advanced example - sorting on a substring
Consider the following input to a Sort node:
Input_data:string |
---|
123-4561 |
456-1231 |
908-0341 |
- In the SortBy property, select the field name "Input_data".
- Click the menu button to the right of the "Input_data" field token in the SortBy property, and select Compare Substrings.
The Compare Substrings dialog will open. From here, you can enable or disable the option, set a start position and set an optional end position (where the first character of a string is at position 0).
- To sort the data based on a subset of two characters, starting at the fourth character:
- Select Compare substrings from start position and type 4.
- In the end position box, type 6.
- Click Done.
The output would be as follows:
Input_datastring |
---|
908-0341 |
456-1231 |
123-4561 |
For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to sort by using the notation fields.<name>
separating each field reference with a comma. To sort in descending order, use the fn.desc
function.
Properties
SortBy
Select or type the names of the fields that you want to sort by. From the menu button to the right of the field name, you have the option to change the sort order to Sort Descending (high to low), you can select Case Insensitive sorting, or for more advanced cases you can choose to Compare Substrings. There is also an option to Delete a selected field from the list.
If you have added multiple sort criteria, you can drag and drop the fields to reorder them if needed. The order of the sort criteria determines which field the data will be sorted by first.
For advanced use cases, you can select the Advanced tab to type Python script to specify the fields that you want to sort by using the notation fields.<name>
separating each field reference with a comma. To sort in descending order, use the fn.desc
function.
Example: fields.FirstName, fn.desc(fields.DOB)
A value is required for this property.
Unique
Optionally specify whether to exclude duplicate entries based on the field that you are sorting on. If you are sorting by multiple fields, all specified fields must match for the record to be removed.
The default value is False.
MergeOnly
If the node has multiple inputs, you can optionally specify whether to only merge the inputs — sorting records across the different inputs, rather than sorting within the inputs.
BufferSize
Optionally specify the number of records to process in memory. Each batch of records will be sorted in memory and cached to file before all of the batches are finally merged at the end.
Generally, this property should not need to be set unless the records are too wide for 10,000 to be stored in memory, or if the records have a highly variable width in bytes - e.g. on most records a field is null, but occasionally is very large.
If no value is specified, the BufferSize will be dynamically calculated based on the width of the incoming records, where initially 10,000 records are loaded into memory by the node.
JvmMaxHeapSize
Optionally specify the maximum heap size of the JVM instance that runs the node.
If this property is not set, then the value defaults to the value set in the server side property ls.brain.node.java.javaMaxHeapSize.
For more information, see Increasing the heap space for Java-based nodes.
Example data flows
A number of sample Data Flows are available from the Samples workspace, found in the Analyze Directory page.
In the Directory under the /Data360 Samples/Node Examples/
folder, you will find "Sorting data", which shows examples of how to use this node.
Inputs and outputs
Inputs: in1, multiple optional.
Outputs: Sorted Data.