The Reshape node allows you to convert rows into columns or columns into rows.
Converting Rows into Columns
At times, you may find it useful to convert a field's records into fields themselves. This option allows you to do so, by selecting Fields to Group By and Fields to Move to Columns. Once Fields to Group By have been selected, Fields that have been moved to Columns that are part of the same group will appear in the same row.
For example, consider performing a Reshape on the following sample from a data set:
id |
loan_amnt |
term |
int_rate |
---|---|---|---|
001 |
19200 |
60 months |
13.99 |
002 |
7200 |
36 months |
12.29 |
003 |
8000 |
36 months |
14.65 |
004 |
8175 |
36 months |
16.99 |
005 |
8000 |
36 months |
13.99 |
Sample of Loan Data Set Before Reshape
Selecting term as the Field to Group By, id as the Field to Move to Columns, and 5 as the Number of Rows to Move to Columns would reshape the data set into the following:
loan_amnt |
term |
int_rate |
id |
id_2 |
id_3 |
id_4 |
id_5 |
---|---|---|---|---|---|---|---|
7200 |
36 months |
12.29 |
002 |
003 |
004 |
005 |
009 |
19200 |
60 months |
13.99 |
001 |
014 |
101 |
089 |
045 |
Sample of Loan Data Set After Reshape - Grouped By a Single Field
In this reshaped data set, the first 5 ids that have a 36 month term have been turned into columns and the first 5 ids that have a 60 month term have been turned into columns. This is highlighted by the values in bold. Additionally, ids that were not listed in the Sample of Loan Data Set Before Reshape were found; for the sake of space, this example assumes that the data set is much larger than what is listed.
It is also important to note that the loan_amnt and int_rate values listed in each record are only applicable to the first id, because we have not grouped by these fields. To obtain records where loan_amnt, term, and int_rate all apply to the ids listed in each record, we would need to Group By multiple fields, which the Reshape node does allow.
Doing so could result in the following data set, where each id matches the loan_amnt, term, and int_rate in its record.
Sample of Loan Data Set After Reshape - Grouped By Multiple Fields
loan_amnt |
term |
int_rate |
id |
id_2 |
id_3 |
id_4 |
id_5 |
---|---|---|---|---|---|---|---|
19200 |
60 months |
13.99 |
001 |
|
|
|
|
7200 |
36 months |
12.29 |
002 |
|
|
|
|
8000 |
36 months |
14.65 |
003 |
|
|
|
|
8175 |
36 months |
16.99 |
004 |
|
|
|
|
8000 |
36 months |
13.99 |
005 |
149 |
|
|
|
Converting columns into rows
At other times, you may wish to convert columns into rows, which the Reshape node also allows.
For example, you may have a data set where records may have either an id or member_id, and you want to merge these columns into a single field comprised of rows from each field.
id |
member_id |
---|---|
001 |
|
|
143 |
009 |
|
Example 1: Data Set Before Column to Row Reshape
Specifying id and member_id as Fields to Move to Rows and Name of New Field to Move Columns to as common_id would reshape the data set into the following:
common_id |
---|
001 |
|
143 |
|
|
009 |
Example 1: Data Set After Column to Row Reshape
Note that record ordering will following the order in which Fields to Move to Rows are listed in the Properties panel. Also note that blank/null records have been carried over into the common_id column; however, these could easily be removed by using a Filter node.
An example using multiple columns can further illustrate how converting columns to rows will reshape your data set. Consider the following sample.
state |
Q1 |
Q2 |
Q3 |
Q4 |
---|---|---|---|---|
Indiana |
25 |
50 |
75 |
100 |
Maryland |
50 |
100 |
150 |
200 |
California |
500 |
1000 |
1500 |
2000 |
Wisconsin |
100 |
200 |
300 |
400 |
Virginia |
50 |
100 |
150 |
200 |
Illinois |
1000 |
2000 |
4000 |
5000 |
Example 2: Data Set Before Column to Row Reshape
Specifying Q1, Q2, Q3, and Q4 as Fields to Move to Rows, creating a field named Amount for the Name of New Field to Move Columns to parameter, and creating a field named Quarter for the Name of New Field to Move Column Labels to parameter would result in the following data set.
state |
Amount |
Quarter |
---|---|---|
Indiana |
25 |
Q1 |
Indiana |
50 |
Q2 |
Indiana |
75 |
Q3 |
Indiana |
100 |
Q4 |
Maryland |
50 |
Q1 |
Maryland |
100 |
Q2 |
Example 2: Sample of Data Set After Column to Row Reshape
Note that the four quarter columns have been consolidated into a single column, and that the labels of these previous four columns are now used as the values for the column's records. As a result, each state now has four records - one for each quarter. Similarly, the values that filled the quarter columns before the reshape now populate the Amount column, each value corresponding to the quarter and state it was associated with before the reshape.
Also note that for brevity, only six records are shown in the table above. After this reshape, the full data set would actually contain 4 records for each state (one record for each quarter column), for a total of 24 records.