Reads a small set of data stored within the data flow. This node is typically used when a small amount of unchanging data needs to exist within a data flow, often for test cases or examples.
The content should be in standard CSV format. By default, the node follows RFC 4180, but it also allows comment lines and empty lines, and it ignores unquoted spaces surrounding fields.
The Create Data node will treat lines beginning with the # character as comments.
If the fields in your data set are not comma-separated, enter the delimiter in the FieldDelimiter property. For example, if the fields are tab-delimited, enter \t.
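The parsing rules above (comment lines, empty lines, surrounding spaces, and a configurable delimiter) can be pictured with a short Python sketch. This is not the node's implementation; `parse_create_data` is a hypothetical helper built on the standard `csv` module, and it simplifies the rules by trimming every field and by assuming no quoted field spans multiple lines.

```python
import csv

def parse_create_data(text, delimiter=","):
    """Parse delimited text, skipping empty lines and '#' comment lines.

    Simplifications (assumptions of this sketch, not product behavior):
    every field is trimmed, including quoted ones, and quoted fields
    are assumed not to contain line breaks.
    """
    # Keep only lines that are non-empty and do not begin with '#'.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter=delimiter, skipinitialspace=True)
    # Trim any remaining spaces around each field.
    return [[field.strip() for field in row] for row in reader]

# Tab-delimited sample with a comment line and an empty line.
sample = "# authors\nFirst Name:string\tLast Name:string\n\nCharles\tDickens\nJane\tAusten\n"
rows = parse_create_data(sample, delimiter="\t")
print(rows)
```

Passing `delimiter="\t"` plays the role of entering \t in the FieldDelimiter property; omitting it falls back to the comma default.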
If you want to quickly create a large data set to use as a test case, enter an integer in the Copies property to specify how many times to repeat the data in the output. See Autogenerating test data.
Example
You have the following records in the Data property:
First Name:string,Last Name:string
Charles,Dickens
Jane,Austen
William,Shakespeare
George,Orwell
The data is comma-separated, so you do not need to specify a different value in the FieldDelimiter property (comma is the default).
You want to output each record twice, so in the Copies property you type 2.
The node produces the following output:
First Name (string) | Last Name (string) |
---|---|
Charles | Dickens |
Jane | Austen |
William | Shakespeare |
George | Orwell |
Charles | Dickens |
Jane | Austen |
William | Shakespeare |
George | Orwell |
See also Data types.
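As a rough model of the example above, the following Python sketch splits the Name:type header fields and repeats the whole block of records, which matches the order shown in the output table. The helper names `split_header` and `repeat_records` are hypothetical and exist only for illustration.

```python
def split_header(field):
    """Split 'Name:type' into (name, type); type is '' when omitted."""
    name, _, type_name = field.partition(":")
    return name, type_name

def repeat_records(records, copies=1):
    """Repeat the full block of records `copies` times, preserving order."""
    return [list(record) for _ in range(copies) for record in records]

header = [split_header(f) for f in "First Name:string,Last Name:string".split(",")]
records = [
    ["Charles", "Dickens"],
    ["Jane", "Austen"],
    ["William", "Shakespeare"],
    ["George", "Orwell"],
]
output = repeat_records(records, copies=2)

print(header)       # [('First Name', 'string'), ('Last Name', 'string')]
print(len(output))  # 8
```

With copies=2 the four records appear once in order and then again in the same order, rather than each record being doubled in place.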
Autogenerating test data
You can use the Create Data node to generate a random data set for testing:
- In the first line of the Data property, enter the header information of the data set that you want to create.
- In the second line of the Data property, enter instructions for how to generate the data, in the following format:
<<<TYPE|keyword=value|keyword=value|...>>>
For more information and examples, see Supported types.
- In the Copies property, type the number of rows that you want to generate.
For example, you could generate 100,000 rows of random data by typing 100000 in the Copies property and entering the following text in the Data property:
ID:unicode,First:unicode,Last:unicode,Email:unicode,Amount Owed:unicode,Last Payment Date:unicode,Client Since:unicode,Status:unicode
<<<ID|low=10000|format=%07d>>>,<<<NAME.FIRST>>>,<<<NAME.LAST>>>,<<<EMAIL>>>,<<<DOUBLE|low=100.0|high=500.0|format=%.2f|nulls=.1>>>,<<<DATE|low=2018-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<DATE|low=2012-01-01 00:00:00|high=2019-03-31 00:00:00|format=dd/MM/yyyy>>>,<<<ENUM|values=Platinum^Gold^Silver>>>
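To make the <<<TYPE|keyword=value|...>>> format concrete, here is a minimal Python sketch that splits one instruction into its type and keywords. It only illustrates the syntax; `parse_directive` is a hypothetical name, and how the node itself parses or validates instructions is not described here.

```python
import re

# Matches a whole cell of the form <<<...>>> and captures the inner text.
DIRECTIVE = re.compile(r"^<<<(.+)>>>$")

def parse_directive(cell):
    """Split '<<<TYPE|keyword=value|...>>>' into (type, {keyword: value})."""
    match = DIRECTIVE.match(cell.strip())
    if match is None:
        raise ValueError(f"not a generator instruction: {cell!r}")
    type_name, *pairs = match.group(1).split("|")
    # Each remaining part is 'keyword=value'; split on the first '=' only.
    keywords = dict(pair.split("=", 1) for pair in pairs)
    return type_name, keywords

print(parse_directive("<<<ID|low=10000|format=%07d>>>"))
# ('ID', {'low': '10000', 'format': '%07d'})

_, enum_keywords = parse_directive("<<<ENUM|values=Platinum^Gold^Silver>>>")
print(enum_keywords["values"].split("^"))
# ['Platinum', 'Gold', 'Silver']
```

All keyword values are kept as strings in this sketch; converting low, high, or nulls to numbers would be a separate step.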
Considerations
- Each variable must be used in a 1:1 relation to a field. It is not possible to concatenate multiple variables into a single field.
- Following the header, there can only be a single row of variables. It is not possible to have alternating outputs by specifying two rows of variables.
- Each column is calculated independently. For example, if you have name and email columns, the randomly generated names and the names within email addresses will not match. Likewise, zip code and state columns can yield mismatched combinations. The sample data is not validated or reconciled against itself during creation. The goal is to provide randomly generated pseudo-accurate data for testing purposes.
Supported types
The following table lists all valid types and their associated supported keywords:
Type | Description | Supported keywords | Example |
---|---|---|---|
ID | An auto-incrementing number. | | |
DATE | A random Date, DateTime or Time field. | | |
LONG | A random long-valued integral number. | | |
DOUBLE | A random floating-point value. | | |
ENUM | A random value from a simple enumerated type, for example Platinum, Gold, Silver. | | |
The nulls keyword is supported for all types. Its value should be a number between 0.0 and 1.0 that gives the proportion of generated values returned as null rather than as a value. The default is 0.0 (no nulls).
The following types are also supported and generate random values:
- AIRPORT_CODE.IATA
- COUNTRY.ISO-3166-2
- COUNTRY.ISO-3166-3
- COUNTRY.TEXT_EN
- CREDIT_CARD_TYPE
- CURRENCY_CODE.ISO-4217
- EMAIL
- GENDER.TEXT_EN
- GUID
- IPADDRESS.IPV4
- IPADDRESS.IPV6
- LANGUAGE.ISO-639-2
- LANGUAGE.TEXT_EN
- NAME.FIRST
- NAME.FIRST_LAST
- NAME.LAST
- NAME.LAST_FIRST
- STREET_ADDRESS_EN
- TELEPHONE
- URI.URL
If the locale is en-US, the following types are also supported:
- MONTH.ABBR_en-US
- MONTH.FULL_en-US
- POSTAL_CODE.ZIP5_US
- STATE_PROVINCE.PROVINCE_CA
- STATE_PROVINCE.STATE_PROVINCE_NA
- STATE_PROVINCE.STATE_US
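The nulls keyword described earlier in this section can be read as a per-value probability. The Python sketch below shows one way to model that reading; `with_nulls` is a hypothetical wrapper, and the uniform draw between 100.0 and 500.0 merely mimics a DOUBLE column with low and high set. The node's actual random source and rounding are not documented here.

```python
import random

def with_nulls(generate, nulls=0.0, rng=random):
    """Wrap a value generator so it returns None with probability `nulls`."""
    if not 0.0 <= nulls <= 1.0:
        raise ValueError("nulls must be between 0.0 and 1.0")

    def wrapped():
        # rng.random() is in [0.0, 1.0), so nulls=0.0 never yields None
        # and nulls=1.0 always does.
        return None if rng.random() < nulls else generate()

    return wrapped

rng = random.Random(42)  # seeded only to make the sketch repeatable
amount = with_nulls(lambda: round(rng.uniform(100.0, 500.0), 2), nulls=0.1, rng=rng)
values = [amount() for _ in range(1000)]
null_share = sum(v is None for v in values) / len(values)
print(f"share of nulls: {null_share:.3f}")
```

Over many rows the share of nulls should settle near the configured value, although any single run will vary.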
Properties
Data
Specify delimited data to be output into the data flow. The content should be in standard CSV format. By default, the node follows RFC 4180, but it also allows comment lines and empty lines, and it ignores unquoted spaces surrounding fields.
FieldDelimiter
Optionally specify the field delimiter character to use. The default delimiter is the comma character (,).
Copies
Optionally specify an integer for the number of times to repeat the Data in the output. The default value is 1.
Inputs and outputs
Inputs: None.
Outputs: Sample Data.