Input - 23.1

Spectrum Dataflow Designer Guide

Version: 23.1
Language: English
Product name: Spectrum Technology Platform
Title: Spectrum Dataflow Designer Guide
First publish date: 2007
Last updated: 2024-05-09
Published on: 2024-05-09T23:01:03.226155

The Import to Model stage requires that your dataflow contain two channels: one that provides entity data to the Entity Port (the top port) and one that provides relationship data to the Relationship Port (the bottom port). You can meet this requirement with two source stages (each reading one input file), with multiple source stages that feed into Record Combiners and ultimately become two streams, or with one source file whose data goes through a Conditional Router or a Splitter that outputs two streams. Any of these methods works as long as the end result is one channel of entity data and one channel of relationship data going into the Import to Model stage.

Entity Data

Data going into the Entity Port includes both type and ID information for entities. You can have a Type field ("Person") and an ID field ("Robert"), or you can have a single ID field that combines both type and ID information, separated by a colon ("Person:Robert"). Entity data may also include properties that characterize the entities, such as Name or Age for a person. For instance, your file could look something like the comma-delimited data below. The Type field tells us that the entities are people, and the ID field in this case holds the name of each person. Which property fields are present depends on the entity type.

Type,ID,Age
Person,Robert,63
Person,James,54
Person,John,34
Person,Arcadia,44
Person,Louie,27

The fields that contain type and ID data do not need to be named "Type" and "ID"; any field name is acceptable. Your input file could also contain a single field that combines both type and ID, here headed "Type:Name":

Type:Name,Age
Person:Robert,63
Person:James,54
Person:John,34
Person:Arcadia,44
Person:Louie,27
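The two layouts above carry the same information. As a minimal sketch, outside of Spectrum itself, the following Python shows how either layout can be reduced to the same (type, ID) pair. The function name `entity_key` is hypothetical, and splitting on the first colon only is an assumption made for this illustration, not documented platform behavior.

```python
import csv
import io

def entity_key(row, id_field, type_field=None):
    """Return a (type, id) tuple from a row with separate or combined fields."""
    if type_field is not None:
        # Separate layout: type and ID live in two different columns.
        return row[type_field].strip(), row[id_field].strip()
    # Combined layout: split on the first colon only (an assumption of this sketch).
    entity_type, sep, entity_id = row[id_field].partition(":")
    if not sep:
        raise ValueError(f"expected 'Type:ID' in {id_field!r}, got {row[id_field]!r}")
    return entity_type.strip(), entity_id.strip()

separate = "Type,ID,Age\nPerson,Robert,63\nPerson,James,54\n"
combined = "Type:Name,Age\nPerson:Robert,63\nPerson:James,54\n"

keys_a = [entity_key(r, "ID", "Type") for r in csv.DictReader(io.StringIO(separate))]
keys_b = [entity_key(r, "Type:Name") for r in csv.DictReader(io.StringIO(combined))]

print(keys_a == keys_b)  # both layouts yield the same keys
```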

Different types of entities are often imported from separate files that correspond to data exported from different database tables. In the example below, the Type is Place, and the properties are City, Latitude, and Longitude. City is the ID field for these records.

Type,City,Latitude,Longitude
Place,New York,40.7114959,-74.012224
Place,Los Angeles,34.052235,-118.243683
Place,San Francisco,37.773972,-122.431297
Place,District of Columbia,38.942142,-77.025955
Place,London,51.509865,-0.118092
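To illustrate how entity files with different columns can still end up in one entity channel, here is a rough Python sketch that reads a Person file and a Place file and concatenates them into a single list of records. It only mimics the idea of combining sources; the function `read_entities` and the record shape (`type`, `id`, `properties`) are hypothetical and are not part of the platform.

```python
import csv
import io

PEOPLE = """Type,ID,Age
Person,Robert,63
Person,James,54
"""

PLACES = """Type,City,Latitude,Longitude
Place,New York,40.7114959,-74.012224
Place,London,51.509865,-0.118092
"""

def read_entities(text, type_field, id_field):
    """Yield one record per row; every non-type column is kept as a property."""
    for row in csv.DictReader(io.StringIO(text)):
        props = {k: v for k, v in row.items() if k != type_field}
        yield {"type": row[type_field], "id": row[id_field], "properties": props}

# Each file names its own ID column: "ID" for people, "City" for places.
entity_stream = (
    list(read_entities(PEOPLE, "Type", "ID"))
    + list(read_entities(PLACES, "Type", "City"))
)

for entity in entity_stream:
    print(entity["type"], entity["id"], entity["properties"])
```

Passing the ID column name per file reflects the point that any field name can serve as the ID.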

Relationship Data

Data going into the Relationship Port needs to include fields that identify source types, source IDs, target types, target IDs, and labels that identify the relationships between the sources and targets. Note that all source and target entity information must reference entities that are provided on the Entity Port. Your relationship data may also include properties about those relationships. For instance, your file could look something like the data below. In this case, the SourceType field tells us that all sources are people, and the TargetType field tells us that the targets are people and places. The SourceID field provides names of all the sources, and the TargetID field provides names of the people and places. The Label field identifies the relationships, in this case "Works_With", "Works_At", or "Lives_At".

SourceType,SourceID,Label,TargetType,TargetID
Person,Robert,Works_With,Person,James
Person,Robert,Works_With,Person,John
Person,James,Works_With,Person,Robert
Person,James,Works_With,Person,John
Person,John,Works_With,Person,Robert
Person,John,Works_With,Person,James
Person,Robert,Works_At,Place,London
Person,James,Works_At,Place,London
Person,John,Works_At,Place,London
Person,Robert,Lives_At,Place,New York
Person,James,Lives_At,Place,Los Angeles
Person,John,Lives_At,Place,San Francisco
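Because every source and target must reference an entity provided on the Entity Port, it can help to check the two files against each other before running the dataflow. The following Python is a minimal sketch of such a check, run outside of Spectrum. The sample rows are shortened, the last relationship row deliberately points at a place (Paris) that is missing from the entity data, and the function `find_dangling` is a hypothetical helper.

```python
import csv
import io

ENTITIES = """Type,ID
Person,Robert
Person,James
Person,John
Place,London
"""

RELATIONSHIPS = """SourceType,SourceID,Label,TargetType,TargetID
Person,Robert,Works_With,Person,James
Person,James,Works_At,Place,London
Person,John,Lives_At,Place,Paris
"""

def find_dangling(entity_csv, relationship_csv):
    """Return (line_number, role, (type, id)) for endpoints with no matching entity."""
    known = {(r["Type"], r["ID"]) for r in csv.DictReader(io.StringIO(entity_csv))}
    dangling = []
    # Data rows start on line 2 because line 1 is the header.
    rows = csv.DictReader(io.StringIO(relationship_csv))
    for line_no, row in enumerate(rows, start=2):
        for role in ("Source", "Target"):
            key = (row[role + "Type"], row[role + "ID"])
            if key not in known:
                dangling.append((line_no, role, key))
    return dangling

problems = find_dangling(ENTITIES, RELATIONSHIPS)
for line_no, role, (etype, eid) in problems:
    print(f"line {line_no}: {role} {etype}:{eid} not found among entities")
```

The check compares type and ID together, so a row that names a known ID under the wrong type is reported as well.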