Designing a Dataflow for Real-Time Revalidation - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

If you use exception management in your dataflow, you can use the revalidation feature to rerun exception records through the validation process after they have been corrected in the Data Stewardship Portal. This lets you determine in real time whether the change you made allows the record to process successfully. You do not need to wait until the Read Exceptions batch job runs again to see the result.

The basic building blocks of a revalidation environment are:
  • A job or a service that reuses or contains an exposed subflow. It must also contain an input source, the subflow stage that processes the input, a Write Exceptions stage, and an output sink for successfully processed records.
  • An exposed subflow containing an Exception Monitor stage that points to a revalidation service and is configured for revalidation, including designating whether revalidated records should be reprocessed or approved.
  • An exposed service that also reuses or contains the exposed subflow. It processes records that were edited, saved, and sent for revalidation in the Data Stewardship Portal.

Here is an example scenario that helps illustrate a revalidation implementation:

[Figure: Revalidation implementation illustration]
In this example, there are three dataflows: a job, a subflow, and a service. The job runs input data through the subflow. The subflow contains an Exception Monitor stage, which determines whether a record should be routed for manual review. In this scenario, any record with no data in the PostalCode field is considered an exception and is routed to the Write Exceptions stage; these exceptions are what appear in the Data Stewardship Portal. Records with any data in that field are routed to the Write to File stage.
Note: If your dataflow is also being configured to use best of breed functionality, you will need to manually add and expose the CollectionRecordType field in the revalidation Exception Monitor stage/subflow and the service itself. See Write Exceptions options and Creating a Best of Breed Record for more information on best of breed functionality.
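
The routing rule in the example scenario can be sketched in a few lines of Python. This is only a conceptual model of the condition described above, not Spectrum code or a Spectrum API: the function names and sample records are hypothetical, and treating whitespace-only values as "no data" is an assumption.

```python
# Conceptual model only; not Spectrum API. All names are hypothetical.

def is_exception(record: dict) -> bool:
    """Return True when the record has no data in the PostalCode field.

    Assumption: a missing field, an empty string, and a whitespace-only
    string all count as "no data".
    """
    value = record.get("PostalCode")
    return value is None or str(value).strip() == ""


def route(records):
    """Split records the way the example subflow does.

    Exceptions stand in for what goes to the Write Exceptions stage;
    passed records stand in for what goes to the Write to File stage.
    """
    exceptions, passed = [], []
    for record in records:
        if is_exception(record):
            exceptions.append(record)
        else:
            passed.append(record)
    return exceptions, passed


# Hypothetical sample input
sample = [
    {"Name": "A", "PostalCode": "12180"},
    {"Name": "B", "PostalCode": ""},
    {"Name": "C"},
]
exceptions, passed = route(sample)
```

Because the job and the service reuse the same subflow, the same condition applies both to the original batch input and to records sent back for revalidation.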

The exception revalidation service that you designated when configuring the Exception Monitor stage is called when you edit one or more records from the exception repository on the Data Stewardship Portal Editor page and click Save. Like the job, the service contains the exception monitor subflow that uses the same business logic to reprocess the records. If the records fail one or more conditions set in the Exception Monitor stage, the exceptions will be updated in the repository. If the records pass the conditions set in the Exception Monitor stage, one of two actions will occur, depending on the selection made for Action after revalidation:

Reprocess records
Records will be deleted from the repository and reprocessed.
Resolve records
The approved records are retained in the repository and their status is changed to Resolved.
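
The decision logic described above can be summarized as a small Python sketch. It is a conceptual model under stated assumptions, not the product's implementation: the repository is modeled as a plain dictionary, the `revalidate` function, its return values, and the `"pending"` status are hypothetical, and the failure condition reuses the empty PostalCode rule from the example scenario.

```python
from enum import Enum

# Conceptual model only; not Spectrum API. All names are hypothetical.


class ActionAfterRevalidation(Enum):
    REPROCESS = "Reprocess records"
    RESOLVE = "Resolve records"


def fails_conditions(record: dict) -> bool:
    """Example condition from the scenario: no data in PostalCode."""
    value = record.get("PostalCode")
    return value is None or str(value).strip() == ""


def revalidate(repository: dict, record_id: str, edited: dict,
               action: ActionAfterRevalidation) -> str:
    """Apply the outcome described for a record saved in the portal."""
    if fails_conditions(edited):
        # Still an exception: update the stored record, keep its status.
        repository[record_id]["record"] = edited
        return "updated"
    if action is ActionAfterRevalidation.REPROCESS:
        # Passed: delete from the repository and hand off for reprocessing.
        del repository[record_id]
        return "reprocessed"
    # Passed: retain in the repository and mark as Resolved.
    repository[record_id] = {"record": edited, "status": "Resolved"}
    return "resolved"


# Hypothetical repository with one exception record
repo = {"r1": {"record": {"PostalCode": ""}, "status": "pending"}}
outcome = revalidate(repo, "r1", {"PostalCode": "12180"},
                     ActionAfterRevalidation.RESOLVE)
```

In this sketch, a record that still fails stays in the repository with its edits saved, which corresponds to the exception being updated rather than removed.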

To step through the procedure, see Create a real-time validation scenario.