Advanced Search Index Options - dataflow_designer - spectrum_quality_1 - 23.1

Spectrum Data Quality Guide

First published: 2007. Last edition and last publication: 2024-03-04.
Table 1. Candidate Finder Options

Option Name

Description / Valid Values

Finder type

Select Search Index.

Name

Select the appropriate index that was created using the Write to Search Index stage under the Advanced Matching deployed stages in Enterprise Designer.

Starting record

Enter the record number on which search results should begin. The default is 1.

Maximum results

Enter the maximum number of responses you want the index search to return. The default is 10.
Note: If Maximum results is very large, process the results in batches by using the Fetch batch size field.
Fetch batch size

If Maximum results is very large, enter the batch size in which you want the results to be processed. Batching optimizes the processing of a large number of records. The default is 10000.

The recommended Fetch batch size is a value less than Maximum results. If Fetch batch size is greater than Maximum results, the records are processed in a single batch.

Note: This field applies only to the cluster-supported search engine, not to the legacy search engine.
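The relationship between Maximum results and Fetch batch size can be sketched in Python. This is only an illustration of the arithmetic described above; the function name batch_sizes is hypothetical and is not part of the product.

```python
from typing import List


def batch_sizes(maximum_results: int, fetch_batch_size: int = 10000) -> List[int]:
    """Return the size of each batch needed to cover maximum_results.

    Hypothetical helper: it only mirrors the rule that a batch size larger
    than the maximum results leads to a single batch.
    """
    if maximum_results < 0 or fetch_batch_size <= 0:
        raise ValueError("maximum_results must be >= 0 and fetch_batch_size > 0")
    sizes = []
    remaining = maximum_results
    while remaining > 0:
        size = min(fetch_batch_size, remaining)  # last batch may be smaller
        sizes.append(size)
        remaining -= size
    return sizes


print(batch_sizes(25000, 10000))  # three batches
print(batch_sizes(10, 10000))     # one batch
```

With a maximum of 25000 and a batch size of 10000, the sketch yields two full batches and one partial batch. With the defaults (10 and 10000), everything fits in one batch.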

Sort

Sorts the candidate records on the basis of indexed fields while running a search query.

Select the Sort check box, select the desired index field from the Sort by drop-down list, and then select Ascending or Descending from the Order by drop-down list.

Note: You can sort only on string fields that use the Keyword Analyzer and on numeric fields.

Return match count

Returns the total number of matches that were made. For example, if you use the default of "10" for the Maximum results field above, only 10 results will be returned. However, if you check this box, the TotalMatchCount output field will tell you how many matches were made during processing.
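How Starting record, Maximum results, and the match count relate can be illustrated with a short Python sketch. It assumes that Starting record is 1-based (the default is 1) and that the match count is the number of matches before the result window is applied. The function page_results is hypothetical.

```python
from typing import List, Tuple


def page_results(matches: List[str], starting_record: int = 1,
                 maximum_results: int = 10) -> Tuple[List[str], int]:
    """Return the window of results plus the total number of matches.

    Assumption: starting_record is 1-based, as suggested by its default of 1.
    """
    start = max(starting_record, 1) - 1          # convert to a 0-based offset
    window = matches[start:start + maximum_results]
    return window, len(matches)                   # second value plays the role of TotalMatchCount


all_matches = [f"record-{n}" for n in range(1, 36)]  # 35 made-up matches
returned, total_match_count = page_results(all_matches)
print(len(returned), total_match_count)
```

In this made-up case only 10 records are returned, while the total count still reports 35.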

Relevance

Controls the relevance of the Index Field.

Index search type

Determines the type of index search you want to conduct. Select Advanced search.

Add Parent button

Access Parent Options.

Parent options—Name

Enter a name for the parent.

Parent options—Searching method

Specify how to determine if a parent is a match or a non-match. One of these:

All true—A parent is considered a match if all children are determined to match. This method creates an "AND" connector between children.

Any true—A parent is considered a match if at least one child is determined to match. This method creates an "OR" connector between children.

None true—A parent is considered a match if none of the children is determined to match. This method creates a "NOT" connector between children.
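The three searching methods correspond to familiar Boolean combinations. A minimal Python sketch, assuming each child has already been evaluated to True (match) or False (non-match); the function name parent_matches is hypothetical:

```python
from typing import List


def parent_matches(searching_method: str, child_results: List[bool]) -> bool:
    """Combine child match results the way the searching methods describe."""
    if searching_method == "All true":    # AND between children
        return all(child_results)
    if searching_method == "Any true":    # OR between children
        return any(child_results)
    if searching_method == "None true":   # NOT: no child may match
        return not any(child_results)
    raise ValueError(f"unknown searching method: {searching_method}")


children = [True, False, True]
for method in ("All true", "Any true", "None true"):
    print(method, parent_matches(method, children))
```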

Add Child button

Access Child Options.

Child options—Index field

Select the index field you want to use for comparison in the advanced search.

Child options—Input field

Select the input field you want to use for comparison in the advanced search.

Child options—Search type

Specifies the search or match criteria that determine whether the input data matches the indexed data. All searches are case insensitive. The available search types are:

Any Word/Phrase Starts With: Determines whether any word or phrase in the search index field begins with the text that is contained in the input field.

For example, text in the input field “tech” would be considered a match for search index fields containing “Technical”, “Technology”, “Technologies”, “Technician” or even “National University of Technical Sciences”. Likewise, a phrase in the input field “DEF Sof” would be considered a match for search index fields containing “ABC DEF Software”, “DEF Software”, and “DEF Software India” but it would not be a match for search index fields containing “Software DEF” or “DEF ABC Software”.
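The examples above are consistent with a simple rule: the input text must be a prefix of the index text starting at some word boundary. The Python sketch below implements that rule as an assumption drawn from the examples, not as the product's actual algorithm. The function name is hypothetical.

```python
def any_word_or_phrase_starts_with(input_text: str, index_text: str) -> bool:
    """Return True if the index text, read from some word start, begins with the input text."""
    query = " ".join(input_text.lower().split())   # case-insensitive, normalized spacing
    words = index_text.lower().split()
    if not query:
        return False
    # Try every word boundary as a possible starting point for the phrase.
    return any(" ".join(words[i:]).startswith(query) for i in range(len(words)))


print(any_word_or_phrase_starts_with("tech", "National University of Technical Sciences"))
print(any_word_or_phrase_starts_with("DEF Sof", "DEF ABC Software"))
```

The first call matches because "Technical" begins with "tech". The second does not, because "ABC" separates "DEF" from "Software".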

Contains: Determines whether the search index field contains the data from the input field. This search type considers the sequence of words in the input field while searching the search index field. For example, input field data “Precisely” and “Precisely Software” would be contained in a search index field of “Precisely Software Inc.”
Contains All: Determines whether all alphanumeric words from the input field are contained in the search index field. This search type does not consider the sequence of words in the input field while searching the search index field.
Contains Any: Determines whether any of the alphanumeric words from the input field is contained in the search index field.
Contains None: Determines whether none of the alphanumeric words from the input field is contained in the search index field.
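The difference between the four Contains search types can be sketched in Python. The tokenization (lowercased runs of ASCII letters and digits) and the handling of empty input are assumptions made for illustration. All function names are hypothetical.

```python
import re
from typing import List


def words(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words (assumed tokenization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains(input_text: str, index_text: str) -> bool:
    """Input words must appear in the index field in the same sequence."""
    query, target = words(input_text), words(index_text)
    n = len(query)
    return n > 0 and any(target[i:i + n] == query for i in range(len(target) - n + 1))


def contains_all(input_text: str, index_text: str) -> bool:
    """Every input word must appear, in any order."""
    query, target = words(input_text), set(words(index_text))
    return bool(query) and all(word in target for word in query)


def contains_any(input_text: str, index_text: str) -> bool:
    """At least one input word must appear."""
    target = set(words(index_text))
    return any(word in target for word in words(input_text))


def contains_none(input_text: str, index_text: str) -> bool:
    """No input word may appear."""
    return not contains_any(input_text, index_text)


field = "Precisely Software Inc."
print(contains("Software Precisely", field), contains_all("Software Precisely", field))
```

Reversing the word order makes Contains fail while Contains All still succeeds, which is the sequence sensitivity described above.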
Fuzzy: Determines the similarity between two alphanumeric words based on the number of deletions, insertions, or substitutions required to transform one word into another.
Use the Maximum edits parameter to set a limit on the number of edits allowed to be considered a successful match:
  • 0—Allows for no deletions, insertions, or substitutions. The input field data and the search index field data must be identical.
  • 1—Allows for no more than one deletion, insertion, or substitution. For example, an input field containing "Barton" will match a search index field containing "Carton".
  • 2—Allows for no more than two deletions, insertions, or substitutions. For example, an input field containing "Barton" will match a search index field containing "Martin".

The Fuzzy search type is used for single-word searches only. Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field. For example, if the index field says "Xyz" and the input field says "Xyz Abc", they would not be considered a match because of "Abc". However, if you check this box, "Abc" would be ignored and with "Xyz" being the first word, the two words would be considered a match.
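The edit counting described for the Fuzzy search type corresponds to the standard Levenshtein distance. The Python sketch below computes it and applies a Maximum edits limit. Comparing whole field values when Ignore extra words is off is an assumption, and the function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum deletions, insertions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(input_text: str, index_text: str, maximum_edits: int = 2,
                ignore_extra_words: bool = False) -> bool:
    """Case-insensitive comparison limited to maximum_edits edits."""
    left, right = input_text.lower().strip(), index_text.lower().strip()
    if ignore_extra_words:
        # Keep only the first word of each field, as the option describes.
        left = left.split()[0] if left else ""
        right = right.split()[0] if right else ""
    return edit_distance(left, right) <= maximum_edits


print(edit_distance("barton", "carton"), edit_distance("barton", "martin"))
print(fuzzy_match("Xyz Abc", "Xyz", maximum_edits=0, ignore_extra_words=True))
```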

Numeric: Determines whether numbers from the input field are contained in the search index field.

The Numeric search type is used for single-word searches only.

Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field.
Pattern: Determines whether the text pattern of the input field matches the text pattern of the search criteria. You can further refine the text pattern in the Pattern string field. For example, if the input field contains “nlm” and the pattern defined is “a*b?c”, it matches words such as “Neelam”, “nelam”, “neelum”, and “nilam”.

The Pattern search type is used for single-word searches only. Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field.
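One reading of the “nlm” example is that the letters a, b, and c in the Pattern string stand for the first, second, and third characters of the input, with * matching any run of characters and ? matching exactly one. The Python sketch below implements that reading. It is an interpretation inferred from the example only, not documented behavior, and both function names are hypothetical.

```python
import string
from fnmatch import fnmatchcase


def expand_pattern(input_text: str, pattern_string: str) -> str:
    """Replace placeholder letters (a, b, c, ...) with the input's characters.

    Assumption: 'a' means the first input character, 'b' the second, and so on.
    """
    characters = input_text.lower()
    expanded = []
    for symbol in pattern_string.lower():
        if symbol in "*?":
            expanded.append(symbol)               # wildcards pass through unchanged
            continue
        position = string.ascii_lowercase.find(symbol)
        if position < 0 or position >= len(characters):
            raise ValueError(f"placeholder {symbol!r} has no matching input character")
        expanded.append(characters[position])
    return "".join(expanded)


def pattern_match(input_text: str, pattern_string: str, index_word: str) -> bool:
    """Case-insensitive shell-style match of the expanded pattern."""
    return fnmatchcase(index_word.lower(), expand_pattern(input_text, pattern_string))


print(expand_pattern("nlm", "a*b?c"))
print([w for w in ["Neelam", "nelam", "neelum", "nilam", "salam"]
       if pattern_match("nlm", "a*b?c", w)])
```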

Proximity: Determines whether words in the input fields are within a certain distance of each other.
  • Select the First input field and Second input field that you want to search for in the index.
  • Use the Distance parameter to determine the maximum allowed distance between the words specified in the First field and Second field in order to be considered a match.

For example, you could successfully use this search type to look for First field "Spectrum" and Second field "Precisely" within ten words of each other in a search index field containing the sentence “Spectrum Technology Platform is a product of Precisely Software Inc.”

The Proximity search type is used for single-word searches only. Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field.
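The proximity example can be reproduced with a small Python sketch. It assumes that distance is measured as the difference between word positions and that word order does not matter. The search engine may count distance differently, so treat this as an illustration only. The function name within_distance is hypothetical.

```python
import re


def within_distance(first_word: str, second_word: str, index_text: str, distance: int) -> bool:
    """Return True if the two words occur within `distance` word positions of each other."""
    tokens = re.findall(r"[a-z0-9]+", index_text.lower())
    first_positions = [i for i, token in enumerate(tokens) if token == first_word.lower()]
    second_positions = [i for i, token in enumerate(tokens) if token == second_word.lower()]
    return any(abs(i - j) <= distance for i in first_positions for j in second_positions)


sentence = "Spectrum Technology Platform is a product of Precisely Software Inc."
print(within_distance("Spectrum", "Precisely", sentence, 10))
print(within_distance("Spectrum", "Precisely", sentence, 3))
```

In this sentence the two words are seven positions apart, so a distance of 10 matches and a distance of 3 does not.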

Range: Performs an inclusive search for terms within a range, which is specified using a Lower bound field (starting term) and an Upper bound field (ending term). All alphanumeric words are arranged lexicographically in the search index field.
  • Use the Lower bound field parameter to select the field to be used as the starting term.
  • Use the Upper bound field parameter to select the field to be used as the ending term.

For example, if you searched postal codes from 20001 (defined in the Lower bound field) to 20009 (defined in the Upper bound field), the search would return all addresses with postal codes within that range.

The Range search type is used for single-word searches only. Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field.
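Because terms are arranged lexicographically, an inclusive range check behaves like a string comparison rather than a numeric one. A minimal Python sketch, with the hypothetical function in_range standing in for the search:

```python
def in_range(term: str, lower_bound: str, upper_bound: str) -> bool:
    """Inclusive, case-insensitive lexicographic range check."""
    return lower_bound.lower() <= term.lower() <= upper_bound.lower()


postal_codes = ["20001", "20005", "20009", "20010", "200015"]
print([code for code in postal_codes if in_range(code, "20001", "20009")])
```

Both bounds are included. The six-character value "200015" also falls inside the range in this sketch, because string ordering compares character by character. This is a property of lexicographic comparison in general, and whether the product treats such values the same way is not stated in this guide.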

Wildcard: Searches using single or multiple wildcard characters.

Select the Position in your input file where you are inserting the wildcard character.

The Wildcard search type is used for single-word searches only. Click Ignore extra words to have Candidate Finder consider only the first word in the field when comparing the input field to the index field.

Child options—Relevance factor

Control the relevance of a child field by entering any positive number up to 100. The number can also be less than 1; for instance, ".05" is valid.

The higher the relevance factor, the more relevant the field will be. For example, if you want results from the Firm Name field to be more relevant than the results from other fields, select "Firm Name" as the index field and enter "5" here.
Note: By default, this option is disabled. Select the check box to enable it.
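The effect of a relevance factor can be pictured with a deliberately simplified Python model in which each matching child contributes its factor to a candidate's score. Actual search engine scoring is more involved. The default factor of 1.0, the field names, and the function relevance_score are all assumptions made for illustration.

```python
from typing import Dict


def relevance_score(child_matches: Dict[str, bool],
                    relevance_factors: Dict[str, float]) -> float:
    """Toy score: sum the factors of the children that matched.

    Assumption: a child without an explicit factor counts as 1.0.
    """
    return sum(relevance_factors.get(field, 1.0)
               for field, matched in child_matches.items() if matched)


factors = {"FirmName": 5.0}  # boost the firm name field
candidate_a = {"FirmName": True, "City": False, "State": False}
candidate_b = {"FirmName": False, "City": True, "State": True}
print(relevance_score(candidate_a, factors), relevance_score(candidate_b, factors))
```

In this toy model the candidate that matches only on the boosted field outranks the candidate that matches on two unboosted fields.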

Ignore Blanks

Clear this check box if you want the query to take blank input file fields into account.
Note: By default, the query ignores blank fields.

Output Fields tab

Check the Include box to select which stored fields should be included in the output.
Note: If an input field comes from an earlier stage in the dataflow and has the same name as a stored field in the search index, the values from the input field overwrite the values in the output field.
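The name collision described in the note can be illustrated with dictionaries in Python. The sketch covers only fields whose names collide and makes no claim about how other input fields are handled. The field names and the function build_output are hypothetical.

```python
from typing import Dict, Set


def build_output(stored_fields: Dict[str, str], input_fields: Dict[str, str],
                 included: Set[str]) -> Dict[str, str]:
    """Keep the included stored fields, then let same-named input fields overwrite them."""
    output = {name: value for name, value in stored_fields.items() if name in included}
    for name, value in input_fields.items():
        if name in output:            # same name as a stored field: input value wins
            output[name] = value
    return output


stored = {"FirmName": "Precisely Software Inc.", "City": "Burlington", "Phone": "555-0100"}
incoming = {"FirmName": "Precisely", "SearchText": "precisely"}
print(build_output(stored, incoming, included={"FirmName", "City"}))
```

One way to keep the stored value is to rename the input field in an earlier stage so that the names no longer collide.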