Advanced subset options for CMPFILDTA

Assure MIMIX Administrator Reference

Product type: Software
Portfolio: Integrate
Product family: Assure
Product: Assure MIMIX™ Software
Version: 10.0
Language: English
Product name: Assure MIMIX
Title: Assure MIMIX Administrator Reference
Copyright: 2023
First publish date: 1999
Last edition: 2024-05-07
Last publication: 2024-05-07T13:36:02.962500

You can use the Advanced subset options (ADVSUBSET) parameter on the Compare File Data (CMPFILDTA) command for advanced techniques such as comparing records over time and comparing a random sample of data. These techniques provide additional assurance that files are replicated correctly.

For example, assume you have a limited batch window. You do not have time to run a full compare every day, but you are required to ensure that all data is compared over the course of a week. Using the advanced CMPFILDTA capability, you can divide this work over a number of days.

Advanced subsetting makes it simple to accomplish this task by comparing 10 percent of your data each weeknight and completing the remaining 50 percent over the weekend. However, as the following example demonstrates, dividing the data by position alone can miss records, so it is best to compare a representative sampling of the data. The Advanced subset options also provide this capability.

For example, if a member contains 1000 records on Monday, records 1 through 100 will be compared on Monday. By Tuesday, perhaps the member has grown to 1500 records. The second 10 percent, to be processed on Tuesday, will contain records 151 through 300. Records 101 through 150 will not get checked at all. Advanced subsetting provides you with an alternative that does not skip records when members are growing.
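The skipped range in this example can be checked with a few lines of arithmetic. The following Python sketch is only an illustrative model of dividing a member into 10 percent slices by position; it is not MIMIX code, and the function name `percent_slice` is hypothetical.

```python
def percent_slice(total_records, slice_index, slices=10):
    """Return (first, last) relative record numbers of one slice.

    Models a member divided by position into `slices` equal parts;
    slice_index is 1-based.
    """
    size = total_records // slices
    first = (slice_index - 1) * size + 1
    last = slice_index * size
    return first, last


monday = percent_slice(1000, 1)    # first 10 percent of 1000 records
tuesday = percent_slice(1500, 2)   # second 10 percent of 1500 records

# Records that fall between Monday's slice and Tuesday's slice are never compared.
skipped = (monday[1] + 1, tuesday[0] - 1)
print(monday, tuesday, skipped)    # (1, 100) (151, 300) (101, 150)
```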

Advanced subset options are applied independently for each member processed. The advanced subset function assigns the data in each member to multiple non-overlapping subsets in one of two ways. It then allows a specified range of these subsets to be compared, which permits a representative sample of the data to be compared. It also permits a full compare to be partitioned into multiple CMPFILDTA requests that, in combination, ensure that all data that existed at the time of the first request is compared.

To use advanced subsetting, you will need to identify the following:

  • The number of subsets or “bins” to define for the compare

  • The manner in which records are assigned to bins

  • The specific bins to process

Number of subsets: The first issue to consider when using advanced subset options is how many subsets, or bins, to establish. The Number of subsets element is the number of approximately equal-sized bins to define. These bins are numbered from 1 up to the number specified (N). You must specify at least one bin. Each record is assigned to one of these bins.

The Interleave element specifies the manner in which records are assigned to a bin.

Interleave: The Interleave factor specifies the mapping between the relative record number and the bin number. There are two approaches that can be used.

If you specify *NONE, records in each member are divided on a percentage basis. For example:

Interleave *NONE

                            Member A on Monday    Member A on Tuesday
Total records in member:    30                    45
Number of subsets (bins):   3                     3
Interleave:                 *NONE                 *NONE
Records assigned to bin 1:  1-10                  1-15
Records assigned to bin 2:  11-20                 16-30
Records assigned to bin 3:  21-30                 31-45

Note that when the total number of records in a member changes, the mapping also changes. Records that were once assigned to bin 2 may in the future be assigned to bin 1. If you wish to compare all records over the course of a few days, the changing mapping may cause you to miss records. A specific Interleave value is preferable in this case.
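The shifting mapping can be reproduced with a small model. The Python sketch below assumes that *NONE divides a member into contiguous, equal ranges by position; the function `bin_without_interleave` is hypothetical, and the product may round differently when the record count does not divide evenly.

```python
def bin_without_interleave(rrn, total_records, subsets):
    """Map a 1-based relative record number to a 1-based bin number
    by splitting the member into `subsets` contiguous ranges."""
    return (rrn - 1) * subsets // total_records + 1


# Record 12 belongs to bin 2 while the member holds 30 records ...
monday_bin = bin_without_interleave(12, total_records=30, subsets=3)
# ... but moves to bin 1 once the member has grown to 45 records.
tuesday_bin = bin_without_interleave(12, total_records=45, subsets=3)
print(monday_bin, tuesday_bin)   # 2 1
```

If bin 1 is compared on Monday and bin 2 on Tuesday, record 12 is in neither request, which is the gap described above.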

When specified as a number of bytes, the Interleave value determines how many contiguous records are assigned to each bin before moving to the next bin. Once the last bin is filled, assignment restarts at the first bin. Assume you have specified an interleave value of 20 bytes. The following example is based on the one provided in Table 71:

Interleave(20)

                            Member A on Monday        Member A on Tuesday
Total records in member:    30                        45
Record length:              10 bytes                  10 bytes
Number of subsets (bins):   3                         3
Interleave (bytes):         20                        20
Interleave (records):       2                         2
Records assigned to bin 1:  1-2, 7-8, 13-14,          1-2, 7-8, 13-14, 19-20,
                            19-20, 25-26              25-26, 31-32, 37-38, 43-44
Records assigned to bin 2:  3-4, 9-10, 15-16,         3-4, 9-10, 15-16, 21-22,
                            21-22, 27-28              27-28, 33-34, 39-40, 45
Records assigned to bin 3:  5-6, 11-12, 17-18,        5-6, 11-12, 17-18, 23-24,
                            23-24, 29-30              29-30, 35-36, 41-42

If the Interleave and Number of subsets values are held constant, the mapping of relative record numbers to bins is maintained despite growth in member size. Because every bin is eventually selected, comparisons made over several days will compare every record that existed on the first day.
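The stable mapping can be modeled as a round-robin assignment. The Python sketch below is an illustration under stated assumptions, not the product's implementation: the function names are hypothetical, and the conversion from bytes to records by integer division is assumed from the 20-byte, 10-byte-record example.

```python
def bin_with_interleave(rrn, record_length, interleave_bytes, subsets):
    """Map a 1-based relative record number to a 1-based bin number."""
    # Assumption: the byte value converts to whole records by integer division.
    records_per_turn = max(1, interleave_bytes // record_length)
    turn = (rrn - 1) // records_per_turn   # which group of contiguous records
    return turn % subsets + 1              # bins are filled in rotation


def records_in_bin(bin_number, total_records, record_length=10,
                   interleave_bytes=20, subsets=3):
    """List the relative record numbers that land in one bin."""
    return [rrn for rrn in range(1, total_records + 1)
            if bin_with_interleave(rrn, record_length,
                                   interleave_bytes, subsets) == bin_number]


monday_bin1 = records_in_bin(1, total_records=30)
tuesday_bin1 = records_in_bin(1, total_records=45)
print(monday_bin1)    # [1, 2, 7, 8, 13, 14, 19, 20, 25, 26]
# Growth only appends records to a bin; Monday's assignments are unchanged.
print(tuesday_bin1[:len(monday_bin1)] == monday_bin1)   # True
```

Because the bin number depends only on the relative record number, the record length, the interleave, and the number of subsets, adding records to the member never moves an existing record to a different bin.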

In most circumstances, *CALC is recommended for the interleave specification. When you select *CALC, the system determines how many contiguous bytes are assigned to each bin before subsequent bytes are placed in the next bin. This calculated value will not change due to member size changes.

Specifying *NONE or a very large interleave factor maximizes processing efficiency, since data in each bin is processed sequentially. Specifying a very small interleave factor can greatly reduce efficiency, as little sequential processing can be done before the file must be repositioned. If you wish to compare a random sample, a smaller interleave factor provides a more random, or scattered, sample to compare.

The next elements, First subset and Last subset, allow you to specify which bins to process.

First and last subset: The First subset and Last subset values work in combination to determine a range of bins to compare. For the First subset, the possible values are *FIRST and subset-number. If you select *FIRST, the range to compare will start with bin 1. Last subset has similar values, *LAST and subset-number. When you specify *LAST, the highest numbered bin is the last one processed.

To compare a random sample of your data, specify a range of subsets that represents the size of the sample. For example, suppose you wish to compare seven percent of your data. If the number of subsets is 100, the first subset is 1, and the last subset is 7, seven percent of the data is compared. A first subset value of 21 and a last subset value of 27 would also compare seven percent of your data, but it would compare a different seven percent than the first example.
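The sample size follows directly from the width of the bin range. As a quick check, the Python sketch below computes the approximate percentage for a given range; the helper `sample_percentage` is hypothetical, and the result is approximate because the bins are only approximately equal in size.

```python
def sample_percentage(number_of_subsets, first_subset, last_subset):
    """Approximate share of a member covered by bins first..last (inclusive)."""
    if not 1 <= first_subset <= last_subset <= number_of_subsets:
        raise ValueError("subset range must lie within 1..number_of_subsets")
    bins_selected = last_subset - first_subset + 1
    return 100 * bins_selected / number_of_subsets


print(sample_percentage(100, 1, 7))     # 7.0
print(sample_percentage(100, 21, 27))   # 7.0, but a different set of bins
```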

To compare all your data over the course of several days, specify a number of subsets and an interleave factor that allow you to size each day’s workload as your needs require. For example, you would keep the number of subsets and the interleave factor constant, but vary the First and Last subset values each day. The following settings could be used over the course of a week to compare all of your data:

Using First and last subset to compare data

Day of week   Number of subsets (bins)   Interleave   First subset   Last subset   Percentage compared
Monday        100                        *CALC        1              10            10
Tuesday       100                        *CALC        11             20            10
Wednesday     100                        *CALC        21             30            10
Thursday      100                        *CALC        31             40            10
Friday        100                        *CALC        41             50            10
Saturday      100                        *CALC        51             65            15
Sunday        100                        *CALC        66             100           35
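Before putting a schedule like this one into use, it can be worth confirming that the daily ranges cover every bin exactly once. The Python sketch below performs that check on the settings shown above; the data layout and the function name `coverage_problems` are illustrative assumptions, not part of MIMIX.

```python
from collections import Counter

NUMBER_OF_SUBSETS = 100

# (day, first subset, last subset), copied from the weekly settings above
schedule = [
    ("Monday", 1, 10), ("Tuesday", 11, 20), ("Wednesday", 21, 30),
    ("Thursday", 31, 40), ("Friday", 41, 50),
    ("Saturday", 51, 65), ("Sunday", 66, 100),
]


def coverage_problems(schedule, number_of_subsets):
    """Return (missing_bins, repeated_bins) for a list of daily bin ranges."""
    counts = Counter()
    for _day, first, last in schedule:
        counts.update(range(first, last + 1))
    missing = [b for b in range(1, number_of_subsets + 1) if counts[b] == 0]
    repeated = sorted(b for b, n in counts.items() if n > 1)
    return missing, repeated


missing, repeated = coverage_problems(schedule, NUMBER_OF_SUBSETS)
percent_by_day = {day: 100 * (last - first + 1) // NUMBER_OF_SUBSETS
                  for day, first, last in schedule}
print(missing, repeated)          # [] []
print(percent_by_day["Sunday"])   # 35
```

Empty `missing` and `repeated` lists indicate that the week's requests, taken together, select each bin once.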

Note: You can automate these tasks using MIMIX Monitor. Refer to the MIMIX Monitor documentation for more information.