
Ab Initio Component | SCAN:Part 3

 ...Continued from Part 2...

Runtime behavior of SCAN

 

SCAN performs the following operations for each group of records:

 

1. Performing Input selection:

  • If you have defined the input_select function, SCAN filters the input records accordingly.

  • If you have not defined the input_select function in your transform, SCAN processes all records.

2. Performing Key change (for sorted input only):

  • For every record except the first, SCAN checks whether a key change has occurred:

  • SCAN compares the current record’s key value to the previous record’s key value, unless the key_change function is defined.

  • If the key_change function is defined, SCAN calls that function to check for a key change.

3. Performing Temporary initialization:

  • SCAN passes the first record in each group to the initialize transform function.

4. Performing Computation:

  • SCAN calls the scan transform function for each record in a group, including the first, using the input record and the temporary record for the group to which the input record belongs. The scan transform function returns a new temporary record.

5. Finalizing the output:

  • SCAN calls the finalize transform function once for every input record. SCAN passes the input record and the temporary record that the scan function returned to the finalize transform function. The finalize transform function produces an output record for each input record.

  • SCAN stops execution of the graph when the number of reject events exceeds the result of the following formula:

           limit + (ramp * number_of_records_processed_so_far)

6. Output selection:

  • If you have defined the output_select transform function, SCAN filters the output records.
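The six steps above can be modeled in a few lines of plain Python. This is only a rough sketch for sorted input, not Ab Initio DML: `run_scan`, its argument order, and the sample records are all hypothetical, and the real transform functions have their own signatures.

```python
def run_scan(records, key, initialize, scan, finalize,
             input_select=None, output_select=None, key_change=None):
    """Plain-Python model of the six runtime steps (hypothetical helper)."""
    outputs = []
    prev = None   # previous record that passed input selection
    temp = None   # temporary record for the current group
    for rec in records:
        # 1. Input selection: skip records the filter rejects.
        if input_select is not None and not input_select(rec):
            continue
        # 2. Key change: the first record always starts a group.
        if prev is None:
            new_group = True
        elif key_change is not None:
            new_group = key_change(prev, rec)
        else:
            new_group = key(rec) != key(prev)
        # 3. Temporary initialization from the first record of the group.
        if new_group:
            temp = initialize(rec)
        # 4. Computation: every record, including the first, updates temp.
        temp = scan(temp, rec)
        # 5. Finalization: one candidate output per input record.
        out = finalize(temp, rec)
        # 6. Output selection: keep the output unless the filter rejects it.
        if output_select is None or output_select(out):
            outputs.append(out)
        prev = rec
    return outputs


# Hypothetical sample data, already sorted by account.
rows = [
    {"acct": "A", "amt": 10},
    {"acct": "A", "amt": -2},
    {"acct": "A", "amt": 5},
    {"acct": "B", "amt": 7},
]
result = run_scan(
    rows,
    key=lambda r: r["acct"],
    initialize=lambda r: {"total": 0},
    scan=lambda t, r: {"total": t["total"] + r["amt"]},
    finalize=lambda t, r: {"acct": r["acct"], "running_total": t["total"]},
    input_select=lambda r: r["amt"] > 0,
)
print(result)
```

In this run the negative amount is filtered out at step 1, so three records reach the scan step and three output records are produced.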

Ab Initio Component | SCAN:Part 2

 ...Continued from Part 1...


maintain-order (boolean, required)

  • This parameter is available only when the sorted-input parameter is set to False.

  • When this parameter is set to True and the input is too large to fit within the memory limit specified by max-core, the component stops the graph rather than reorder records.

  • When the parameter is set to False (the default), the component stores some of its intermediate results in temporary files on disk. This alters the order of records.

Default is False.

grouped-input (boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to False.

  • Set this parameter to "Data is grouped by a major key" when the input is already sorted or grouped by a major key, which you then specify in the major-key parameter. In this case, the key parameter becomes the minor key: it is the field (or fields) to be scanned.

  • When you specify a major key, SCAN is more efficient in its use of memory: SCAN clears its in-memory table of intermediate results at the end of each major-key group of input records.

Default is Data is not grouped by a major key.
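The memory benefit of a major key can be illustrated with a small Python model. This is a sketch under assumptions, not the component's actual implementation: `scan_grouped`, the tuple layout, and the sample data are hypothetical. The dictionary stands in for the in-memory table of intermediate results, and it is emptied whenever the major key changes.

```python
def scan_grouped(records, major, minor, value):
    """Running totals per minor key; the table is cleared per major-key group."""
    table = {}                 # minor key -> running total (in-memory table)
    peak = 0                   # largest table size seen, for illustration only
    current_major = object()   # sentinel that never equals a real key
    out = []
    for rec in records:
        if major(rec) != current_major:
            table.clear()      # end of a major-key group: free the table
            current_major = major(rec)
        k = minor(rec)
        table[k] = table.get(k, 0) + value(rec)
        peak = max(peak, len(table))
        out.append((major(rec), k, table[k]))
    return out, peak


# Hypothetical data: grouped by day (major key), unsorted by store (minor key).
rows = [
    ("d1", "s1", 1), ("d1", "s2", 2), ("d1", "s1", 3),
    ("d2", "s3", 4), ("d2", "s4", 5), ("d2", "s3", 6),
]
out, peak = scan_grouped(rows, major=lambda r: r[0],
                         minor=lambda r: r[1], value=lambda r: r[2])
print(out)
print(peak)
```

Here the table never holds more than two entries at a time, although the input contains four distinct minor-key values.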

major-key (key specifier, optional)

  • This parameter is available only when the grouped-input parameter is set to Data is grouped by a major key. Specifies a field or set of fields by which the input data is sorted or grouped. 

check-sort (boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to True and the key-method parameter is set to Use key specifier.

  • This parameter indicates whether the component should fail when it first encounters an input record that is out of sorted order. Setting this parameter to False effectively treats every key change as a change in group.

Default is True.
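The effect of check-sort can be sketched in Python as follows. This is an illustrative model only: `count_groups` is a hypothetical helper, and ascending key order is assumed.

```python
def count_groups(keys, check_sort=True):
    """Count key groups; optionally fail on the first out-of-order key."""
    groups = 0
    prev = None
    for i, k in enumerate(keys):
        if i == 0 or k != prev:
            # A key smaller than its predecessor breaks ascending order.
            if check_sort and i > 0 and k < prev:
                raise ValueError(f"record {i} is out of order: {k!r} after {prev!r}")
            groups += 1   # every key change starts a new group
        prev = k
    return groups


keys = ["A", "A", "B", "A"]   # the last record is out of sorted order

try:
    count_groups(keys, check_sort=True)
    failed = False
except ValueError:
    failed = True

print(failed)                                # True: the check rejects the input
print(count_groups(keys, check_sort=False))  # 3: groups are A, B, A
```

With the check disabled, the trailing "A" does not fail; it simply opens a third group, because every key change is treated as a group boundary.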
 

reject-threshold (choice, required)

  • Specifies the component’s tolerance for reject events.
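As a quick illustration of the stopping rule quoted in the runtime description, limit + (ramp * number_of_records_processed_so_far), here is a small Python check. The function name and the sample numbers are hypothetical. Only the formula itself comes from the text.

```python
def exceeds_reject_tolerance(rejects, processed, limit, ramp):
    """True when the reject count exceeds limit + ramp * processed."""
    return rejects > limit + ramp * processed


# Hypothetical settings: limit = 10, ramp = 0.5, 100 records processed so far.
# Tolerance is 10 + 0.5 * 100 = 60 reject events.
at_tolerance = exceeds_reject_tolerance(60, 100, limit=10, ramp=0.5)
over_tolerance = exceeds_reject_tolerance(61, 100, limit=10, ramp=0.5)
print(at_tolerance)    # False
print(over_tolerance)  # True
```

With a ramp of 0, the tolerance is a fixed count. A nonzero ramp lets the tolerated number of rejects grow with the number of records processed.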

Ab Initio Component | SCAN:Part 1

 Purpose of SCAN

  • For every input record, SCAN generates an output record that consists of a running cumulative summary for the group to which the input record belongs, up to and including the current record.
  • SCAN is similar to ROLLUP. The difference between the two is that SCAN produces one output record for each input record, while ROLLUP produces one output record for each key group.
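The contrast between the two components can be shown with a short Python analogy. This is not Ab Initio code: the function names and sample data are hypothetical, and a running sum stands in for whatever cumulative summary the transform computes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: (customer, amount) pairs, already sorted by customer.
records = [("A", 10), ("A", 5), ("B", 7), ("B", 3), ("B", 1)]


def scan_like(rows):
    """One output per input record: the running total within its key group."""
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        running = 0
        for _, amount in group:
            running += amount
            out.append((key, running))
    return out


def rollup_like(rows):
    """One output per key group: the group total."""
    return [(key, sum(amount for _, amount in group))
            for key, group in groupby(rows, key=itemgetter(0))]


scan_out = scan_like(records)
rollup_out = rollup_like(records)
print(scan_out)    # [('A', 10), ('A', 15), ('B', 7), ('B', 10), ('B', 11)]
print(rollup_out)  # [('A', 15), ('B', 11)]
```

The last SCAN-style value in each group equals the ROLLUP-style value for that group, which is one way to see that the two differ only in how many records they emit.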

Two modes to use SCAN 

Like ROLLUP, SCAN can be used in either template mode or expanded mode:

  • Template mode — You define a simple scan function that typically includes aggregation functions.

  • Expanded mode — You create a transform using an expanded scan package. This mode allows for scans that do not necessarily use regular aggregation functions. 

Parameters for SCAN (Not all Parameters are covered)

 

sorted-input (boolean, required)
  • This parameter specifies whether the component accepts unsorted (or ungrouped) input.
  • If you want to process ungrouped input/data, set this parameter to False.
Default is True.
 
key-method (choice, optional)

  • This parameter defines the method by which the component determines the boundary between one group of records and the next. The choices are as follows:
  • Use key specifier — The component uses one or more of the fields in the input record as the grouping key.
  • Use key_change function — Instead of using fields from the input record to group the input, the component uses the key_change transform function to determine when a new group begins. 
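The two choices can be compared with a small Python sketch. It is an analogy rather than DML: `split_groups`, the two-argument boundary callback, and the sample events are hypothetical.

```python
def split_groups(records, boundary):
    """Split an ordered list into groups; boundary(prev, cur) starts a new group."""
    groups = []
    for rec in records:
        if not groups or boundary(groups[-1][-1], rec):
            groups.append([rec])
        else:
            groups[-1].append(rec)
    return groups


# Hypothetical events, sorted by user and then by timestamp.
events = [
    {"user": "u1", "ts": 0},
    {"user": "u1", "ts": 5},
    {"user": "u1", "ts": 100},
    {"user": "u2", "ts": 101},
]

# Analogous to "Use key specifier": a group ends when the key field changes.
by_key = split_groups(events, lambda p, c: p["user"] != c["user"])

# Analogous to "Use key_change function": a custom rule decides the boundary,
# here a new user or a gap of more than 30 time units.
by_rule = split_groups(
    events, lambda p, c: p["user"] != c["user"] or c["ts"] - p["ts"] > 30)

print([len(g) for g in by_key])   # [3, 1]
print([len(g) for g in by_rule])  # [2, 1, 1]
```

A custom boundary function is useful when group membership depends on a relationship between neighboring records and not on field values alone.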

key (key specifier, required when key-method is Use key specifier)

  • This parameter consists of the names of the key fields that the component uses to group or define groups of records.
 
transform (filename or string, required)
  • This parameter consists of either the name of the file containing the types and transform functions, or a transform string.
 
max-core (integer, required)

  • This parameter defines the maximum memory usage in bytes.
  • This parameter is available only when the sorted-input parameter is set to False.
  • If the total size of the intermediate results that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.

Default is 67108864 bytes (64 MB).