Showing posts with label max core. Show all posts

Ab Initio Component | SCAN:Part 3

 ....Continue from Part 2.....

Runtime behavior of SCAN

 

SCAN performs the following operations for each group of records:

 

1. Performing Input selection:

  • If you have defined the input_select function, SCAN filters the input records accordingly.

  • However, if you have not defined the input_select function in your transform, SCAN processes all records.

2. Performing Key change (for sorted input only):

  • For every record except the first, SCAN checks whether a key change has occurred:

  • SCAN compares the current record’s key value to the previous record’s key value, unless the key_change function is defined.

  • If the key_change function is defined, SCAN calls that function to check for a key change.

3. Performing Temporary initialization:

  • SCAN passes the first record in each group to the initialize transform function.

4. Performing Computation:

  • SCAN calls the scan transform function for each record in a group, including the first, using the input record and the temporary record for the group to which the input record belongs. The scan transform function returns a new temporary record.

5. Finalizing the output:

  • SCAN calls the finalize transform function once for every input record. SCAN passes the input record and the temporary record that the scan function returned to the finalize transform function. The finalize transform function produces an output record for each input record.

  • SCAN stops execution of the graph when the number of reject events exceeds the result of the following formula:

           limit+(ramp* number_of_records_processed_so_far)

6. Output selection:

  • If you have defined the output_select transform function, SCAN filters the output records.
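
The six steps above can be sketched in plain Python. This is only an illustration of the call order, not Ab Initio DML: the function name scan and the sample data are hypothetical, input is assumed to be sorted on the key, and reject handling is left out.

```python
from itertools import groupby

def scan(records, key, initialize, scan_fn, finalize,
         input_select=None, output_select=None):
    """Emit one output per selected input record, carrying a per-group temporary."""
    if input_select is not None:                  # step 1: input selection
        records = [r for r in records if input_select(r)]
    out = []
    for _, group in groupby(records, key=key):    # step 2: key change on sorted input
        temp = None
        for i, rec in enumerate(group):
            if i == 0:
                temp = initialize(rec)            # step 3: first record of the group
            temp = scan_fn(temp, rec)             # step 4: every record, including the first
            result = finalize(temp, rec)          # step 5: one output per input record
            if output_select is None or output_select(result):  # step 6
                out.append(result)
    return out

# Hypothetical sample: (key, amount) pairs, already sorted by key.
rows = [("a", 1), ("a", 2), ("b", 5), ("b", 7)]
running = scan(
    rows,
    key=lambda r: r[0],
    initialize=lambda r: 0,
    scan_fn=lambda temp, r: temp + r[1],
    finalize=lambda temp, r: (r[0], r[1], temp),
)
print(running)
```

Each output tuple carries the key, the input amount, and the running total for that key, and the total starts again from zero when the key changes.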

Ab Initio Component | SCAN:Part 2

 .....Continued from Part 1.....


maintain-order(boolean, required)

  • This parameter is available only when the sorted-input parameter is set to False.

  • When the input is too large to fit within the memory limit specified by max-core, the maintain-order parameter, when set to True, stops the graph, ensuring that records are not reordered.

  • When the parameter is set to False (the default), the component stores some of its intermediate results in temporary files on disk. This alters the order of records.

Default is False.

grouped-input (boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to False.

  • Set this parameter to Data is grouped by a major key in order to specify the major-key by which the input is sorted or grouped. In this case, the key parameter becomes the minor key: it is the field (or fields) to be scanned.

  • When you specify a major key, SCAN is more efficient in its use of memory: SCAN clears its in-memory table of intermediate results at the end of each major-key group of input records.

Default is Data is not grouped by a major key.

major-key(key specifier, optional)

  • This parameter is available only when the grouped-input parameter is set to Data is grouped by a major key. Specifies a field or set of fields by which the input data is sorted or grouped. 

 check-sort(boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to True and the key-method parameter is set to Use key specifier.

  • This parameter indicates whether the component should fail when it first encounters an input record that is out of sorted order. Setting this parameter to False effectively treats every key change as a change in group.

Default is True.
 

reject-threshold(choice, required)

  • Specifies the component’s tolerance for reject events.

Ab Initio Component | SCAN:Part 1

 Purpose of SCAN

  • For every input record, SCAN generates an output record that consists of a running cumulative summary for the group to which the input record belongs, up to and including the current record.
  • SCAN is similar to ROLLUP. The difference between the two is that SCAN produces one output record for each input record, while ROLLUP produces one output record for each key group.
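
To make the difference concrete, here is a small Python sketch (not Ab Initio DML) that computes a sum per key both ways. The helper names rollup_sum and scan_sum and the sample rows are made up for illustration, and the input is assumed to be sorted by key.

```python
from itertools import accumulate, groupby

# Hypothetical sample: (key, amount) pairs, already sorted by key.
rows = [("a", 1), ("a", 2), ("b", 5), ("b", 7)]

def rollup_sum(rows):
    """ROLLUP-style: one output record per key group."""
    return [(k, sum(v for _, v in g))
            for k, g in groupby(rows, key=lambda r: r[0])]

def scan_sum(rows):
    """SCAN-style: one output record per input record (running total per group)."""
    out = []
    for k, g in groupby(rows, key=lambda r: r[0]):
        out.extend((k, total) for total in accumulate(v for _, v in g))
    return out

print(rollup_sum(rows))  # two records, one per key
print(scan_sum(rows))    # four records, one per input record
```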

Two modes to use SCAN 

Like ROLLUP, SCAN can be used in template mode and expanded mode:

  • Template mode — You define a simple scan function that typically includes aggregation functions.

  • Expanded mode — You create a transform using an expanded scan package. This mode allows for scans that do not necessarily use regular aggregation functions. 

Parameters for SCAN (Not all Parameters are covered)

 

sorted-input(boolean, required)
  • This parameter specifies whether the component accepts unsorted (or ungrouped) input.
  • If you want to process ungrouped input/data, set this parameter to False.
Default is True.
 
key-method(choice, optional)

  • This parameter defines the method by which the component determines the boundary between one group of records and the next. The choices are as follows:
  • Use key specifier — The component uses one or more of the fields in the input record as the grouping key.
  • Use key_change function — Instead of using fields from the input record to group the input, the component uses the key_change transform function to determine when a new group begins. 

 key(key specifier, required when key-method is Use key specifier)

  • This parameter contains the names of the key fields that the component can use to group or define groups of records.
 
transform(filename or string, required)
  • This parameter contains either the name of the file containing the types and transform functions, or a transform string.
 
 max-core (integer, required)

  • This parameter defines the maximum memory usage in bytes.
  • This parameter is available only when the sorted-input parameter is set to False.
  • If the total size of the intermediate results that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.

Default is 67108864 bytes (64 MB).

 

Ab Initio Component | ROLLUP : Part 3

 ....Continue from Part 2....

 

Functions used in expanded mode

  • Expanded mode provides more control over the transform. It lets you edit the expanded package, so you can specify transformations that are not possible with template mode.

  • With an expanded ROLLUP package, you must define the following in it:

  • DML type named temporary_type

  • initialize function that returns a temporary_type record

  • rollup function that takes two input arguments (an input record and a temporary_type record) and returns an updated temporary_type record

  • finalize function that returns an output record

     

Runtime behavior of ROLLUP 

ROLLUP performs the following operations for each group of records:

1. Performing Input selection:

  • If you have not defined the input_select function in your transform, ROLLUP processes all records.

  • If you have defined the input_select function, ROLLUP filters the input records accordingly.

2. Performing Key change (for sorted input only):

  • For every record except the first, ROLLUP checks whether a key change has occurred:

  • ROLLUP compares the current record’s key value to the previous record’s key value, unless the key_change function is defined.

  • If the key_change function is defined, ROLLUP calls that function to check for a key change.

3. Temporary initialization:

  • ROLLUP passes the first record in each group to the initialize transform function.

4. Performing Computation:

  • ROLLUP calls the rollup transform function for each input record.
  • The input to the rollup transform function is the input record and the temporary record for the group to which the input record belongs.
  • The rollup transform function returns an updated temporary record for that input group. 

5. Performing Finalization of the output:

With sorted-input set to True:

  • ROLLUP calls the finalize transform function after it processes all the input records in each group.

  • ROLLUP passes the temporary record for the group and the last input record in the group to the finalize transform function.

  • The finalize transform function produces an output record for the group.

Note:

  • With sorted-input set to False, after ROLLUP processes all the input records, it calls the finalize transform function with the temporary record for each group and an arbitrary input record from each group as arguments.

  • ROLLUP repeats this procedure with each group.

  • The finalize transform function then produces an output record for each group.

  • The component stops the execution of the graph when the number of reject events exceeds the result of the following formula:

limit+(ramp* number_of_records_processed_so_far)

6. Output selection:

  • If you have defined the output_select transform function, it filters the output records.
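
For the sorted-input case, the initialize, rollup, and finalize sequence can be sketched in Python as follows. This is an illustration of the call order only, not Ab Initio DML: the function name rollup, the dictionary used as the temporary record, and the sample data are all hypothetical, and input selection, output selection, and rejects are omitted.

```python
from itertools import groupby

def rollup(records, key, initialize, rollup_fn, finalize):
    """Emit one output per key group from input sorted on the key."""
    out = []
    for _, group in groupby(records, key=key):
        temp, last = None, None
        for i, rec in enumerate(group):
            if i == 0:
                temp = initialize(rec)      # first record of the group
            temp = rollup_fn(temp, rec)     # updated temporary for every record
            last = rec
        out.append(finalize(temp, last))    # temporary plus last record of the group
    return out

# Hypothetical sample: (customer, amount) pairs sorted by customer.
orders = [("a", 10), ("a", 30), ("b", 5)]
summary = rollup(
    orders,
    key=lambda r: r[0],
    initialize=lambda r: {"count": 0, "total": 0},
    rollup_fn=lambda t, r: {"count": t["count"] + 1, "total": t["total"] + r[1]},
    finalize=lambda t, last: (last[0], t["count"], t["total"] / t["count"]),
)
print(summary)
```

The temporary record here holds a count and a total so that finalize can produce an average, which is the kind of computation that needs more than a single running value.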

Ab Initio Component | ROLLUP : Part 2

 ...Continue from part 1......

max-core(integer, required)

  • This parameter defines the maximum memory usage in bytes.

  • It is available only when the sorted-input parameter is set to False.

  • If the total size of the intermediate results that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.

Default is 67108864 (64 MB).

reject-threshold(choice, required)

  • Specifies the component’s tolerance for reject events, that is, after how many reject records the component should abort its operation.

check-sort(boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to True and the key-method parameter is set to Use key specifier.

 Difference between using unsorted and sorted data

With unsorted data

  • When the input data is not sorted (and the sorted-input parameter is set to False), the function outputs an arbitrary record from each group. This might not be particularly useful.
  • To get the first or last record in the unsorted data, you can use the first or last aggregation function.

With sorted data

  • When the input data is sorted (and the sorted-input parameter is set to True), the function outputs the last record from each group.
  • In this case, the function is equivalent to the following, which uses the last aggregation function 


Ab Initio Component | ROLLUP : Part 1

 Purpose of ROLLUP

  • ROLLUP is used to process groups of input records that have the same key, generating one output record for each group. 

  • Typically, the output record summarizes or aggregates the data in some way; for example, a simple ROLLUP can be used to calculate a sum or average of one or more input fields.

  • ROLLUP can also be used to select certain information from each group; for example, it might output the largest value in a field, or accumulate a vector of values that conform to specific criteria.

Two modes to use ROLLUP

You can use a ROLLUP component in two modes, depending on how you define the transform parameter:

1. Template mode — You define a simple rollup function that may include aggregation functions. Template mode is the most common/simple way to use ROLLUP.

2. Expanded mode — You create a transformation using an expanded rollup package. This mode allows for rollups that do not necessarily use regular aggregation functions.


Parameters for ROLLUP (Not all parameters are covered.)

 sorted-input(boolean, required)

  • This parameter specifies whether the component accepts unsorted (or ungrouped) input.

  • If you want to process ungrouped input, set this parameter to False.

Default is True.

key-method (choice, optional)

  • This parameter determines the method by which the component determines the boundary between one group of records and the next. The choices are as follows:

1. Use key specifier — The component uses one or more of the fields in the input record as the grouping key.

2. Use key_change function — Instead of using fields from the input record to group the input, the component uses the key_change transform function to determine when a new group begins.


Default is Use key specifier.

key(key specifier, required when key-method is Use key specifier)

  • This parameter contains the name(s) of the key fields that the component can use to group or define groups of records.

transform (filename or string, required)

  • This parameter contains either the name of the file containing the types and transform functions, or a transform string.

output_without_input(choice, optional)

  • This parameter specifies the event that, when received, triggers the component to call the output_without_input function, if no input records have been received since the last such event or since the component started. The choices are as follows:

Never — The function will not be called.

At each computepoint — The function is called at each computepoint event.

At each checkpoint — The function is called at each checkpoint event.

At component shutdown — The function is called when the component is shutdown.

Default is Never.

Ab Initio Component | JOIN: Part 3

 .....Continue from part 2.....


maintain-order  (boolean, required)

  • Set to True to ensure that records remain in the original order of the driving input. (The driving input is the largest input, as specified by the driving parameter.)
  • Available only when the sorted-input parameter is set to False. If the sorted-input parameter is set to True and all inputs are sorted on the fields given in the key parameter, the output maintains the sort order on that key without the use of this parameter.
  • If any inputs other than the driving input are too large to fit within the memory limit specified by max-core, the behavior of the component depends on the setting of maintain-order:
  • False — The component stores some of its intermediate results in temporary files on disk. This alters the order of records in the driving input.
  •  True — The component stops execution of the graph.
   
    Default is False.


max-core (integer, required)

  • Maximum memory usage in bytes. Available only when the sorted-input parameter is set to False.
  • If the total size of the non-driving inputs that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.
    Default value is 67108864 bytes (64 MB). 


Runtime behavior of JOIN

 

JOIN performs the following operations:
 
 1. Reads data records from multiple inn ports. Depending on the setting of the sorted-input parameter, it does one of the following:

  • If input is sorted, it reads records in the order in which they arrive.
  • If input is unsorted, it loads all records from all inputs except the driving input into main memory. Once the non-driving inputs are loaded, it reads records from the driving input in the order in which they arrive.

2. Applies the expression in any defined selectn parameter to the records on the corresponding inn port:

  • If the expression evaluates to 0 for a record, JOIN does not process the record, and the record does not appear on any output port.

  • If the expression evaluates to anything other than 0 or NULL for a record, JOIN processes the record.
  • If you do not supply an expression for a selectn parameter, JOIN processes all the records on the corresponding inn port.
 
3. Removes any duplicate records that have made it through the select, if the dedupn parameter is set to True.

4. Operates on records that have matching key values using a multi-input transform function.

If the transform function returns NULL, then JOIN:
  • Writes each input record to the corresponding rejectn port, then stops execution of the graph when the number of reject events exceeds the result of the following formula:

        limit + (ramp * number_of_records_processed_so_far)
  • Writes an error message to the corresponding errorn port. If no flows are connected to rejectn or errorn ports, the JOIN component discards the information.
5. Writes the non-NULL return record from the transform function to the out port. 
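
The unsorted-input path described above behaves much like an in-memory hash join. The Python sketch below is an illustration under that assumption, not Ab Initio code: the function name hash_join and the sample records are hypothetical, only two inputs are handled, and a None result from the transform is simply dropped rather than routed to reject and error ports.

```python
from collections import defaultdict

def hash_join(driving, non_driving, key, transform):
    """Inner join of two inputs; only the non-driving input is held in memory."""
    table = defaultdict(list)
    for rec in non_driving:                # load the non-driving input into memory
        table[key(rec)].append(rec)
    out = []
    for rec in driving:                    # stream the driving input in arrival order
        for match in table.get(key(rec), []):
            result = transform(rec, match)
            if result is not None:         # a NULL result would be a reject event
                out.append(result)
    return out

# Hypothetical sample: orders is the larger (driving) input.
orders = [(1, "pen"), (2, "ink"), (1, "cap"), (3, "pad")]
customers = [(1, "Ann"), (2, "Bob")]
joined = hash_join(
    orders, customers,
    key=lambda r: r[0],
    transform=lambda o, c: (o[0], c[1], o[1]),
)
print(joined)
```

The output follows the arrival order of the driving input, which is why keeping the smaller inputs in memory matters: once they no longer fit, something has to spill to disk and that order can be lost.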

 






Ab Initio Component | JOIN: Part 2

 .....Continue from part 1....

record-requiredn (boolean, required)

  • This parameter is available only when the parameter-interface parameter is set to legacy (or in a pre-Version 3.2.1 JOIN component) and the join-type parameter is set to Explicit.

    The default is True. 

record-match-requiredn (boolean, required)

  • This parameter is available only when the join-type parameter is set to Explicit.

  • It  is used to specify whether a record is required or whether to substitute a null for a missing record.

    The default is True.     

    To use this parameter, note the following points:

  •  When there are two inputs, set record-match-requiredn to True on the input port for which you want to call the transform for every record, regardless of whether there is a matching record on the other input port.

  •  When there are more than two inputs, set record-match-requiredn to True when you want to call the transform only when there are records with matching keys on all input ports for which record-match-requiredn is True.

 dedupn(boolean, required)

  • Set the dedupn parameter to Dedup this input before joining to remove duplicates from the corresponding inn port before joining. This allows you to choose only one record from a group with matching key values as the argument to the transform function.

  • There is one dedupn parameter associated with each inn port. Unused duplicates are sent to the unusedn port.

    Default is Do not dedup this input.

selectn  (expression, optional)

  • Filters for records before a join function. One per inn port; n represents the number of an in port. If you use selectn with dedupn, the JOIN component performs the select first, then removes the duplicate records that made it through the select. 

 max-memory (integer, required)

  • Maximum memory usage in bytes before the component writes temporary files to disk. Available only when the sorted-input parameter is set to True.

    The default value is 8388608 bytes (8 MB).
    

    
check-sort  (boolean, required)

  • Available only when the sorted-input parameter is set to True.

  • If set to True, stops the graph on the first input record that is out of sorted order (according to the key).

  • The default is False. In this case, JOIN does not necessarily stop or issue an error when it encounters unsorted inputs. If sorted input is a requirement, set check-sort to True.

 

 

Ab Initio Component | JOIN : Part 1

 Purpose of JOIN Components

  • JOIN reads data from two or more input ports, combines records with matching keys according to the transform you specify, and sends the transformed records to the output port.
  • Its additional ports can also be used to collect rejected and unused records.

 

Parameters for JOIN (Not all parameters are covered.)


count (integer, required)

  • It is an integer n specifying the total number of inputs (in ports) to join. The number of input ports also determines the number of the following ports and parameters:

        unused ports

        reject ports

        error ports

        record-match-required parameters

        dedup parameters

        select parameters

        override-key parameters

    Default is 2.

    Each in port (always two or more) has a number n appended. Each outn, unusedn, rejectn, and errorn port corresponds to an inn port.
 
 
sorted-input (boolean, required)

  • When this parameter is set to False, the component accepts unsorted input and permits the use of the maintain-order parameter.
  • When this parameter is set to True, the component requires sorted input. In this case, consider setting the check-sort parameter to True.
    Default is True. 

key(key specifier, required)
 
  • Name(s) of the field(s) in the input records that must have matching values for JOIN to call the transform function. The types of the fields in the different inputs must be compatible.
 
transform (filename or string, required)

  • Either the name of the file containing the transform function, or a transform string. 
 join-type (choice, required)

    Choose one of the following options:

  • Inner join (default) — Sets the record-match-requiredn parameters for all ports to True. The GDE does not display the record-match-requiredn parameters, because they all have the same value.
  •  Outer join — Sets the record-match-requiredn parameters for all ports to False. The GDE does not display the record-match-requiredn parameters, because they all have the same value.
  •  Explicit — Allows you to set the record-match-requiredn parameter for each port individually.

    If you set the dedupn parameter to True on the driving input, set the join-type parameter to Inner join. (The driving input is the largest input, as specified by the driving parameter.)

    If you remove duplicates on this input port before joining it to the driving input, set the record-match-requiredn parameter to True on all other ports.
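
As a rough illustration of the Inner join and Outer join settings, the Python sketch below models a two-input join in which a single flag stands in for record-match-requiredn being True on all ports or False on all ports. It is not Ab Initio code: the function name join is hypothetical, each key is assumed to appear at most once per input, and a plain tuple takes the place of the transform output, with None standing in for a missing record.

```python
def join(in0, in1, key, match_required):
    """match_required=True models Inner join; False models Outer join."""
    by_key0 = {key(r): r for r in in0}   # assumes unique keys on each input
    by_key1 = {key(r): r for r in in1}
    out = []
    for k in sorted(by_key0.keys() | by_key1.keys()):
        left, right = by_key0.get(k), by_key1.get(k)
        if match_required and (left is None or right is None):
            continue                     # inner join: every port must supply a record
        out.append((k, left, right))     # outer join: None substitutes a missing record
    return out

# Hypothetical sample inputs with one key in common.
in0 = [(1, "a"), (2, "b")]
in1 = [(2, "x"), (3, "y")]
print(join(in0, in1, key=lambda r: r[0], match_required=True))
print(join(in0, in1, key=lambda r: r[0], match_required=False))
```

The Explicit setting is not modeled here, because it depends on per-port values rather than one flag.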
 
 
parameter-interface (choice, required)

  • This parameter is available only after you update a pre-Version 3.2.1 JOIN component to Version 3.2.2 or higher. It is not available for new components.
  • Controls whether to use a legacy or improved parameter interface. The choices are the following:

  • legacy — Displays the record-requiredn parameter whose boolean value specifies whether to use an inner or outer join and whether a record is required or a null is substituted for missing records. This parameter has inverted booleans. This is the default for pre-Version 3.2.1 components.
  • version-3-2-2 — Displays the record-match-requiredn parameter whose boolean value specifies whether to use an inner or outer join. This parameter has normal booleans 

 

 
 
 
 

Ab Initio Component | DEDUP SORTED : Part 2

 ...Continue from Part 1...


check-sort (boolean, optional)

  • Defines whether you want processing to abort on the first record that is out of sorted order. True causes processing to abort on the first record out of order.

    Default is True.

logging (boolean, optional)

  • Defines whether the component logs certain events.

    Default is False.

log_input (choice, optional)

  •  Defines how often the component sends an input record to its log port. Available only when the logging parameter is set to True.


log_output     (choice, optional)

  •  Defines how  often the component sends an output record to its log port. Available only when the logging parameter is set to True.


log_reject (choice, optional)

  • Defines how  often the component sends a reject record to its log port. Available only when the logging parameter is set to True.

    

Runtime behavior of DEDUP SORTED

    
DEDUP SORTED performs the following operations:

    1. Reads a grouped flow of records from the in port.

    2. Does one of the following if a select expression is specified:

  • If the expression evaluates to 0 for a particular record, does not process the record.
  • If the expression produces NULL for a particular record, writes the record to the reject port and writes a descriptive error message to the error port.

  • If the expression evaluates to anything other than 0 or NULL for a particular record, processes the record.

  • If you do not supply an expression for the select parameter, DEDUP SORTED processes all records on the in port.


    3. Processes groups of records as follows:

  • It considers any consecutive records with the same key value to be in the same group.

  • If a group consists of one record, writes that record to the out port.

  •  If a group consists of more than one record, uses the value of the keep parameter to determine which record — if any — to write to the out port, and which record or records to write to the dup port.

  •  If you have chosen unique-only for the keep parameter, does not write records to the out port from any groups consisting of more than one record.
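
The group handling in step 3 can be sketched in Python as below. This is an illustration only, not Ab Initio code: the function name dedup_sorted and the sample rows are hypothetical, the select expression is left out, and two returned lists stand in for the out and dup ports.

```python
from itertools import groupby

def dedup_sorted(records, key, keep="first"):
    """Split grouped records into (out, dup) lists according to keep."""
    out, dup = [], []
    for _, group in groupby(records, key=key):  # consecutive equal keys form a group
        group = list(group)
        if len(group) == 1:
            out.append(group[0])                # single-record groups always go to out
        elif keep == "first":
            out.append(group[0])
            dup.extend(group[1:])
        elif keep == "last":
            out.append(group[-1])
            dup.extend(group[:-1])
        elif keep == "unique-only":
            dup.extend(group)                   # nothing from a multi-record group goes to out
        else:
            raise ValueError(f"unknown keep value: {keep!r}")
    return out, dup

# Hypothetical sample: (key, sequence) pairs grouped by key.
rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5)]
for mode in ("first", "last", "unique-only"):
    print(mode, dedup_sorted(rows, key=lambda r: r[0], keep=mode))
```

Because grouping relies on consecutive records, the same key appearing again later in unsorted input would be treated as a new group.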

Ab Initio Component | DEDUP SORTED: Part 1

 Purpose of DEDUP SORTED

  • DEDUP SORTED is used to separate one specified record in each group of records from the rest of the records in the group.

 

Parameters for DEDUP SORTED 

 
key (key specifier, required)
  • Name(s) of the key field(s) that the component uses to determine groups of data records.

select (expression, optional)

  • An expression that filters records before the component separates duplicates.
 
keep (choice, required)

  • It specifies which records the component keeps to write to the out port. You have to set one of the following options:

        first — Keeps the first record of a group

        last — Keeps the last record of a group

        unique-only — Keeps only records with unique key values

    The component writes the remaining records of each group to the dup port.

    Default is first.
    
package (transform, optional)

  •  Allows you to define this component’s log- and error-handling functions.
    
error_group  (string, optional)

  • Defines the name of the error group to which this component belongs. The component sends its error output to the HANDLE ERRORS component with a matching error_group value.

log_group  (string, optional)


  • Defines the name of the log group to which this component belongs. The component sends its log output to the HANDLE LOGS component with a matching log_group value.

reject-threshold (choice, optional)

  • The component’s tolerance for reject events.

limit (integer, optional)

  •  A number representing reject events.
  • When the reject-threshold parameter is set to Use limit/ramp, the component uses the values of the ramp and limit parameters in a formula to determine the component’s tolerance for reject events.
    Default is 0.

ramp (real, optional)

  •  Rate of tolerance for reject events in the number of records processed.
  • When the reject-threshold parameter is set to Use limit/ramp, the component uses the values of the ramp and limit parameters in a formula to determine the component’s tolerance for reject events.
    Default is 0.0.

Note:

  • When you set the reject-threshold parameter of a component to Use limit/ramp, the limit and ramp parameters become available. The component then uses the limit and ramp parameters together in a formula to control the component’s tolerance for reject events:
  • The component stops execution of the graph if the number of reject events exceeds the result of the following formula:

limit + (ramp *  number_of_records_processed_so_far)
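
As a worked example of the formula, the following Python sketch checks whether a given reject count would stop the graph. The function name should_abort and the sample numbers are hypothetical; the defaults mirror the limit and ramp defaults listed above.

```python
def should_abort(reject_events, records_processed, limit=0, ramp=0.0):
    """True when reject events exceed limit + ramp * records processed so far."""
    return reject_events > limit + ramp * records_processed

# With limit=10 and ramp=0.5, the tolerance after 100 records is 10 + 0.5 * 100 = 60.
print(should_abort(60, 100, limit=10, ramp=0.5))  # at the threshold, not over it
print(should_abort(61, 100, limit=10, ramp=0.5))  # over the threshold

# With the defaults (limit 0, ramp 0.0), the first reject event exceeds the threshold.
print(should_abort(1, 100))
```

The limit term gives a fixed allowance, while the ramp term lets the allowance grow with the number of records processed.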

Ab Initio Component | SORT

 Purpose of SORT Components

 
  • The SORT component sorts and merges records. You can use it to order records before you send them to a component that requires grouped or sorted records.
  • The SORT component accepts data on its in port through fan-in and all-to-all flows (for partitioned data). Because SORT performs a gather operation on its in port, there is no need to gather data before sending it to SORT.
  • SORT is not guaranteed to be stable: records with identical key values may not keep their relative order after being sorted.

 Note:

  • If the key(s) on which the sort is performed contain NULL values, the records with NULL keys are listed first in ascending sort order and last in descending sort order.
 

Parameters for SORT 

 
key (key specifier, required) 
 
  • Name of the key(s) field(s) and the sequence specifier(s) you want the component to use when it orders records. 
 
 max-core (integer, required)
 
  • Maximum memory usage in bytes.
    Default is 100663296 (96 MB). 


Runtime behavior of SORT

 The SORT component performs the following operations:
 
1. Reads the records from all flows connected to the in port and splits them into temporary files that are smaller in size than the number of bytes specified by the max-core parameter.

2. Sorts the records in each temporary file according to the sort key.

3. SORT stores any temporary files in the working directories specified in its layout.

4. Repeats steps 1 and 2 until it has read all records.
 
5. Merges all temporary files, maintaining the sort order. 
 
6. Writes the result to the out port.
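
The steps above describe what is commonly called an external merge sort. The Python sketch below illustrates the idea under simplifying assumptions and is not how Ab Initio implements SORT: the names external_sort, _write_run, and _read_run are hypothetical, a record count (max_records_in_memory) stands in for the byte limit of max-core, and runs are written with pickle to anonymous temporary files.

```python
import heapq
import pickle
import tempfile

def _write_run(sorted_chunk):
    """Write one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile()
    for rec in sorted_chunk:
        pickle.dump(rec, f)
    f.seek(0)
    return f

def _read_run(f):
    """Yield records back from a run file until it is exhausted."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return

def external_sort(records, key, max_records_in_memory):
    runs, chunk = [], []
    for rec in records:                           # read and split into bounded chunks
        chunk.append(rec)
        if len(chunk) >= max_records_in_memory:
            runs.append(_write_run(sorted(chunk, key=key)))  # sort each chunk, then store it
            chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk, key=key)))
    # Merge all runs while maintaining the sort order.
    merged = list(heapq.merge(*(_read_run(f) for f in runs), key=key))
    for f in runs:
        f.close()
    return merged

print(external_sort([5, 3, 8, 1, 9, 2, 7], key=lambda x: x, max_records_in_memory=3))
```

Only one chunk is held in memory while the runs are being built, and the merge reads the run files one record at a time; the final list is materialized here only to keep the example short.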