
Ab Initio Component | ROLLUP : Part 3

 ...Continued from Part 2...

 

Functions used in expanded mode

  • Expanded mode provides more control over the transform. It lets you edit the expanded package, so you can specify transformations that are not possible with template mode.

  • With an expanded ROLLUP package, you must define the following in it:

  • DML type named temporary_type

  • initialize function that returns a temporary_type record

  • rollup function that takes two input arguments (an input record and a temporary_type record) and returns an updated temporary_type record

  • finalize function that returns an output record
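
To see how these pieces fit together, here is a minimal sketch in Python (not DML). The record layout (`cust`, `amount`) and the running total and count are assumptions chosen for illustration; only the names temporary_type, initialize, rollup and finalize come from the list above.

```python
from dataclasses import dataclass


@dataclass
class TemporaryType:
    """Plays the role of temporary_type: the running state for one group."""
    total: int
    count: int


def initialize(first_record):
    # Returns a fresh temporary record for a new group.
    return TemporaryType(total=0, count=0)


def rollup(temp, record):
    # Takes the temporary record and one input record; returns the updated temporary record.
    return TemporaryType(temp.total + record["amount"], temp.count + 1)


def finalize(temp, last_record):
    # Builds the output record for the group.
    return {"cust": last_record["cust"], "total": temp.total, "count": temp.count}


# One group of records sharing the same key (illustrative data).
group = [{"cust": "A", "amount": 10}, {"cust": "A", "amount": 5}]
temp = initialize(group[0])
for record in group:
    temp = rollup(temp, record)
out = finalize(temp, group[-1])
print(out)
```

The temporary record carries everything the group needs between calls, which is why expanded mode can express logic that a single aggregation function cannot.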

     

Runtime behavior of ROLLUP 

ROLLUP performs the following operations for each group of records:

1. Performing Input selection:

  • If you have not defined the input_select function in your transform, ROLLUP processes all records.

  • If you have defined the input_select function, ROLLUP filters the input records accordingly.

2. Performing Key change (for sorted input only):

  • For every record except the first, ROLLUP checks whether a key change has occurred:

  • ROLLUP compares the current record’s key value to the previous record’s key value, unless the key_change function is defined.

  • If the key_change function is defined, ROLLUP calls that function to check for a key change.

3. Temporary initialization:

  • ROLLUP passes the first record in each group to the initialize transform function.

4. Performing Computation:

  • ROLLUP calls the rollup transform function for each input record.
  • The input to the rollup transform function is the input record and the temporary record for the group to which the input record belongs.
  • The rollup transform function returns an updated temporary record for that input group. 

5. Performing Finalization of the output:

With sorted-input set to True:

  • ROLLUP calls the finalize transform function after it processes all the input records in each group.

  • ROLLUP passes the temporary record for the group and the last input record in the group to the finalize transform function.

  • The finalize transform function produces an output record for the group.

Note:

  • With sorted-input set to False, after ROLLUP has processed all the input records, it calls the finalize transform function with the temporary record for each group and an arbitrary input record from each group as arguments.

  • ROLLUP repeats this procedure with each group.

  • The finalize transform function then produces an output record for each group.

  • The component stops the execution of the graph when the number of reject events exceeds the result of the following formula:

limit + (ramp * number_of_records_processed_so_far)
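
The abort condition is plain arithmetic. A small Python sketch follows; the function name `should_abort` and the sample values for limit and ramp are assumptions for illustration only.

```python
def should_abort(reject_count, limit, ramp, records_processed):
    """True when reject events exceed limit + ramp * records processed so far."""
    return reject_count > limit + ramp * records_processed


# With limit=50 and ramp=0.01, the tolerance after 1,000 records is 50 + 10 = 60 rejects.
print(should_abort(60, 50, 0.01, 1000))
print(should_abort(61, 50, 0.01, 1000))
```

A ramp of 0 makes the tolerance a fixed count, while a non-zero ramp lets the tolerance grow with the number of records processed.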

6. Output selection:

  • If you have defined the output_select transform function, it filters the output records.
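
The six steps above can be modeled as a single loop over sorted input. This is a rough Python sketch, not Ab Initio code: the driver `run_sorted_rollup`, the argument order of `key_change(previous, current)`, and the sample data are all assumptions.

```python
def run_sorted_rollup(records, key, initialize, rollup, finalize,
                      input_select=None, key_change=None, output_select=None):
    """Hypothetical driver mirroring steps 1 to 6 for sorted input."""
    output = []

    def emit(temp, last_record):
        out = finalize(temp, last_record)                   # step 5
        if output_select is None or output_select(out):     # step 6
            output.append(out)

    have_group = False
    temp = previous = None
    for record in records:
        # Step 1: without input_select, every record is processed.
        if input_select is not None and not input_select(record):
            continue
        # Step 2: check for a key change on every record except the first.
        if have_group:
            if key_change is not None:
                changed = key_change(previous, record)
            else:
                changed = key(previous) != key(record)
            if changed:
                emit(temp, previous)
                have_group = False
        # Step 3: the first record of a group goes to initialize.
        if not have_group:
            temp = initialize(record)
            have_group = True
        # Step 4: fold the record into the group's temporary record.
        temp = rollup(temp, record)
        previous = record
    if have_group:
        emit(temp, previous)
    return output


records = [
    {"cust": "A", "amount": 10}, {"cust": "A", "amount": -3}, {"cust": "A", "amount": 12},
    {"cust": "B", "amount": 4},
    {"cust": "C", "amount": 20}, {"cust": "C", "amount": 15},
]
result = run_sorted_rollup(
    records,
    key=lambda r: r["cust"],
    initialize=lambda r: r["amount"],                   # temporary value: largest amount so far
    rollup=lambda temp, r: max(temp, r["amount"]),
    finalize=lambda temp, r: {"cust": r["cust"], "max_amount": temp},
    input_select=lambda r: r["amount"] >= 0,            # drop negative amounts
    output_select=lambda out: out["max_amount"] >= 10,  # keep only large groups
)
print(result)
```

Here group B is computed but filtered out by the output selection, and the negative record in group A never reaches the rollup function.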

Ab Initio Component | ROLLUP : Part 2

 ...Continued from Part 1...

max-core (integer, required)

  • This parameter defines the maximum memory usage in bytes.

  • It is available only when the sorted-input parameter is set to False.

  • If the total size of the intermediate results that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.

Default is 67108864 (64 MB).

reject-threshold (choice, required)

  • Specifies the component’s tolerance for reject events, that is, after how many rejected records the component should abort its operation.

check-sort (boolean, optional)

  • This parameter is available only when the sorted-input parameter is set to True and the key-method parameter is set to Use key specifier.

 Difference between using unsorted and sorted data

With unsorted data

  • When the input data is not sorted (and the sorted-input parameter is set to False), the function outputs an arbitrary record from each group. This might not be particularly useful.
  • To get the first or last record in the unsorted data, you can use the first or last aggregation function.

With sorted data

  • When the input data is sorted (and the sorted-input parameter is set to True), the function outputs the last record from each group.
  • In this case, the function is equivalent to the following, which uses the last aggregation function 


Ab Initio Component | ROLLUP : Part 1

 Purpose of ROLLUP

  • ROLLUP is used to process groups of input records that have the same key, generating one output record for each group. 

  • Typically, the output record summarizes or aggregates the data in some way; for example, a simple ROLLUP can be used to calculate a sum or average of one or more input fields.

  • ROLLUP can also be used to select certain information from each group; for example, it might output the largest value in a field, or accumulate a vector of values that conform to specific criteria.
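
As a rough illustration of "one output record per key group", the Python sketch below groups unsorted records in memory and computes a sum, an average and the largest value per key. The field names (`region`, `sales`) and the function `summarize` are assumptions; this models the idea only, not how the component is implemented.

```python
from collections import defaultdict


def summarize(records):
    """Group records by region and produce one summary per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["region"]].append(rec["sales"])
    return {
        region: {
            "total": sum(values),
            "average": sum(values) / len(values),
            "largest": max(values),
        }
        for region, values in groups.items()
    }


records = [
    {"region": "east", "sales": 100},
    {"region": "west", "sales": 40},
    {"region": "east", "sales": 50},
]
summary = summarize(records)
print(summary)
```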

Two modes to use ROLLUP

You can use a ROLLUP component in two modes, depending on how you define the transform parameter:

1. Template mode — You define a simple rollup function that may include aggregation functions. Template mode is the most common/simple way to use ROLLUP.

2. Expanded mode — You create a transformation using an expanded rollup package. This mode allows for rollups that do not necessarily use regular aggregation functions.


Parameters for ROLLUP (Not all parameters are covered.)

sorted-input (boolean, required)

  • This parameter specifies whether the component accepts unsorted (or ungrouped) input.

  • If you want to process ungrouped input, set this parameter to False.

Default is True.

key-method (choice, optional)

  • This parameter determines the method by which the component determines the boundary between one group of records and the next. The choices are as follows:

1. Use key specifier — The component uses one or more of the fields in the input record as the grouping key.

2. Use key_change function — Instead of using fields from the input record to group the input, the component uses the key_change transform function to determine when a new group begins.


Default is Use key specifier.

key (key specifier, required when key-method is Use key specifier)

  • This parameter contains the name(s) of the key field(s) that the component uses to define groups of records.

transform (filename or string, required)

  • This parameter contains either the name of the file containing the types and transform functions, or a transform string.

output_without_input (choice, optional)

  • This parameter specifies the event that, when received, triggers the component to call the output_without_input function, if no input records have been received since the last such event or since the component started. The choices are as follows:

Never — The function will not be called.

At each computepoint — The function is called at each computepoint event.

At each checkpoint — The function is called at each checkpoint event.

At component shutdown — The function is called when the component shuts down.

Default is Never.

Ab Initio Component | JOIN: Part 3

 ...Continued from Part 2...


maintain-order (boolean, required)

  • Set to True to ensure that records remain in the original order of the driving input. (The driving input is the largest input, as specified by the driving parameter.)
  • Available only when the sorted-input parameter is set to False. If the sorted-input parameter is set to True and all inputs are sorted on the fields given in the key parameter, the output maintains the sort order on that key without the use of this parameter.
  • If any inputs other than the driving input are too large to fit within the memory limit specified by max-core, the behavior of the component depends on the setting of maintain-order:
  • False — The component stores some of its intermediate results in temporary files on disk. This alters the order of records in the driving input.
  •  True — The component stops execution of the graph.
   
    Default is False.


max-core (integer, required)

  • Maximum memory usage in bytes. Available only when the sorted-input parameter is set to False.
  • If the total size of the non-driving inputs that the component holds in memory exceeds the number of bytes specified in the max-core parameter, the component writes temporary files to disk.
    Default value is 67108864 bytes (64 MB). 


Runtime behavior of JOIN

 

 JOIN performs the following operations:
 
 1. Reads data records from multiple inn ports. Depending on the setting of the sorted-input parameter, it does one of the following:

  • If the input is sorted, it reads records in the order in which they arrive.
  • If the input is unsorted, it loads all records from all inputs except the driving input into main memory. Once the non-driving inputs are loaded, it reads records from the driving input in the order in which they arrive.

2. Applies the expression in any defined selectn parameter to the records on the corresponding inn port:

  • If the select expression evaluates to 0 for a record, the JOIN component does not process the record, and the record does not appear on any output port.

  • If the select expression evaluates to anything other than 0 or NULL for a record, JOIN processes the record.
  • If you do not supply an expression for a selectn parameter, JOIN processes all the records on the corresponding inn port
 
3. Removes any duplicate records that have made it through the select, if the dedupn parameter is set to True.

4. Operates on records that have matching key values using a multi-input transform function.

If the transform function returns NULL, then JOIN:
  • Writes each input record to the corresponding rejectn port, then stops execution of the graph when the number of reject events exceeds the result of the following formula:

        limit + (ramp * number_of_records_processed_so_far)
  • Writes an error message to the corresponding errorn port. If no flows are connected to the rejectn or errorn ports, the JOIN component discards the information.
5. Writes the non-NULL return record from the transform function to the out port. 
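
For unsorted input, the steps above resemble an in-memory hash join. The following Python sketch covers a two-input inner join only and leaves out deduplication (step 3), the reject threshold and the error port; the function `hash_join`, the field names and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(driving, non_driving, key, transform,
              select_driving=None, select_other=None):
    """Hypothetical two-input inner join for unsorted input."""
    # Step 1: load the non-driving input into memory, indexed by key.
    lookup = defaultdict(list)
    for rec in non_driving:
        # Step 2: records that fail the select expression are not processed.
        if select_other is None or select_other(rec):
            lookup[key(rec)].append(rec)

    out, rejects = [], []
    # Step 1 (continued): read the driving input in arrival order.
    for rec in driving:
        if select_driving is not None and not select_driving(rec):
            continue
        # Step 4: call the transform for each pair with matching keys.
        for match in lookup.get(key(rec), []):
            result = transform(rec, match)
            if result is None:
                rejects.append((rec, match))  # NULL result: inputs go to reject
            else:
                out.append(result)            # Step 5: non-NULL result goes to out
    return out, rejects


def enrich(order, customer):
    # Returns None (standing in for NULL) when the customer has no name.
    if customer["name"] is None:
        return None
    return {"cust_id": order["cust_id"], "name": customer["name"], "amount": order["amount"]}


orders = [
    {"cust_id": 2, "amount": 30},
    {"cust_id": 1, "amount": 10},
    {"cust_id": 3, "amount": 5},
    {"cust_id": 9, "amount": 1},
]
customers = [
    {"cust_id": 1, "name": "Ann"},
    {"cust_id": 2, "name": "Bob"},
    {"cust_id": 3, "name": None},
]
joined, rejected = hash_join(orders, customers, lambda r: r["cust_id"], enrich)
print(joined)
print(rejected)
```

Because only the non-driving input is held in memory, the sketch also hints at why the largest input is the natural choice for the driving input.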

 






Ab Initio Component | JOIN: Part 2

 ...Continued from Part 1...

record-requiredn (boolean, required)

  • This parameter is available only when the parameter-interface parameter is set to legacy (or in a pre-Version 3.2.1 JOIN component) and the join-type parameter is set to Explicit.

    The default is True. 

record-match-requiredn (boolean, required)

  • This parameter is available only when the join-type parameter is set to Explicit.

  • It specifies whether a record is required or whether to substitute a null for a missing record.

    The default is True.     

    To use this parameter, note the following points:

  •  When there are two inputs, set record-match-requiredn to True on the input port for which you want to call the transform for every record, regardless of whether there is a matching record on the other input port.

  •  When there are more than two inputs, set record-match-requiredn to True when you want to call the transform only when there are records with matching keys on all input ports for which record-match-requiredn is True.

dedupn (boolean, required)

  • Set the dedupn parameter to Dedup this input before joining to remove duplicates from the corresponding inn port before joining. This allows you to choose only one record from a group with matching key values as the argument to the transform function.

  • There is one dedupn parameter associated with each inn port. Unused duplicates are sent to the unusedn port.

    Default is Do not dedup this input.

selectn (expression, optional)

  • Filters records before the join. There is one selectn parameter per inn port; n represents the number of an in port. If you use selectn with dedupn, the JOIN component performs the select first, then removes the duplicate records that made it through the select.
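
The order of the two steps matters, because the select decides which record in a key group can survive the dedup. A small Python sketch follows; keeping the first surviving record per key is an assumption of this sketch, and the function name and sample fields are illustrative.

```python
def select_then_dedup(records, key, select=None):
    """Filter with select first, then keep one record per key (here: the first)."""
    kept, unused, seen = [], [], set()
    for rec in records:
        if select is not None and not select(rec):
            continue             # filtered out before dedup is applied
        k = key(rec)
        if k in seen:
            unused.append(rec)   # duplicate that passed the select: goes to unused
        else:
            seen.add(k)
            kept.append(rec)
    return kept, unused


records = [
    {"id": 1, "status": "closed", "seq": 1},
    {"id": 1, "status": "open", "seq": 2},
    {"id": 1, "status": "open", "seq": 3},
    {"id": 2, "status": "open", "seq": 4},
]
kept, unused = select_then_dedup(
    records, key=lambda r: r["id"], select=lambda r: r["status"] == "open"
)
print([r["seq"] for r in kept], [r["seq"] for r in unused])
```

If the dedup ran first, the closed record (seq 1) would be chosen for id 1 and then dropped by the select, leaving no record for that key.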

 max-memory (integer, required)

  • Maximum memory usage in bytes before the component writes temporary files to disk. Available only when the sorted-input parameter is set to True.

    The default value is 8388608 bytes (8 MB).
    

    
check-sort (boolean, required)

  • Available only when the sorted-input parameter is set to True.

  • If set to True, stops the graph on the first input record that is out of sorted order (according to the key).

  • The default is False. In this case, JOIN does not necessarily stop or issue an error when it encounters unsorted inputs. If sorted input is a requirement, set check-sort to True.
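
The kind of check that check-sort implies can be sketched in Python as a single pass that stops at the first record whose key is lower than the previous one. Ascending order and the function name `first_unsorted_index` are assumptions of this sketch.

```python
def first_unsorted_index(records, key):
    """Index of the first record that breaks ascending key order, or None."""
    previous = None
    for i, rec in enumerate(records):
        k = key(rec)
        if i > 0 and k < previous:
            return i
        previous = k
    return None


print(first_unsorted_index([1, 2, 2, 5, 3, 6], key=lambda r: r))
print(first_unsorted_index([1, 2, 2, 5], key=lambda r: r))
```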

 

 

Ab Initio Component | JOIN : Part 1

 Purpose of JOIN Components

  • JOIN reads data from two or more input ports, combines records with matching keys according to the transform you specify, and sends the transformed records to the output port.
  • Its additional ports can also be used to collect rejected and unused records.

 

Parameters for JOIN (Not all parameters are covered.)


count (integer, required)

  • It is an integer n specifying the total number of inputs (in ports) to join. The number of input ports also determines the number of the following ports and parameters:

        unused ports

        reject ports

        error ports

        record-match-required parameters

        dedup parameters

        select parameters

        override-key parameters

    Default is 2.

    Each in port (always two or more) has a number n appended. Each outn, unusedn, rejectn, and errorn port corresponds to an inn port.
 
 
sorted-input (boolean, required)

  • When this parameter is set to False, the component accepts unsorted input and permits the use of the maintain-order parameter.
  • When this parameter is set to True, the component requires sorted input. In this case, consider setting the check-sort parameter to True.
    Default is True. 

key(key specifier, required)
 
  • Name(s) of the field(s) in the input records that must have matching values for JOIN to call the transform function. The types of the fields in the different inputs must be compatible.
 
transform (filename or string, required)

  • Either the name of the file containing the transform function, or a transform string.

join-type (choice, required)

    Choose one of the following options:

  • Inner join (default) — Sets the record-match-requiredn parameters for all ports to True. The GDE does not display the record-match-requiredn parameters, because they all have the same value.
  •  Outer join — Sets the record-match-requiredn parameters for all ports to False. The GDE does not display the record-match-requiredn parameters, because they all have the same value.
  •  Explicit — Allows you to set the record-match-requiredn parameter for each port individually.

    If you set the dedupn parameter to True on the driving input, set the join-type parameter to Inner join. (The driving input is the largest input, as specified by the driving parameter.)

    If you remove duplicates on this input port before joining it to the driving input, set the record-match-requiredn parameter to True on all other ports.
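
The difference between the inner and outer settings can be shown with two small inputs. In this Python sketch, None stands in for the null that is substituted for a missing record; the function `join_two`, the field names and the sorted output order are assumptions of the sketch rather than documented behavior.

```python
def join_two(in0, in1, key, transform, join_type="inner"):
    """Hypothetical two-input join; join_type is "inner" or "outer"."""
    index0, index1 = {}, {}
    for rec in in0:
        index0.setdefault(key(rec), []).append(rec)
    for rec in in1:
        index1.setdefault(key(rec), []).append(rec)

    out = []
    for k in sorted(set(index0) | set(index1)):
        left, right = index0.get(k, []), index1.get(k, [])
        if left and right:
            # Both inputs have the key: the transform is called for either join type.
            out.extend(transform(a, b) for a in left for b in right)
        elif join_type == "outer":
            # No match required: substitute None for the missing record.
            out.extend(transform(a, None) for a in left)
            out.extend(transform(None, b) for b in right)
    return out


def combine(a, b):
    present = a if a is not None else b
    return {
        "id": present["id"],
        "a": a["a"] if a is not None else None,
        "b": b["b"] if b is not None else None,
    }


in0 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
in1 = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
inner = join_two(in0, in1, lambda r: r["id"], combine, "inner")
outer = join_two(in0, in1, lambda r: r["id"], combine, "outer")
print(inner)
print(outer)
```

Note that the transform in an outer join has to cope with a missing record on either side, which is why `combine` checks each argument for None.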
 
 
parameter-interface (choice, required)

  • This parameter is available only after you update a pre-Version 3.2.1 JOIN component to Version 3.2.2 or higher. It is not available for new components.
  • Controls whether to use a legacy or improved parameter interface. The choices are the following:

  • legacy — Displays the record-requiredn parameter, whose boolean value specifies whether to use an inner or outer join and whether a record is required or a null is substituted for missing records. This parameter has inverted booleans. It is the default for pre-Version 3.2.1 components.
  • version-3-2-2 — Displays the record-match-requiredn parameter, whose boolean value specifies whether to use an inner or outer join. This parameter has normal booleans.