
Ab Initio Component | REDEFINE FORMAT

 Purpose of Redefine Format:

  • REDEFINE FORMAT copies records from its in port to its out port without changing the values in the records.

  • It can also improve graph performance by reducing the number of fields in the input records. It does this by renaming (redefining) the fields, so that several input fields are treated as a single field.

REDEFINE FORMAT has no parameters.

 Runtime behavior of REDEFINE FORMAT

  • Reads the records arriving at the in port.
  • Writes the records to the out port with the fields renamed according to the record format of the out port.

 

REDEFINE FORMAT does not perform an implicit reformat, which is what allows you to use it to change the record format associated with a particular flow.
To do this, assign the out port a record format that is different from the record format on the in port.

If you use REDEFINE FORMAT to change a record format, make sure the output record format you specify is compatible with the input record format. For example, if you combine several fields into one, that one field must have the same number of bytes as the original fields combined. See the example below for more details:

Suppose the input record format is:

record
   string(10)    first_name;
   string(10)    last_name;
   string(30)    address;
   decimal(5)    zipcode;
   decimal(8.2)  income;
end;

You can reduce the number of fields by specifying an output record format of:

record
   string(55)   person_info;
   decimal(8.2) income;
end;
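
The byte arithmetic behind this example can be checked with a short sketch. The Python below is not Ab Initio code; it only mimics fixed-width layouts. The names IN_LAYOUT, OUT_LAYOUT, width, and split are hypothetical, and the sketch assumes one byte per character and an 8-byte decimal(8.2) field.

```python
# Hypothetical fixed-width layouts mirroring the two record formats above.
# Each entry is (field_name, width_in_bytes); one byte per character is assumed.
IN_LAYOUT = [
    ("first_name", 10),
    ("last_name", 10),
    ("address", 30),
    ("zipcode", 5),
    ("income", 8),
]
OUT_LAYOUT = [
    ("person_info", 55),
    ("income", 8),
]


def width(layout):
    """Total record width in bytes."""
    return sum(size for _, size in layout)


def split(record, layout):
    """Slice one fixed-width record into named fields without changing any byte."""
    if len(record) != width(layout):
        raise ValueError("record length does not match layout width")
    fields, pos = {}, 0
    for name, size in layout:
        fields[name] = record[pos:pos + size]
        pos += size
    return fields


# The two layouts are compatible only because their total widths agree:
# 10 + 10 + 30 + 5 + 8 = 55 + 8 = 63 bytes.
assert width(IN_LAYOUT) == width(OUT_LAYOUT) == 63

# Sample bytes for illustration only.
record = (
    b"John".ljust(10)
    + b"Smith".ljust(10)
    + b"12 Main Street".ljust(30)
    + b"02139"
    + b"01234.50"
)
as_input = split(record, IN_LAYOUT)    # five fields
as_output = split(record, OUT_LAYOUT)  # two fields over the same bytes
```

Both views read the same 63 bytes. Only the field boundaries and names differ, which is why the combined field must be exactly 55 bytes wide.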

Ab Initio Component | REFORMAT: Part 1

 Purpose of Reformat:

  • REFORMAT changes the format of records by dropping fields, or by using DML expressions to add fields, combine fields, or transform the data in the records.
  • REFORMAT performs an implicit reformat when you do not define a reformat function or transformation for the fields.

 

Parameters for REFORMAT

 

 count (integer, required)

     Sets the number of:
  •         out ports
  •         reject ports
  •         error ports
  •         transform parameters
    Default is 1. 
 

select (expression, optional)

  • Filters the records before they are reformatted.
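
The order of operations (select first, then the transform) can be illustrated with a small sketch. This is plain Python rather than DML, and the names reformat, to_mailing_row, and the sample records are hypothetical. It models only the idea of filtering, combining, and dropping fields, not the component's actual implementation.

```python
def reformat(records, transform, select=None):
    """Yield transform(record) for each record that passes the optional select filter."""
    for record in records:
        if select is not None and not select(record):
            continue  # filtered out before the transform runs
        yield transform(record)


def to_mailing_row(rec):
    # Combine two fields into one, keep zipcode, and drop address and income.
    return {
        "full_name": f"{rec['first_name']} {rec['last_name']}",
        "zipcode": rec["zipcode"],
    }


# Hypothetical sample data.
people = [
    {"first_name": "Ann", "last_name": "Lee", "address": "1 Elm St",
     "zipcode": "02139", "income": 52000.0},
    {"first_name": "Bob", "last_name": "Ray", "address": "9 Oak Ave",
     "zipcode": "10001", "income": 18000.0},
]

# Only records with income above 20000 reach the transform.
rows = list(reformat(people, to_mailing_row, select=lambda r: r["income"] > 20000))
```

Here the second record never reaches the transform, so rows contains a single reformatted record.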
 

error_group (string, optional)

  • The name of the error group to which this component belongs. The component sends its error output to the HANDLE ERRORS component with a matching error_group value.

 

log_group (string, optional)

  • The name of the log group to which this component belongs. The component sends its log output to the HANDLE LOGS component with a matching log_group value.

reject-threshold (choice, required)

  • Specifies the component’s tolerance for reject events. Choose one of the following:
  1. Abort on first reject — The component stops execution of the graph at the first reject event it generates.
  2. Never abort — The component does not stop execution of the graph no matter how many reject events it generates.
  3. Use limit/ramp — The component uses the settings in the limit and ramp parameters to determine how many reject events to allow before it stops execution of the graph.

limit (integer, required)

  • A number representing reject events.
  • When the reject-threshold parameter is set to Use limit/ramp, the component uses the values of the ramp and limit parameters in a formula to determine the component’s tolerance for reject events. 
       Default is 0.  

ramp (real, required)

  • The tolerated rate of reject events, relative to the number of records processed.
  • When the reject-threshold parameter is set to Use limit/ramp, the component uses the values of the ramp and limit parameters in a formula to determine the component’s tolerance for reject events.
    Default is 0.0.

Note:

When you set the reject-threshold parameter of a component to Use limit/ramp, the limit and ramp parameters become available. The component then uses the limit and ramp parameters together in a formula to control the component’s tolerance for reject events:

The component stops execution of the graph if the number of reject events exceeds the result of the following formula:

limit + (ramp *  number_of_records_processed_so_far)
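
The formula above can be written out as a small sketch. The function names reject_tolerance and should_abort are hypothetical. The sketch only restates the arithmetic and does not describe how the component evaluates it internally.

```python
def reject_tolerance(limit, ramp, records_processed):
    """Number of reject events tolerated after the given number of records."""
    return limit + ramp * records_processed


def should_abort(reject_count, limit, ramp, records_processed):
    """True once the reject count exceeds limit + ramp * records processed so far."""
    return reject_count > reject_tolerance(limit, ramp, records_processed)


# With limit=10 and ramp=0.01, 1000 processed records tolerate 20 rejects.
within_tolerance = should_abort(20, limit=10, ramp=0.01, records_processed=1000)
over_tolerance = should_abort(21, limit=10, ramp=0.01, records_processed=1000)

# With the defaults (limit=0, ramp=0.0), the first reject exceeds the tolerance.
default_first_reject = should_abort(1, limit=0, ramp=0.0, records_processed=1)
```

In this sketch, limit acts as a fixed allowance and ramp as a per-record allowance, so the tolerance grows as more records are processed.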


Ab Initio Component | OUTPUT FILE

 Purpose of Output File : 

  •  OUTPUT FILE represents records written as output from a graph into one or more serial files or a multifile.
  • OUTPUT FILE can also be used to write files directly to Amazon S3 and Google Cloud Storage.
  • OUTPUT FILE cannot be used in continuous graphs; it can be used only in batch graphs.

Parameters of OUTPUT FILE


1. Data Tab:

Use the Data tab to specify the following:

  • The path to a file reusable dataset
  • The physical location for a data file
  • If appropriate, an alternative means to associate a specified data file with an EME dataset in the EME Technical Repository

Reusable dataset

  • Specifies the use and location of a file reusable dataset that is preconfigured to access a particular set of data. Using this option configures the component as a dataset-linked component. For more information, see “Reusable datasets” in the Co>Operating System Graph Developer’s Guide.
    Reuse an existing dataset.

Data location 

    Specifies the data location as:

  • The URL of a serial file or of a multifile in a multifile system
  • The URLs of the individual partitions of an ad hoc multifile

File details

Opens a window, where you can see the following information about the file that corresponds to the specified data location:

  •     Permissions on the file
  •     Owner of the file
  •     Size of the file in bytes
  •     Date and time the file was last modified
  •     Full pathname of the file
  •     Any resolution errors 
 
 

2. Access Tab:

Below are the options available for file handling on the Access tab:

  1. If the file does not exist, Create file: Creates the output or intermediate file before writing to it.

    By default, this option is selected.
  2. If the file does not exist, Fail: Forces the graph to fail if the file does not exist.
  3. If the file exists, Delete and recreate file: Deletes the output or intermediate file and creates a new one before writing to it.

    By default, this option is selected.
  4. If the file exists, Append to file: Writes output to the end of the intermediate or output file.
  5. If the file exists, Fail: Forces the graph to fail if the file exists.
  6. Upon job failure, roll the file back to the last checkpoint: Rolls the file back and discards output if the job fails in the phase writing the file, or fails in a subsequent phase before the next checkpoint.

    By default, this option is selected.
  7. Delete file after the last phase that reads it completes: Removes the input or intermediate file after the last phase that reads it has finished running.

    By default, this option is not selected.
  8. Write file only when phase completes: Instead of writing the data file incrementally, writes the file when the phase has run to completion. This ensures that a separate process that is looking for the file while the graph is running does not pick up a partially written file.

    By default, this option is not selected.
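
As a rough analogy for options 1 through 5, the sketch below shows how the "file does not exist" and "file exists" choices combine into one open decision. It is plain Python, not Ab Initio behavior; the function open_output and its if_missing and if_exists arguments are hypothetical names chosen for illustration.

```python
import os
import tempfile


def open_output(path, if_missing="create", if_exists="recreate"):
    """Open a file for writing according to the two existence policies.

    if_missing: "create" or "fail"
    if_exists:  "recreate", "append", or "fail"
    """
    if if_missing not in ("create", "fail"):
        raise ValueError(f"unknown if_missing policy: {if_missing}")
    if if_exists not in ("recreate", "append", "fail"):
        raise ValueError(f"unknown if_exists policy: {if_exists}")

    if not os.path.exists(path):
        if if_missing == "fail":
            raise FileNotFoundError(path)
        return open(path, "wb")          # create the file

    if if_exists == "fail":
        raise FileExistsError(path)
    if if_exists == "append":
        return open(path, "ab")          # write at the end of the file
    os.remove(path)                      # delete and recreate
    return open(path, "wb")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.dat")

    with open_output(target) as f:                       # missing: create
        f.write(b"first\n")
    with open_output(target, if_exists="append") as f:   # exists: append
        f.write(b"second\n")
    with open(target, "rb") as f:
        appended = f.read()

    with open_output(target) as f:                       # exists: recreate
        f.write(b"third\n")
    with open(target, "rb") as f:
        recreated = f.read()
```

The default arguments mirror the two options listed above as selected by default: create the file when it is missing, and delete and recreate it when it exists.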

3. File protection

  • Sets permissions on the input, output, and intermediate files. (Default settings are those assigned at file creation.) The checkboxes match the Unix file protection standards: Read (R), Write (W), and Execute (X) for User, Group, and Other.
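
For readers less familiar with Unix file protection, the sketch below shows how a set of R/W/X checkboxes for User, Group, and Other maps to a numeric mode. It uses Python's standard stat module; the names BITS and mode_from_checkboxes are hypothetical and are not part of any Ab Initio interface.

```python
import stat

# Map each (who, permission) checkbox to its Unix permission bit.
BITS = {
    ("user", "r"): stat.S_IRUSR,
    ("user", "w"): stat.S_IWUSR,
    ("user", "x"): stat.S_IXUSR,
    ("group", "r"): stat.S_IRGRP,
    ("group", "w"): stat.S_IWGRP,
    ("group", "x"): stat.S_IXGRP,
    ("other", "r"): stat.S_IROTH,
    ("other", "w"): stat.S_IWOTH,
    ("other", "x"): stat.S_IXOTH,
}


def mode_from_checkboxes(checked):
    """Combine the checked (who, permission) pairs into one permission mode."""
    mode = 0
    for box in checked:
        mode |= BITS[box]
    return mode


# User may read and write; Group and Other may only read.
mode = mode_from_checkboxes(
    [("user", "r"), ("user", "w"), ("group", "r"), ("other", "r")]
)
octal_text = oct(mode)                              # "0o644"
listing_text = stat.filemode(stat.S_IFREG | mode)   # "-rw-r--r--"
```

The resulting value is the same mode that chmod 644 would set, and that ls -l displays as -rw-r--r--.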

4. Ports

  • Used to provide the DML (record format) of the file, which is used to map the data in the file.
 

Ab Initio Component | INPUT FILE

 Purpose of Input File :

  • INPUT FILE represents records read as input to a graph from one or more serial files or from a multifile.
  • INPUT FILE can also be used to read files from the Hadoop file system, Amazon S3, and Google Cloud Storage.
  • INPUT FILE is not a phased component.
  • INPUT FILE can be used only in batch graphs and cannot be used in continuous flow graphs.

 

Parameters of INPUT FILE

 

1. Data Tab:

Use the Data tab to specify the following:

  • The path to a file reusable dataset
  • The physical location for a data file
  • If appropriate, an alternative means to associate a specified data file with an EME dataset in the EME Technical Repository

Reusable dataset

  • Specifies the use and location of a file reusable dataset that is preconfigured to access a particular set of data. Using this option configures the component as a dataset-linked component. For more information, see “Reusable datasets” in the Co>Operating System Graph Developer’s Guide.
    Reuse an existing dataset.

Data location 

    Specifies the data location as:

  • The URL of a serial file or of a multifile in a multifile system
  • The URLs of the individual partitions of an ad hoc multifile

File details

Opens a window, where you can see the following information about the file that corresponds to the specified data location:

  •     Permissions on the file
  •     Owner of the file
  •     Size of the file in bytes
  •     Date and time the file was last modified
  •     Full pathname of the file
  •     Any resolution errors 
 
 

2. Access Tab:

Below are the options available for file handling on the Access tab:

  1. If the file does not exist, Create file: Creates the output or intermediate file before writing to it.

    By default, this option is selected.
  2. If the file does not exist, Fail: Forces the graph to fail if the file does not exist.
  3. If the file exists, Delete and recreate file: Deletes the output or intermediate file and creates a new one before writing to it.

    By default, this option is selected.
  4. If the file exists, Append to file: Writes output to the end of the intermediate or output file.
  5. If the file exists, Fail: Forces the graph to fail if the file exists.
  6. Upon job failure, roll the file back to the last checkpoint: Rolls the file back and discards output if the job fails in the phase writing the file, or fails in a subsequent phase before the next checkpoint.

    By default, this option is selected.
  7. Delete file after the last phase that reads it completes: Removes the input or intermediate file after the last phase that reads it has finished running.

    By default, this option is not selected.
  8. Write file only when phase completes: Instead of writing the data file incrementally, writes the file when the phase has run to completion. This ensures that a separate process that is looking for the file while the graph is running does not pick up a partially written file.

    By default, this option is not selected.
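
The guarantee described in option 8 is commonly achieved by writing to a temporary file and renaming it into place once writing has finished. The sketch below shows that general technique in Python. It is an assumption for illustration, not a description of how the Co>Operating System implements the option, and the name write_when_complete is hypothetical.

```python
import os
import tempfile


def write_when_complete(path, chunks):
    """Write all chunks to a temporary file, then rename it to path in one step.

    A process watching for path sees either no file or the complete file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        os.replace(tmp_path, path)   # the complete file appears under its final name
    except BaseException:
        # On any failure, discard the partial file and leave path untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


with tempfile.TemporaryDirectory() as tmp_dir:
    final_path = os.path.join(tmp_dir, "out.dat")
    write_when_complete(final_path, [b"record 1\n", b"record 2\n"])
    with open(final_path, "rb") as f:
        written = f.read()
    leftover = sorted(os.listdir(tmp_dir))
```

If the writer fails partway through, the target name is never created, so a watching process cannot pick up a partially written file.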

3. File protection

  • Sets permissions on the input, output, and intermediate files. (Default settings are those assigned at file creation.) The checkboxes match the Unix file protection standards: Read (R), Write (W), and Execute (X) for User, Group, and Other.

4. Ports

  • Used to provide the DML (record format) of the file, which is used to map the data in the file.