Purpose of INPUT FILE:
- INPUT FILE represents records read as input to a graph from one or more serial files or from a multifile.
- INPUT FILE can also read files from the Hadoop file system, Amazon S3, and Google Cloud Storage.
- INPUT FILE is not a phased component.
- INPUT FILE can be used only in batch graphs; it cannot be used in continuous flow graphs.
Parameters of INPUT FILE
1. Data Tab:
Use the Data tab to specify the following:
- The path to a file reusable dataset
- The physical location for a data file
- If appropriate, an alternative means to associate a specified data file with an EME dataset in the EME Technical Repository
Reusable dataset
- Specifies the use and location of a file reusable dataset that is preconfigured to access a particular set of data. Using this option configures the component as a dataset-linked component. For more information, see “Reusable datasets” in the Co>Operating System Graph Developer’s Guide.
Reuse an existing dataset.
Data location
Specifies the data location as:
- The URL of a serial file or of a multifile in a multifile system
- The URLs of the individual partitions of an ad hoc multifile
File details
Opens a window, where you can see the following information about the file that corresponds to the specified data location:
- Permissions on the file
- Owner of the file
- Size of the file in bytes
- Date and time the file was last modified
- Full pathname of the file
- Any resolution errors
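As a rough illustration of the kind of information listed above, the sketch below gathers the same details for a local file with Python's standard library. It is not how the File details window is implemented; the function name `file_details` and the dictionary keys are hypothetical, and the owner is reported as a numeric user ID rather than a name.

```python
import os
import stat
import time

def file_details(path):
    """Collect details similar to those listed for the File details window."""
    full_path = os.path.abspath(path)
    try:
        info = os.stat(path)
    except OSError as exc:
        # Stand-in for a "resolution error": the path could not be resolved.
        return {"full_path": full_path, "error": str(exc)}
    return {
        "permissions": stat.filemode(info.st_mode),  # e.g. "-rw-r--r--"
        "owner_uid": info.st_uid,                    # numeric owner ID
        "size_bytes": info.st_size,
        "modified": time.strftime(
            "%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime)
        ),
        "full_path": full_path,
    }
```

Calling `file_details` on a path that does not exist returns a dictionary with an `error` entry instead of raising, which mirrors the idea of reporting resolution errors alongside the other details.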
2. Access Tab:
Below are the options available for file handling in the Access tab:
- If the file does not exist, Create file: Creates the output or intermediate file before writing to it. By default, this option is selected.
- If the file does not exist, Fail: Forces the graph to fail if the file does not exist.
- If the file exists, Delete and recreate file: Deletes the output or intermediate file and creates a new one before writing to it. By default, this option is selected.
- If the file exists, Append to file: Writes output to the end of the intermediate or output file.
- If the file exists, Fail: Forces the graph to fail if the file exists.
- Upon job failure, roll the file back to the last checkpoint: Rolls the file back and discards output if the job fails in the phase writing the file, or fails in a subsequent phase before the next checkpoint. By default, this option is selected.
- Delete file after the last phase that reads it completes: Removes the input or intermediate file after the last phase that reads it has finished running. By default, this option is not selected.
- Write file only when phase completes: Instead of writing the data file incrementally, writes the file when the phase has run to completion. This ensures that a separate process that is looking for the file while the graph is running does not pick up a partially written file. By default, this option is not selected.
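The idea behind "Write file only when phase completes" can be illustrated with a common general-purpose technique: write to a temporary file and rename it into place only after all data has been written. The sketch below is an assumption-based illustration in Python, not a description of how the Co>Operating System implements the option; the function name `write_when_complete` is hypothetical.

```python
import os
import tempfile

def write_when_complete(path, records):
    """Write all records to a temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            for record in records:
                tmp.write(record + "\n")
        # The target name appears only after every record is written.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, discard the partial output so no reader picks it up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A process that polls for the target file sees either no file or the complete file, never a partially written one.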
3. File protection
- Sets permissions on the input, output, and intermediate files. (Default settings are those assigned at file creation.) The checkboxes match the Unix file protection standards: Read (R), Write (W), and Execute (X) for User, Group, and Other.
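To make the mapping between the R/W/X checkboxes and Unix file modes concrete, here is a small Python sketch that combines the selected letters for User, Group, and Other into a numeric mode. The function name `permission_bits` and its string-based arguments are hypothetical; the bit constants come from Python's standard `stat` module.

```python
import stat

# Permission bit for each (class, letter) pair, as defined by Unix.
_BITS = {
    "user":  {"r": stat.S_IRUSR, "w": stat.S_IWUSR, "x": stat.S_IXUSR},
    "group": {"r": stat.S_IRGRP, "w": stat.S_IWGRP, "x": stat.S_IXGRP},
    "other": {"r": stat.S_IROTH, "w": stat.S_IWOTH, "x": stat.S_IXOTH},
}

def permission_bits(user="", group="", other=""):
    """Turn checkbox-style selections such as "rw" into a Unix mode."""
    mode = 0
    for who, letters in (("user", user), ("group", group), ("other", other)):
        for letter in letters.lower():
            mode |= _BITS[who][letter]  # KeyError for anything but r, w, x
    return mode

# Read and write for User, read only for Group and Other.
mode = permission_bits(user="rw", group="r", other="r")
print(oct(mode))  # 0o644
```

The resulting value could be passed to `os.chmod` to apply the same protection to a local file.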
4. Ports
- Used to provide the DML of the file, which describes how the data in the file is mapped.