In Talend, how to fetch the latest file

I have three files that share the same schema:
A1 (file) received at 12:30:000.00,
A2 (file) received at 12:35:000.00,
A3 (file) received at 12:40:000.00.
I want to fetch the latest file, which is A3.
Note: I am using the tFileList component to fetch the files.

Talend Docs for tFileList:
Order by:
By modified date: most recent to least recent or least recent to most recent.
The Talend Knowledge Base has a load of information about components. Also, the components speak mostly for themselves if you examine them a bit.

tFileList --> tFileProperties --> tJavaRow
tFileList to iterate over the files in the directory
tFileProperties to get each file's properties
tJavaRow to save the file path (in a global variable) of the file with the greatest value in the mtime field
After that, use tFileInputDelimited with the global variable as the file name
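
The tJavaRow step above amounts to a running maximum over mtime. The following is a minimal standalone sketch of that logic, not the exact code you would paste into Talend: the class LatestFilePicker and its methods are hypothetical names, and the abs_path and mtime inputs are assumed to correspond to the tFileProperties columns. Inside a real job you would compare the incoming row's mtime against a value kept in globalMap and store the winning path with globalMap.put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: keeps the path of the file with the greatest mtime seen so far.
class LatestFilePicker {
    private String latestPath = null;
    private long latestMtime = Long.MIN_VALUE;

    // Call once per row (one row per file); absPath and mtime are assumed
    // to come from the tFileProperties output columns.
    void accept(String absPath, long mtime) {
        if (mtime > latestMtime) {
            latestMtime = mtime;
            latestPath = absPath;
        }
    }

    // Path of the most recently modified file, or null if no rows were seen.
    String latestPath() {
        return latestPath;
    }

    // Convenience wrapper for trying the logic outside Talend.
    static String pickLatest(Map<String, Long> mtimeByPath) {
        LatestFilePicker picker = new LatestFilePicker();
        for (Map.Entry<String, Long> entry : mtimeByPath.entrySet()) {
            picker.accept(entry.getKey(), entry.getValue());
        }
        return picker.latestPath();
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("/in/A1", 1000L); // made-up modification times
        files.put("/in/A2", 2000L);
        files.put("/in/A3", 3000L);
        System.out.println(pickLatest(files));
    }
}
```

The value returned by latestPath() is what you would store in the global variable that tFileInputDelimited reads afterwards.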

You can create a job with these components:
tFileList -> tFileProperties -> tAggregateRow -> tLogRow (or any output component)
In tFileList, provide the directory path.
tFileProperties exposes a schema with the properties of the file, such as basename, modified time, absolute path, etc.
In tFileProperties, pass the global variable for the file path, i.e. ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).
In tAggregateRow, under the Operations section, select the columns to be displayed and use the Max function on the modified-time column. Prefer the numeric mtime column over mtime_string, because a maximum over text values is not guaranteed to follow chronological order.

Related

Dynamically Add a Timestamp To Files in Azure Data Factory

I am new to ADF. I want to copy an Excel file from the source to an Archive folder with a timestamp added to the file name. I set up the following as parameters for the source and target and ran the copy job, but it just copies the file to the target without the timestamp. I am not sure what needs to be done to fix this.
The following is the target filename value:
@concat(replace(pipeline().parameters.pTriggerFile,'.csv',''), '_', formatDateTime(convertTimeZone(utcnow(),'UTC','Eastern Standard Time'),'yyyy-MM-ddTHHmmss'), '.csv')
Source Dataset
Target dataset
Follow the below steps to add a timestamp to the source filename when copying it to sink.
Source:
Azure data factory copy activity:
In the source dataset, create a parameter for the source filename and pass it dynamically in the file path.
In Source, create a parameter at the pipeline level and pass the filename dynamically to the dataset parameter.
In the sink dataset, create a dataset parameter and add it dynamically to the sink file path.
In the sink, pass the below dynamic content to add the current timestamp to the filename.
@concat(replace(pipeline().parameters.sourcefilename,'.csv',''), '_', formatDateTime(convertTimeZone(utcnow(),'UTC','Eastern Standard Time'),'yyyy-MM-ddTHHmmss'), '.csv')
When you run the pipeline, you can see the sink file has the timestamp added to it.
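
To make it easier to see what file name the dynamic content produces, here is a rough Java equivalent of the same string manipulation. It is only an illustration and not something ADF runs: the class TimestampedName and the method withTimestamp are hypothetical, and the IANA zone America/New_York is assumed to be the counterpart of the Windows zone name 'Eastern Standard Time'.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of the name built by the concat/replace/formatDateTime expression.
class TimestampedName {
    // Same layout as 'yyyy-MM-ddTHHmmss'; the literal T has to be quoted in Java patterns.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HHmmss");

    // Assumption: America/New_York stands in for 'Eastern Standard Time'.
    private static final ZoneId EASTERN = ZoneId.of("America/New_York");

    static String withTimestamp(String sourceFileName, Instant now) {
        // replace(..., '.csv', ''): drop the extension before appending the stamp.
        String base = sourceFileName.replace(".csv", "");
        // convertTimeZone + formatDateTime: render the UTC instant in Eastern time.
        String stamp = STAMP.format(now.atZone(EASTERN));
        return base + "_" + stamp + ".csv";
    }

    public static void main(String[] args) {
        System.out.println(withTimestamp("sales.csv", Instant.now()));
    }
}
```

For example, a source file named sales.csv copied at 17:30:45 UTC on 2023-01-15 would come out as sales_2023-01-15T123045.csv.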

How to keep unstructured file name as value and insert into database

I'm new to IBM DataStage. I need to keep the file name that I set as the file path of the unstructured file as a value, and then insert that value into the original_file column of my table for all rows automatically. Is there any way to do this?
Assuming the file name is a job parameter that is provided on each job run, you could use a Transformer: add a new column "original_file" and use the parameter name as its derivation.
Note: a parameter such as file_name is referenced in DataStage as #file_name# (e.g. in the file stage), but in the Transformer it is referenced as file_name (without the #s).

tExtractRegexField unable to act as lookup to tMap in Talend DI

I have a tExtractRegexField which extracts a date from a string of text coming from an ExcelFileInput. It outputs the dates to tLogRow correctly, but I can't connect the same output as a lookup to a tMap that has a second ExcelFileInput as its main input.
If I connect the tExtractRegexField to the tMap first, I can't then connect the second ExcelFileInput, and vice versa.
I'm using Talend 6.3.1, and for testing I am able to connect two ExcelFileInput components to a tMap, so I don't think it's a problem with my system setup.
I have also tried tJoin instead of tMap, but I encounter the same issue (I can't connect both inputs together, but I can connect either "A" or "B" first).
Overview of Process
Problem Area
The tExcelFileInput uses globalMap to get the path to the Excel file from the preceding tFlowToIterate.
Based on discussions on the Talend forum, the issue may be down to Talend DI trying to avoid circular references.
An alternative solution is to extract the regex field from the header row and store it in a global variable, using a tJavaRow with globalMap.put("MyVal", row.Data);. Then, on OnComponentOk, read the remaining data from the body rows, recall the global variable MyVal in the tMap, and include it as needed in your tMap output.
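
The workaround above boils down to "store once, read many times". The sketch below imitates it in plain Java so it can be run outside Talend. The class HeaderDateLookup, the static globalMap stand-in, the yyyy-MM-dd date pattern, and the semicolon-separated row layout are all assumptions made for illustration; in a real job the first method corresponds to the tJavaRow body and the second to an expression such as (String)globalMap.get("MyVal") in the tMap.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the two subjobs: header extraction, then body enrichment.
class HeaderDateLookup {
    // Plays the role of Talend's job-wide globalMap.
    static final Map<String, Object> globalMap = new HashMap<>();

    // Assumed date layout; adjust the pattern to the real header text.
    private static final Pattern DATE = Pattern.compile("(\\d{4}-\\d{2}-\\d{2})");

    // Step 1 (tJavaRow): pull the date out of the header text and store it.
    static void storeHeaderDate(String headerText) {
        Matcher matcher = DATE.matcher(headerText);
        if (matcher.find()) {
            globalMap.put("MyVal", matcher.group(1));
        }
    }

    // Step 2 (tMap, after OnComponentOk): append the stored value to every body row.
    static List<String> enrich(List<String> bodyRows) {
        String myVal = (String) globalMap.get("MyVal");
        List<String> out = new ArrayList<>();
        for (String row : bodyRows) {
            out.add(row + ";" + myVal);
        }
        return out;
    }

    public static void main(String[] args) {
        storeHeaderDate("Report generated 2017-03-01 for region X");
        List<String> body = new ArrayList<>();
        body.add("a;1");
        body.add("b;2");
        System.out.println(enrich(body));
    }
}
```

Because the value travels through the global variable rather than a lookup link, the tMap only needs the single main input.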

How can tFileList output be sent to a file in Talend?

I want to write the list of files returned by tFileList to a file, but Talend does not allow me to connect tFileList to a tFileOutputDelimited component.
What you can do is the following: tFileList offers an iterate link, which you can connect to a tFixedFlowInput. In the tFixedFlowInput, create a schema column, say filename, with this expression:
filename = ((String)globalMap.get("tFileList_1_CURRENT_FILE"))
This assumes tFileList_1 is the name of the tFileList component.
From the tFixedFlowInput you can connect to tFileOutputDelimited and write to the file. Make sure to use the append option of tFileOutputDelimited; otherwise it will overwrite the data on each iteration.
tFileList--->(iterate)tFixedFlowInput------>(rowmain)------->tFileOutputDelimited
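
For comparison, the same iterate-and-append behaviour can be sketched in plain Java. This is not code generated by Talend, and the class FileListWriter and the method writeFileNames are hypothetical names; the sketch only shows why append mode matters when one line is written per iteration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical equivalent of tFileList -> tFixedFlowInput -> tFileOutputDelimited (append).
class FileListWriter {
    // Writes one file name per line to output and returns how many names were written.
    static int writeFileNames(File directory, Path output) throws IOException {
        File[] files = directory.listFiles(File::isFile);
        if (files == null) {
            throw new IOException("Not a readable directory: " + directory);
        }
        Arrays.sort(files); // deterministic order, similar to ordering by file name
        int count = 0;
        for (File file : files) {
            String line = file.getName() + System.lineSeparator();
            // APPEND mirrors the append option; without it each iteration would overwrite the previous line.
            Files.write(output, line.getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("filelist", ".txt");
        int written = writeFileNames(new File("."), out);
        System.out.println(written + " names written to " + out);
    }
}
```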

Talend Data Integration - Read Multiple Files from a folder and load the data in mysql database

I have 5 files in a folder. The file names are dates, from "2015-09-10.txt" to "2015-09-15.txt".
If I give the starting file name as 2015-09-11.txt and the end file name as 2015-09-13.txt, the job should read all the files between these two (i.e. the files for the 11th, 12th and 13th) and load their data into the database. The other files should not be inserted into the database.
My current Talend job is:
tFileList -> tFileInputDelimited -> tMapProcessing -> tMysqlOutput.
You can use this file mask in the tFileList:
"2015-09-1[1-3]"
If you have something more complicated, generate the file names using tJavaFlex and iterate over them:
tJavaFlex ------(iterate)------tFileInputDelimited-------(main)------tMap-------(main)--- tMysqlOutput
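
A minimal sketch of the name generation that the tJavaFlex could perform is shown below. It is a standalone Java class rather than the component's begin/main/end sections, the names DateRangeFiles and fileNamesBetween are hypothetical, and it assumes every file is named yyyy-MM-dd.txt as in the question.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the daily file names between two file names, inclusive.
class DateRangeFiles {
    static List<String> fileNamesBetween(String startFile, String endFile) {
        // Strip the extension and parse the ISO date that forms the file name.
        LocalDate start = LocalDate.parse(startFile.replace(".txt", ""));
        LocalDate end = LocalDate.parse(endFile.replace(".txt", ""));
        List<String> names = new ArrayList<>();
        for (LocalDate day = start; !day.isAfter(end); day = day.plusDays(1)) {
            names.add(day + ".txt"); // LocalDate.toString() yields yyyy-MM-dd
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(fileNamesBetween("2015-09-11.txt", "2015-09-13.txt"));
    }
}
```

Each generated name can then be handed to tFileInputDelimited on the iterate link. Unlike a character-class mask, this also handles ranges that cross a month boundary.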