How to copy file based on date in Azure Data Factory - azure-data-factory

I have a list of files in an ADLS container which contain a date in the name, as given below:
TestFile-Name-20221120.csv
TestFile-Name-20221119.csv
TestFile-Name-20221118.csv
I want to copy only the files that contain today's date, e.g. TestFile-Name-20221120.csv today, and so on.
I've used a Get Metadata activity to get the list of files, then a ForEach to iterate over each file, and then a Set Variable to extract the date part of the name (e.g. 20221120), but I'm not sure how to proceed further.

We have something similar running. We check an SFTP folder for the existence of files, using the Get Metadata activity. In our case, there can be folders or files. We only want to process files, and very specific ones for that matter (i.e. we have one pipeline per filename we can process, as the different filenames contain different columns/datatypes etc.).
Our pipeline looks like this:
Within our Get Metadata component, we basically just filter for the name of the object we want, and we only want files ending in .zip, meaning we added a Filename filter.
In your case, the first part would be 'TestFile-Name-', and the second part would be '*.csv'.
We then have a For Each loop set up, to process anything (the child items) we retrieved in the Get Metadata step. Within the For Each we defined an If Condition to only process files, and not folders.
In our cases, we use the following expression:
@equals(item().type, 'File')
In your case, you could use something like:
@endsWith(item().name, concat(<variable containing your date>, '.csv'))
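The filter logic that If Condition performs can be sketched outside ADF as well. Below is an illustrative Python version (the item list, field names `name`/`type`, and the helper are made up to mirror the Get Metadata child-items shape):

```python
# Sketch of the ForEach + If Condition filter: keep only child items
# that are files and whose name ends with <date>.csv.
def matching_files(child_items, date_str):
    """Return names of items that are files ending with <date>.csv."""
    suffix = date_str + ".csv"
    return [
        item["name"]
        for item in child_items
        if item["type"] == "File" and item["name"].endswith(suffix)
    ]

items = [
    {"name": "TestFile-Name-20221120.csv", "type": "File"},
    {"name": "TestFile-Name-20221119.csv", "type": "File"},
    {"name": "archive", "type": "Folder"},
]
print(matching_files(items, "20221120"))  # ['TestFile-Name-20221120.csv']
```

The Folder entry is skipped by the type check, just as the If Condition skips folders inside the ForEach.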

Assuming all the file names start with TestFile-Name-, and you want to copy the data of the file with today's date, use a Get Metadata activity to check whether the file exists. The file name can be dynamic, like:
@concat('TestFile-Name-', utcNow(), '.csv')
Note: you need to format utcNow() to the required date format, e.g. formatDateTime(utcNow(), 'yyyyMMdd').
If the file exists, proceed with the copy; otherwise ignore it.
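To make the date formatting concrete, here is a minimal Python sketch of what the dynamic file name evaluates to (the function name is made up; in ADF this is the job of formatDateTime(utcNow(), 'yyyyMMdd')):

```python
# Sketch: build today's expected file name, mirroring
# @concat('TestFile-Name-', formatDateTime(utcNow(), 'yyyyMMdd'), '.csv')
from datetime import datetime, timezone

def todays_filename(now=None):
    now = now or datetime.now(timezone.utc)
    return "TestFile-Name-" + now.strftime("%Y%m%d") + ".csv"

print(todays_filename(datetime(2022, 11, 20, tzinfo=timezone.utc)))
# TestFile-Name-20221120.csv
```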

Related

How to rename an XML file using the pattern in a different .txt file?

I have a folder say, source folder, containing 1000+ xml files with some ambiguous names, like:
_MIM_15646432635_6664684
_MIM_54154548557_6568436 etc.
Out of these thousands of XML files I have to select some 10-12 XML files with a particular node in them, move them to another folder (the destination folder), and rename the files with meaningful names.
For example:
There is an XML file named _MIM_15646432635_6664684 and it contains a node pattern like 'bab6e7h835468eg'; I have to rename it to something like {1FE9909E-4450-B98665362022}.
So for this I have to write a script which will search the file in the source folder and, if it finds my desired node pattern, move this file to the destination folder after renaming it to some meaningful name.
I have an Excel sheet with two columns: one containing the specific node pattern, and the second a respective list of new names.
Currently, I have a script which can search for a file and move it to another folder, provided I give the script the node pattern and file name from that Excel sheet:
Select-String -Path "\\Dubwta01\AIR\Invalid\*.xml" -Pattern 'bab6e7h835468eg' | %{ Copy-Item -Path $_.Path -Destination '\\Dubwta01\AIR\Invalid\{1FE9909E-4450-B98665362022}.xml' }
What I need now is a new script which will pick all 10 files from the source folder and move them to the destination folder after renaming them, without hard-coding the pattern and new name in the script; instead it should fetch the details from that Excel sheet (or from a text file, whichever is more suitable for the script to read the name and pattern from).
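The mapping-driven flow described above can be sketched as follows. This is an illustrative Python version, not the PowerShell the question asks for (in PowerShell the same steps map to Import-Csv, Select-String, and Copy-Item); the function name and the two-column pattern,newname CSV layout are assumptions standing in for the Excel sheet:

```python
# Sketch: for each (pattern, newname) row in a mapping file, scan the
# source folder's XML files and copy any file containing the pattern
# to the destination folder under the new name.
import csv
import os
import shutil

def move_by_pattern(mapping_csv, source_dir, dest_dir):
    # mapping_csv has two columns per row: pattern,newname
    with open(mapping_csv, newline="") as f:
        mapping = list(csv.reader(f))
    for pattern, newname in mapping:
        for fname in os.listdir(source_dir):
            if not fname.endswith(".xml"):
                continue
            path = os.path.join(source_dir, fname)
            with open(path) as fh:
                if pattern in fh.read():
                    shutil.copy(path, os.path.join(dest_dir, newname + ".xml"))
```

The source files are only read and copied, never modified, matching the "source file should not be changed" requirement.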

How to copy the files from SFTP to target folder created dynamically based on source filename (Blob storage)?

I am new to ADF, need help for 2 scenarios
1. I have to copy files from SFTP to blob storage (Azure Gen2) using ADF. In the source SFTP folder, there are 3-5 different files. For example:
S09353.DB2K.AFC00R46.F201130.txt
S09353.DB2K.XYZ00R46.F201130.txt
S09353.DB2K.GLY00R46.F201130.txt
On copying, these files are copied and placed under corresponding folders, which are created dynamically based on the file type.
For example: S09353.DB2K.AFC00R46.F201130.txt copy to AFC00R46 folder
S09353.DB2K.XYZ00R46.F201130.txt copy to XYZ00R46 folder.
2. Another requirement is to copy CSV files from blob storage to SFTP. On copying, the files need to be copied to a target folder created dynamically based on the file name.
For example: cust-fin.csv -----> copy to -----> Finance folder
Please help me with this.
The basic solution to your problem is to use Parameters in DataSets. This example is for a Blob Storage connection, but the approach is the same for SFTP as well. Another hint: if you are just moving files, use Binary DataSets.
Create Parameter(s) in DataSet
Reference Parameter(s) in DataSet
Supply Parameter(s) in the Pipeline
In this example, I am passing Pipeline parameters to a GetMetadata Activity, but the principles are the same for all DataSet types. The values could also be hard coded, expressions, or variables.
Your Process
If you need this to be dynamic for each file name, then you'll probably want to break this into parts:
Use a GetMetadata Activity to list the files from SFTP.
ForEach over the returned list and process each file individually.
Inside the Foreach -> Parse each file name individually to extract the Folder name to a variable.
Inside the Foreach -> Use the Variable in a Copy Activity to populate the Folder name in the DataSet.
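The "parse each file name to extract the folder name" step can be sketched like this. In ADF the equivalent would be an expression along the lines of @split(item().name, '.')[2] (an assumption based on the sample names, where the folder is the third dot-separated segment); here the same split is shown in Python:

```python
# Sketch: derive the target folder from a file name like
# S09353.DB2K.AFC00R46.F201130.txt -> AFC00R46 (third segment).
def target_folder(file_name):
    return file_name.split(".")[2]

print(target_folder("S09353.DB2K.AFC00R46.F201130.txt"))  # AFC00R46
```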

Powershell Only:File name change according to mapping file and then copy to another directory

I have a mapping file in import.CSV as following:
Name|Business
Jack|Carpenter
Rose|Secretary
William|Clerk
Now, I have a directory which contains files like
90986883#Jack#Sal#1000.dat
76889992#Rose#Sal#2900.dat
67899279#William#Sal#1900.dat
12793298#Harry#Sal#2500.dat
Please note #Sal will always be there after Name. I need to pick these files and put into another directory and end result in second directory should look like.
90986883#Carpenter#Sal#1000.dat
76889992#Secretary#Sal#2900.dat
67899279#Clerk#Sal#1900.dat
Basically, files need to be renamed based on the CSV file, and if the Name is not in the file name, then no action is required. Please note that the source files should not be changed.
I will appreciate all kind of help.
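The rename rule itself can be sketched as below. This is Python for illustration only, since the question asks for PowerShell (there, Import-Csv with -Delimiter '|' plus Copy-Item would follow the same logic); the mapping dictionary is taken from the import.CSV sample above:

```python
# Sketch: the Name token sits between the first and second '#', always
# followed by 'Sal'. Look it up in the mapping; no match means no action.
mapping = {"Jack": "Carpenter", "Rose": "Secretary", "William": "Clerk"}

def renamed(file_name):
    """Return the new file name, or None if the Name isn't in the mapping."""
    parts = file_name.split("#")
    # parts: [id, Name, 'Sal', amount.dat]
    if len(parts) >= 3 and parts[1] in mapping:
        parts[1] = mapping[parts[1]]
        return "#".join(parts)
    return None

print(renamed("90986883#Jack#Sal#1000.dat"))   # 90986883#Carpenter#Sal#1000.dat
print(renamed("12793298#Harry#Sal#2500.dat"))  # None (no action required)
```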

Talend - Extract FileName from tLogRow/tSort

I am new to Talend and just trying to work my way through it.
Problem Statement
I need to process a positional file, from a list of files. Need to identify the latest file first and then process only that file. I was able to identify the most updated file. And then I was able to create another flow which processes the positional file. The problem is combining these two flows so that I am able to identify the most recent file and have just that one processed.
Tried so far
Have been trying to extract the most recent file from a list within a directory. I iterated through all the files and retained their properties in a buffer. After completing this sub-task, I read through the buffer, sorted by descending modified time, extracted the top record, and was able to print it using tLogRow.
All seems to be fine except I don't know how to use the filename now for next task.
I am certain this is very rudimentary, but I'll be honest: I've been scouring the internet/help for quite some time now, with no success.
Any pointers would help.
The job flow is attached for your reference.
First of all, you can simplify your job by using tFileList's capabilities. It can sort files by their modified date:
Next, use tIterateToFlow to convert each iteration to a row:
(String)globalMap.get("tFileList_1_CURRENT_FILEPATH")
and tSampleRow with a range of "1", to get the most recent file.
Then store the result in a global variable. In the next subjob, just use that global variable as your filename in tFileInputPositional.
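What tFileList's modified-date sort plus a tSampleRow of "1" accomplishes is simply "pick the newest file in a directory". A minimal Python sketch of that selection (the directory path would be whatever your job points at):

```python
# Sketch: return the path of the most recently modified file in a
# directory, or None if the directory holds no files.
import os

def most_recent_file(directory):
    paths = [os.path.join(directory, f) for f in os.listdir(directory)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None
```

In the Talend job, the equivalent of this return value is what you store in the global variable for the next subjob's tFileInputPositional.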

Rename specific a component of file names using a list

I have a number of files that look like this:
imgDATA_subj001_log000_sess001_at.img
imgDATA_subj001_log000_sess001_cn.img
imgDATA_subj001_log000_sess001_cx.img
imgDATA_subj001_log000_sess002_at.img
imgDATA_subj001_log000_sess002_cn.img
imgDATA_subj001_log000_sess002_cx.img
imgDATA_subj002_log000_sess001_at.img
...
I want to rename a specific numeric part of the file name after subj. For instance, subj001, subj002, subj003, etc. would be renamed to subj014, subj027, subj65, etc., but the rest of the file name should be preserved. I have the list of new names, but I'm not sure how to look up the old names, match them with the new names, and then do the renaming. I tried using loops and fileparts, but I don't know how to isolate the subj*** component. I could do the moves by hand, but that would be very inefficient. Can anyone help?
If you know the portions of the filename that you want to replace specifically, i.e. you know that subj001 = subj014, then use a dir command to get the list of files in the existing directory.
This will give you a list of the files:
f = dir('imgDATA_*.img');
for somecounter = 1:length(f)
    filename = f(somecounter).name;
    newname = strrep(filename, 'subj001', 'subj014');
    movefile(filename, newname);
end
Obviously you'll want to setup an array of each of the individual names to match up and iterate through that.
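The "array of individual names" the answer suggests amounts to a mapping of old subject tokens to new ones driving the rename. Here is that idea sketched in Python (the mapping values come from the question's example; only the name transformation is shown, not the actual file move):

```python
# Sketch: replace only the subjNNN token in a file name, preserving the
# rest, using a lookup table of old -> new subject IDs.
import re

subj_map = {"subj001": "subj014", "subj002": "subj027"}

def new_name(filename):
    # Unmapped subjects are left unchanged.
    return re.sub(r"subj\d+", lambda m: subj_map.get(m.group(), m.group()), filename)

print(new_name("imgDATA_subj001_log000_sess001_at.img"))
# imgDATA_subj014_log000_sess001_at.img
```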