How to parameterise Dataset definition filename in Azure Data Factory

I have a blob storage container folder (source) that gets several csv files. My task is to pick the csv files starting with "file". See example filename below:
file12345.csv
The numeric part varies every time.
I have set the "fixed" Container and Directory names in the image below but it seems the File parameter does not accept wildcard "File*.csv".
How can I pass a wildcard to the Dataset definition?
Thanks

You can't do that operation in the Source dataset.
Just choose the container or folder in the dataset, like below:
Choose the Wildcard file path in the Source settings:
This will help you filter filenames with the wildcard "File*.csv".
Ref: Copy activity properties.
Hope this helps.
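For reference, this is roughly what the Copy activity source looks like in the pipeline JSON once the wildcard is set in the UI (a minimal sketch; the folder name is a placeholder and the source type assumes a delimited text dataset):

"source": {
    "type": "DelimitedTextSource",
    "storeSettings": {
        "type": "AzureBlobStorageReadSettings",
        "recursive": true,
        "wildcardFolderPath": "input",
        "wildcardFileName": "file*.csv"
    }
}

With wildcardFileName set in the source settings, the File field in the dataset itself can be left empty.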

Related

How to copy file based on date in Azure Data Factory

I have a list of files in a adls container which contain date in the name as given below:
TestFile-Name-20221120.csv
TestFile-Name-20221119.csv
TestFile-Name-20221118.csv
and I want to copy only the files that contain today's date, like TestFile-Name-20221120.csv today, and so on.
I've used the Get Metadata activity to get the list of files, then a ForEach to iterate over each file, and then a Set Variable to extract the date part from the file name, like 20221120, but I'm not sure how to proceed further.
We have something similar running. We check an SFTP folder for the existence of files using the Get Metadata activity. In our case, there can be folders or files. We only want to process files, and very specific ones for that matter (i.e. we have 1 pipeline per filename we can process, as the different filenames would contain different columns/datatypes etc.).
Our pipeline looks like this:
Within our Get Metadata component, we basically just filter for the name of the object we want, and we only want files ending in .zip, meaning we added a Filename filter:
In your case, the first part would be 'TestFile-Name-', and the second part would be '*.csv'.
We then have a For Each loop set up, to process anything (the child items) we retrieved in the Get Metadata step. Within the For Each we defined an If Condition to only process files, and not folders.
In our cases, we use the following expression:
@equals(item().type, 'File')
In your case, you could use something like:
@endsWith(item().name, concat(<variable containing your date>, '.csv'))
Assuming all the file names start with TestFile-Name-,
and you want to copy the data of the file with today's date,
use a Get Metadata activity to check whether the file exists; the file name can be dynamic, like
@concat('TestFile-Name-',utcnow(),'.csv')
Note: you need to format utcnow as per the needed format,
and if the file exists, proceed with the copy; otherwise, skip it.
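As an illustration, assuming the date portion of the file name uses the yyyyMMdd format, the dynamic file name for the Get Metadata dataset could be built like this:

@concat('TestFile-Name-', formatDateTime(utcnow(), 'yyyyMMdd'), '.csv')

formatDateTime trims the full timestamp returned by utcnow() down to the date part, so on 2022-11-20 the expression resolves to TestFile-Name-20221120.csv.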

Configuring sink data set in azure data factory

I am trying to copy multiple folders with their files (.dat and .csv) from FTP to an Azure storage account, so I am using Get Metadata, ForEach and Copy activities. My problem is that when setting the file path in the output dataset, I am not sure how to set the filename so it picks up all files in my folder.
I added a filename parameter in the dataset and in the Copy Data sink I set it as @item().name, but it's not working: instead of copying the files, it copies the folder. My second try was not to set the filename in the directory; that does copy the files, but it adds the extension .txt to the files instead of keeping their original format.
Thank you for your help,
You need to add a third parameter for the sink dataset for filename.
Here you can pass a parameter just as you have for the container and folder.
Filename can be set from the Get Metadata activity output.
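As a sketch (the parameter names below are illustrative), the sink dataset would declare a fileName parameter alongside the container and folder parameters and reference it in its file path:

container:  @dataset().containerName
directory:  @dataset().folderName
file:       @dataset().fileName

Then, in the Copy activity sink inside the ForEach, the fileName parameter is supplied from the Get Metadata child item:

fileName: @item().name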

How to copy the files from SFTP to target folder created dynamically based on source filename (Blob storage)?

I am new to ADF, need help for 2 scenarios
1. I have to copy files from SFTP to blob storage (Azure Gen2) using ADF. In the source SFTP folder, there are 3-5 different files. For example:
S09353.DB2K.AFC00R46.F201130.txt
S09353.DB2K.XYZ00R46.F201130.txt
S09353.DB2K.GLY00R46.F201130.txt
On copying, these files are copied and placed under corresponding folders which are created dynamically based on the file types.
For example: S09353.DB2K.AFC00R46.F201130.txt copy to AFC00R46 folder
S09353.DB2K.XYZ00R46.F201130.txt copy to XYZ00R46 folder.
2. Another requirement is to copy csv files from blob storage to SFTP. On copying, the files need to be copied to a target folder created dynamically based on the file name:
for example: cust-fin.csv----->copy to--------->Finance folder
Please help me on this.
The basic solution to your problem is to use Parameters in DataSets. This example is for a Blob Storage connection, but the approach is the same for SFTP as well. Another hint: if you are just moving files, use Binary DataSets.
Create Parameter(s) in DataSet
Reference Parameter(s) in DataSet
Supply Parameter(s) in the Pipeline
In this example, I am passing Pipeline parameters to a GetMetadata Activity, but the principles are the same for all DataSet types. The values could also be hard coded, expressions, or variables.
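For illustration, a parameterised Binary dataset could look roughly like this in JSON (dataset, linked service and parameter names are placeholders):

{
    "name": "ParameterisedBinaryDataset",
    "properties": {
        "type": "Binary",
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "containerName": { "type": "string" },
            "folderName": { "type": "string" },
            "fileName": { "type": "string" }
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": { "value": "@dataset().containerName", "type": "Expression" },
                "folderPath": { "value": "@dataset().folderName", "type": "Expression" },
                "fileName": { "value": "@dataset().fileName", "type": "Expression" }
            }
        }
    }
}

Any activity that uses this dataset (GetMetadata, Copy, etc.) will then prompt for the three parameter values, which can be hard coded, expressions, or variables.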
Your Process
If you need this to be dynamic for each file name, then you'll probably want to break this into parts:
Use a GetMetadata Activity to list the files from SFTP.
ForEach over the returned list and process each file individually.
Inside the ForEach -> Parse each file name individually to extract the folder name into a variable (see the expression sketch after this list).
Inside the ForEach -> Use the variable in a Copy Activity to populate the folder name in the DataSet.
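For the folder-name extraction step, one possible expression (assuming the file names always follow the dotted S09353.DB2K.<TYPE>.F<date>.txt pattern) is to split on the dot and take the third segment:

@split(item().name, '.')[2]

Assigning this to the variable and passing the variable into the sink dataset's folder parameter yields AFC00R46 for S09353.DB2K.AFC00R46.F201130.txt.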

How does regex_path_filter work in GCSFile properties of DATA FUSION pipeline in GCP

In the Data Fusion pipeline of GCP, the GCSFile properties have a field named "Regex path filter". How does it work? I can't find proper documentation on this.
You can find the regex documentation here.
How does it work? It is applied to the filenames and not to the whole path.
For example, let's say you have the following path: gs://<my-bucket>/<my/complete/path>/ and you have some CSV and JSON files inside this path.
To filter only the CSV files you would use the regex .*\.csv
Please note that this regex will only filter what starts after your path.
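For example (file names are illustrative), with report.csv, data.csv and report.json sitting under that path:

.*\.csv        matches report.csv and data.csv, skips report.json
report.*\.csv  matches report.csv only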

How to use wildcards in filename in AzureDataFactoryV2 to copy only specific files from a container?

So I have a pipeline in AzureDataFactoryV2, in that pipeline I have defined a copyActivity to copy files from blob storage to Azure DataLake Store. But I want to copy all the files except the files that have "-2192-" string in them.
So If I have these files:
213-1000-aasd.csv
343-2000-aasd.csv
343-3000-aasd.csv
213-2192-aasd.csv
I want to copy all of them using the Copy activity, but not 213-2192-aasd.csv. I have tried using different regular expressions in the wildcard option, but with no success.
According to my knowledge, the regular expression should be:
[^-2192-].csv
But it gives errors on this.
Thanks.
I don't know whether the data factory expression language supports Regex. Assuming it does not, the Wildcard is probably positive matching only, so Wildcard to exclude specific patterns seems unlikely.
What you could do is use 1) Get Metadata to get the list of objects in the blob folder, then 2) a Filter where the item().type is 'File' and index of '-2192-' in the file name is < 0 [the indexes are 0-based], and finally 3) a ForEach over the Filter that contains the Copy activity.
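A minimal sketch of the Filter activity settings for step 2 (the activity name Get Metadata1 is a placeholder):

Items:      @activity('Get Metadata1').output.childItems
Condition:  @and(equals(item().type, 'File'), less(indexOf(item().name, '-2192-'), 0))

indexOf returns -1 when the substring is not found, so the condition keeps only the files whose names do not contain -2192-. The ForEach then iterates over the Filter's output, and the Copy activity inside it picks up each file name via item().name.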