Requirement for an error-logging mail (the mail should contain all the missing file details) in an ADF flow - azure-data-factory

The requirement is simple: I have a folder containing 4 txt files (1.txt, 2.txt, 3.txt, 4.txt). The flow is controlled by a string parameter called "all or some".
If I select "all" in the parameter, all 4 files should be processed. The requirement starts here >>
If any file is missing from the folder (for example, 2.txt and 3.txt are not present and I selected ALL in the parameter), I need a mail saying that 2.txt and 3.txt are missing.
If I select "some" in the parameter, for example 1.txt and 4.txt, and any of those files is missing (for example, 1.txt is missing), I need a mail with the missing file name (i.e. 1.txt in our case).

Capture the missing file details in one variable.
I tried to repro capturing the missing files using Azure Data Factory. Below is the approach.
Take a parameter of array type in the pipeline. At runtime, you can give the list of file names in the folder to be processed in this array parameter.
Take a Get Metadata activity and add a dataset to the activity. Click +New in the field list and select Child items as an argument.
Take a Filter activity, give the array parameter value in Items, and write the condition to filter out the missing files in the Condition box:
Items:
@pipeline().parameters.AllorSome
Condition:
@not(contains(string(activity('Get Metadata1').output.childItems),item()))
I tried to run this pipeline. At runtime, four file names were given to the array parameter.
The Get Metadata activity output has three file names.
The parameter has 4 file names and Get Metadata returns 3, so the missing file name has to be filtered out.
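For illustration, if 4.txt were the missing file, the stringified childItems that the condition checks against would look roughly like this (a sketch of the documented Get Metadata output shape):
[{"name":"1.txt","type":"File"},{"name":"2.txt","type":"File"},{"name":"3.txt","type":"File"}]
so not(contains(...)) evaluates to true only for 4.txt, which the filter therefore keeps.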
Output of the Filter activity:
Use this output and send it in the email.
Refer to the MS document How to send email - Azure Data Factory & Azure Synapse | Microsoft Learn for sending the email.
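For example, the mail body could list the missing files with an expression along these lines (a sketch, assuming the Filter activity is named Filter1 and the parameter holds plain file-name strings):
@concat('Missing files: ', join(activity('Filter1').output.Value, ', '))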

Related

How to remove the last 2 positions in a split and get the remaining first value?

Let's say the string is a variable file name, like the examples below:
file1_name_cr_001.csv
file2_name1_name2.nn.123.456_updt_000.csv
filename_2012.444.1234_utc_del_004.csv
The length of the last 8 characters will always remain fixed, i.e. (_001.csv, _000.csv, _004.csv). We need to extract only the values cr, updt, del.
How can we get the single value that comes before _cr, _updt, _del?
Any suggestions?
The output should look like this:
file1_name/cr/001
file2_name1_name2.nn.123.456/updt/000
filename_2012.444.1234_utc/del/004
I have reproduced the above and got the results below.
First, I took a sample file name in a Set Variable activity.
Then, I got the string from the start up to length-8:
@substring(variables('sample'),0,sub(length(variables('sample')),8))
For the end folder:
@replace(split(substring(variables('sample'),sub(length(variables('sample')),8), 8),'.')[0],'_','')
For the start folder:
@substring(variables('before_8'), 0, lastIndexOf(variables('before_8'), '_'))
For the middle folder:
@split(variables('before_8'), '_')[sub(length(split(variables('before_8'), '_')), 1)]
Resulting folder structure:
@concat(variables('start'),'/',variables('middle'),'/',variables('end'))
Result:
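For illustration, with the first sample name file1_name_cr_001.csv, the variables above evaluate to:
before_8 = file1_name_cr
end = 001
start = file1_name
middle = cr
result = file1_name/cr/001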
Give this variable in the copy activity source folder path and it will generate the folder structure for you.
For multiple file names, first store all the file names in an array, then use a ForEach and do the same operations as above inside it, as sketched below.
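Inside the ForEach, the same expressions apply with the current item in place of the sample variable, e.g. (a sketch, assuming the array holds plain file-name strings):
@substring(item(), 0, sub(length(item()), 8))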

Copy each '.txt' file into its respective date folder based on the date in the filename using Data Factory

I have to copy files from a source folder to a target folder, both in the same storage account (ADL). The files in the source folder are in .txt format and have a date appended to the file name,
eg: RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
and
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(20221201 and 20221202 are the dates in the file names, date format: yyyymmdd)
I have to create a pipeline that will sort and store the files in folders in ADL in this hierarchy,
e.g.: adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
Even if we have multiple files with the same date in the file name, based on that date it has to create a year (YYYY) folder, inside it a month (MM) folder, and inside that a date (DD) folder, as in the example above. Each file should be copied into its respective yyyy/mm/dd folder.
What I have done:
In Get Metadata - gave the argument to extract **childItems**
A ForEach activity that contains a Copy activity.
In the Copy activity source, the wildcard path is given as *.txt
For the sink, took a concat expression using split and substring functions
Please check the screenshots of all activities and expressions.
This pipeline is creating the folders based on the date in the file name (like adl/2022/12/01),
but the problem is that it was copying all files into all the date (DD) folders
(like adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt)
1.[GET META to extract child items](https://i.stack.imgur.com/GVYgZ.png)
2.[Giving GET META output to FOREACH](https://i.stack.imgur.com/cbo30.png)
3.[Inside FOREACH using COPY ](https://i.stack.imgur.com/U5LK5.png)
4.[Source Data Set](https://i.stack.imgur.com/hyzuC.png)
5.[Sink Data Set](https://i.stack.imgur.com/aiYYm.png) Expression used in the dataset folder path: @concat('adl','/',dataset().FolderName)
6.[Took parameter for Sink](https://i.stack.imgur.com/QihZR.png)
7.[Sink in copy activity ](https://i.stack.imgur.com/4OzT5.png)
Expression used in the sink for the dynamic folders, using the split and substring functions:
@concat(substring(split(item().name,'.')[3],0,4),'/',
substring(split(item().name,'.')[3],4,2),'/',
substring(split(item().name,'.')[3],6,2)
)
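(For illustration: for RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt, split(item().name,'.')[3] evaluates to 20221201, so the expression yields 2022/12/01.)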
**OUTPUT for this pipeline**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
**Required Output is**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(i.e. each file should be copied only to its respective date folder; even if we have multiple files with the same date, they should be copied to the date folders based on the date in the file name)
I have reproduced the above and got the same result when I followed the steps you have given.
The Copy activity behaved like this because, in the source or sink, you did not give @item().name (the file name for that particular iteration), and you gave *.txt in the wildcard path of the Copy activity source.
That means for every iteration (for every file name) it copies all .txt files from the source into that particular target folder (which is what happened to you).
To avoid this,
give @item().name in the source wildcard file name.
That way we pass only the current iteration's file name to the source of the copy.
(OR)
Keep the wildcard file name in the source as it is (*.txt), create a sink dataset parameter for the file name,
and give @item().name to it in the Copy activity sink.
You can do either of the above, or both at the same time. I have checked all 3 scenarios:
1. @item().name in the source wildcard file name.
2. @item().name in the sink dataset file name, keeping the wildcard path the same.
3. Combining both 1 and 2 (@item().name in the wildcard file name and in the sink dataset parameter).
All are working fine and giving the desired result.
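For reference, a sketch of the relevant settings (FolderName is the dataset parameter from the screenshots; FileName is a hypothetical name for the sink file-name parameter):
Source wildcard file name: @item().name
Sink dataset parameter FileName: @item().name
Sink dataset folder path: @concat('adl','/',dataset().FolderName)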

Saving multiple fields of a structure as separate .mat files and creating non-existing directories in the save path

There are two parts to my query:
1) How do I save different fields of a structure as separate files (each file containing only the named field of the structure)?
2) How do I force the save command to create directories in the save path when intermediate directories do not exist?
For the first part:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields=fieldnames(data);
for i=1:length(fields)
save(['E:\data\' fields{i} '.mat'],'-struct','data');
end
I want to save each field of struct data as a separate .mat file, so that after executing the loop I have 3 files inside E:\data, viz. a.mat, b.mat and c.mat, where a.mat contains only the data of field 'a', b.mat only the data of field 'b', and so on.
When I execute the above code, I get three files in my directory, but each file contains identical content: all three variables a, b and c, instead of an individual variable in each file.
The following command does not work:
for i=1:length(fields)
save(['E:\data\' fields{i} '.mat'],'-struct',['data.' fields{i} ]);
end
Error using save
The argument to -STRUCT must be the name of a scalar structure variable.
Is there some way to use the save command to achieve my purpose without having to create temporary variables for saving each field?
For the second part:
I have a large number of files which need to be stored in a directory structure. I want the following to work:
test='abcdefgh';
save(['E:\data\' test(1:2) '\' test(3:4) '\' test(5:6) '\result.mat'])
But it shows the following error:
Error using save
Cannot create 'result.mat' because 'E:\data\ab\cd\ef' does not exist.
If any intermediate directories are not present, they should be created by the save command. I can get this part to work by checking whether the directory exists using the exist command and then creating it with mkdir. I am wondering if there is some way to force the save command to do the work using some argument I am not aware of.
Your field input argument to save is wrong. Per the documentation, the format is:
'-struct',structName,field1,...,fieldN
So the appropriate save syntax is:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields = fieldnames(data);
for ii = 1:length(fields)
save(['E:\data\' fields{ii} '.mat'], '-struct', 'data', fields{ii});
end
And no, you cannot force save to generate the intermediate directories. Check for the existence of the save path first and create it if necessary.
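A minimal sketch, reusing the example path from the question:
test = 'abcdefgh';
outDir = ['E:\data\' test(1:2) '\' test(3:4) '\' test(5:6)];
if ~exist(outDir, 'dir')
    mkdir(outDir);  % mkdir creates all missing intermediate folders in one call
end
save([outDir '\result.mat']);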

SSIS - Use data from Flat File Source in Subject Line of Send Mail task

We import files from an FTP. The file could be named anything.csv.
Every 15 minutes, we use a Script Task to pull the directory listing of an FTP folder; then in a ForEach Loop Container (using a Variable Enumerator) we download each FTP file, process it, delete it from the FTP, and send a generic "Complete" email to confirm a file has been processed.
Is there a way we can take one of the columns ([ClientName]) in the anything.csv data file and put it in the Subject line of the email we send out, so it reads "[ClientName] order has been imported"?
I've tried using an Object variable (via a Recordset Destination and its VariableName), but I get the error "the datatype of variable "User::Subject" is not supported in an expression".
Thanks
Chris
If you only want one client name per .csv file, you need to populate a string-type variable with just one client name from the file. Then you will be able to use that string variable to build the subject line expression, as sketched below.
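For example, with a string variable User::ClientName populated from the file, the Send Mail task's Subject could use an expression like this (a sketch; the variable name is hypothetical):
@[User::ClientName] + " order has been imported"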

Using a context variable with fixed values

In my Talend job I have a context variable named context.TempFolder.
Now, while copying data from a SQL table to an Excel file, I need to create an Excel file named export.excel (fixed name) in the folder specified by the variable context.TempFolder.
How do I specify the 'File Name' of my tFileOutputExcel component?
Here the value of the context variable TempFolder might change, but I will always be creating the Excel file with the same name, export.excel.
You just need to concatenate context.TempFolder with your output file name.
So the file path for your tFileOutputExcel should look something like:
context.TempFolder + "export.excel.xls"
You can use variables and strings like this in a lot of places in Talend. To do something slightly more complicated, you might define the output file name in your job (so it is calculated at run time), put that file name in the globalMap, and then retrieve it when you output your file, so you might end up with something like:
context.OutputFolder + (String)globalMap.get("FileName") + ".xls"
This is useful for date-time stamping files, for example, or for defining the file name from some piece of data in your input, as sketched below.
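A minimal sketch of a date-stamped file name (assuming Talend's built-in TalendDate routine; the date pattern is illustrative):
context.TempFolder + "export_" + TalendDate.getDate("CCYYMMDD") + ".xls"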