Using a context variable with fixed values - Talend

In my Talend job I have a context variable named context.TempFolder.
While copying data from a SQL table to an Excel file, I need to create an Excel file named export.excel (fixed name) in the folder specified by context.TempFolder.
How do I specify the 'File Name' of my tFileOutputExcel component?
The value of the context variable TempFolder might change, but I will always create the Excel file with the same name, export.excel.

You just need to concatenate context.TempFolder with your output file name.
So the file path for your tFileOutputExcel should look something like:
context.TempFolder + "export.excel.xls"
You can use variables and strings like this in a lot of places in Talend. To do something slightly more complicated, you might compute the output file name at run time in your job, put it in the globalMap, and retrieve it when you write the file, so you might end up with something like:
context.OutputFolder + (String)globalMap.get("FileName") + ".xls"
This is useful for date-time stamping files, for example, or for deriving the file name from data in your input.
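One detail worth checking with plain concatenation: it only yields a valid path if the folder value already ends with a separator. Here is a minimal sketch of the same idea (a fixed or date-stamped file name joined to a variable folder), written in Python purely for illustration and not as Talend code. `temp_folder` stands in for context.TempFolder, and both helper names are hypothetical.

```python
import os
from datetime import datetime


def output_path(temp_folder, file_name="export.excel.xls"):
    """Join a variable folder and a fixed file name into one path."""
    # os.path.join adds a separator only when temp_folder does not end with one
    return os.path.join(temp_folder, file_name)


def stamped_name(base, when, ext=".xls"):
    """Build a date-time stamped file name from a base name and a datetime."""
    return f"{base}_{when:%Y%m%d_%H%M%S}{ext}"


# The folder may change between runs; the file name stays fixed
print(output_path("/tmp/out"))
print(output_path("/tmp/out/"))

# Date-time stamped variant (hypothetical base name "export")
print(output_path("/tmp/out", stamped_name("export", datetime.now())))
```

In Talend itself the equivalent safeguard is simply to make sure the context value ends with "/" or to add the separator in the expression.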

Related

Copy each '.txt' file into its respective date folder based on the date in the filename using Data Factory

I have to copy files from a source folder to a target folder; both are in the same storage account (ADL). The files in the source folder are in .txt format and have a date in the file name,
eg: RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
and
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(20221201 and 20221202 are the dates in the file names, date format: yyyymmdd)
I have to create a pipeline that sorts the files and stores them in ADL folders in this hierarchy:
ex: adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
Even if there are multiple files with the same date, the pipeline has to create, based on the date in the file name, a year (YYYY) folder, a month (MM) folder inside it, and a day (DD) folder inside that, as in the example above. Each file should be copied into its respective yyyy/mm/dd folder.
What I have done:
In Get Metadata, I gave the argument to extract **childitems**.
A ForEach activity contains a Copy activity.
In the Copy activity, the source wildcard path is given as *.txt.
For the sink, I used a concat expression built from the split and substring functions.
Please check the screenshots of all activities and expressions.
This pipeline creates the folders based on the date in the file name (like adl/2022/12/01),
but the problem is that it copies all files into every date (DD) folder
(like adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt)
1. [GET META to extract child items](https://i.stack.imgur.com/GVYgZ.png)
2. [Giving GET META output to FOREACH](https://i.stack.imgur.com/cbo30.png)
3. [Inside FOREACH using COPY](https://i.stack.imgur.com/U5LK5.png)
4. [Source Data Set](https://i.stack.imgur.com/hyzuC.png)
5. [Sink Data Set](https://i.stack.imgur.com/aiYYm.png) Expression used in the data set Folder Path: @concat('adl','/',dataset().FolderName)
6. [Took parameter for Sink](https://i.stack.imgur.com/QihZR.png)
7. [Sink in copy activity](https://i.stack.imgur.com/4OzT5.png)
Expression used in the sink for dynamic folders, using the split and substring functions:
@concat(substring(split(item().name,'.')[3],0,4),'/',
substring(split(item().name,'.')[3],4,2),'/',
substring(split(item().name,'.')[3],6,2)
)
**OUTPUT for this pipeline**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
**Required Output is**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(i.e. each file should be copied only to its own date folder; even if there are multiple files with the same date, they should go to the date folder given by the date in the file name)

I have reproduced the above and got the same result when I followed the steps you have given.
The Copy activity behaved like this because you did not give @item().name (the file name for that particular iteration) in the source or the sink, and you gave *.txt as the wildcard path of the source in the Copy activity.
It means that for every iteration (for every file name) it copies all .txt files from the source into that iteration's target folder (which is what happened for you).
To avoid this:
Give @item().name as the source wildcard file name.
This passes only the current iteration's file name to the source of the copy.
(OR)
Keep the wildcard file name in the source as it is (*.txt), create a sink dataset parameter for the file name,
and give @item().name to it in the Copy activity sink.
You can do either of the above, or both at the same time. I have checked all 3 scenarios:
1. @item().name in the source wildcard file name.
2. @item().name in the sink dataset file name, keeping the wildcard path the same.
3. Combining 1 and 2 (@item().name in the wildcard file name and in the sink dataset parameter).
All work fine and give the desired result.
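To see why each file should land in exactly one folder, the sink expression from the question can be mirrored outside Data Factory. The sketch below is Python used for illustration only, not pipeline code. `date_folder` is a hypothetical helper that does what the split/substring expression does, and the mapping pairs each file with only its own folder, which is the effect of passing @item().name instead of *.txt.

```python
def date_folder(file_name):
    """Mirror of the sink expression: use the 4th dot-separated part (yyyymmdd)."""
    stamp = file_name.split(".")[3]
    # substring(stamp,0,4) / substring(stamp,4,2) / substring(stamp,6,2)
    return f"{stamp[0:4]}/{stamp[4:6]}/{stamp[6:8]}"


files = [
    "RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt",
    "RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt",
]

# One entry per file: each iteration handles only its own file, not every *.txt
targets = {name: f"adl/{date_folder(name)}/{name}" for name in files}

for target in targets.values():
    print(target)
```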

Saving multiple fields of a structure as separate .mat files and creating non-existing directories

There are two parts to my query:
1) How do I save different fields of a structure as separate files (each file containing only the named field of the structure)?
2) How do I force the save command to create directories in the save path when intermediate directories do not exist?
For the first part:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields=fieldnames(data);
for i=1:length(fields)
    save(['E:\data\' fields{i} '.mat'],'-struct','data');
end
I want to save each field of the struct data as a separate .mat file, so that after executing the loop I have 3 files inside E:\data, viz. a.mat, b.mat and c.mat, where a.mat contains only the data of field 'a', b.mat contains only the data of field 'b', and so on.
When I execute the above code, I get three files in my directory, but each file contains identical content, all three variables a, b and c, instead of one variable per file.
The following command does not work:
for i=1:length(fields)
    save(['E:\data\' fields{i} '.mat'],'-struct',['data.' fields{i} ]);
end
Error using save
The argument to -STRUCT must be the name of a scalar structure variable.
Is there some way to use the save command to achieve this without having to create temporary variables for saving each field?
For the second part:
I have a large number of files which need to be stored in a directory structure. I want the following to work:
test='abcdefgh';
save(['E:\data\' test(1:2) '\' test(3:4) '\' test(5:6) '\result.mat'])
But it shows the following error:
Error using save
Cannot create 'result.mat' because 'E:\data\ab\cd\ef' does not exist.
If any intermediate directories are not present, they should be created by the save command. I can get this part to work by checking whether the directory is present using the exist command and then creating it using mkdir. I am wondering if there is some way to force the save command to do this using some argument I am not aware of.

Your field input argument to save is wrong. Per the documentation, the format is:
'-struct',structName,field1,...,fieldN
So the appropriate save syntax is:
data.a.name='a';
data.a.age=5;
data.b.name='b';
data.b.age=6;
data.c.name='c';
data.c.age=7;
fields = fieldnames(data);
for ii = 1:length(fields)
    % Pass the field name as the last argument so only that field is saved
    save(['E:\data\' fields{ii} '.mat'], '-struct', 'data', fields{ii});
end
And no, you cannot force save to generate the intermediate directories. Check for the existence of the save path first and create it if necessary.
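The check-and-create step is short in practice. As a language-neutral illustration (Python here, not MATLAB; in MATLAB the corresponding calls are the exist and mkdir functions mentioned in the question), the pattern is: derive the folder from the name, create any missing levels, then save. The helper name, the temporary root folder and the placeholder file contents are assumptions made for this example.

```python
import os
import tempfile


def save_with_dirs(root, key, payload):
    """Create root/ab/cd/ef for a key like 'abcdefgh' if missing, then write the file."""
    folder = os.path.join(root, key[0:2], key[2:4], key[4:6])
    os.makedirs(folder, exist_ok=True)  # creates every missing intermediate level
    path = os.path.join(folder, "result.mat")
    with open(path, "wb") as fh:
        fh.write(payload)  # placeholder bytes, not a real MAT-file
    return path


root = tempfile.mkdtemp()  # stand-in for the data root folder in the question
saved = save_with_dirs(root, "abcdefgh", b"placeholder bytes")
print(saved)
```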

Copy a dynamically generated filename and paste it in another folder

I am stuck on a task: I need to create a CSV file whose name is generated at run time, and then copy that same file to a different folder. I'm able to create the required file.
Here is what I've done so far:
In SSIS I have a DFT in the control flow with a view as my OLE DB source, pointing to a Flat File destination, which creates a file in my desired location, say folder x, held in the variable My_dest_folder. Here are the variables I've created:
My_dest_folder of type string, with my folder's path as the value.
Filename of type string, with a name, say cv99351_, as the value.
Timestamp of type string, with an expression which generates a timestamp in YYYYMMDDHHMISS format.
Archivefolder of type string, with another path as the value, into which the generated file is supposed to be copied from My_dest_folder.
In the connection string of my flat file connection manager, I have given the variables as
@My_dest_folder+@Filename+@Timestamp+".csv", which creates a file with name cs99351_.csv in the folder x.
After the file is created I try to capture the file name from My_dest_folder, but since the timestamp also contains seconds I am not able to capture it every time.
Can someone please help me out here? I would really appreciate it.

If someone wants to save files with SSIS, your description is already nice and could be used as a tutorial :)
But if I understand correctly, you have a problem at the end of your process, when you try to get the generated file name.
To read it you use the same variable concatenation, but sometimes your Timestamp has changed by then and you get an error (your file doesn't exist).
If so, I guess that you use a kind of GETDATE() function in the expression of your variable. It appears that SSIS evaluates the value of your variable each time you request it.
I tested it:
I ran 3 Insert statements and waited in the debugger between each.
It gave me 3 different values.
I recommend that you not use the GETDATE() function in the variable expression.
You can retrieve the value once with a single SQL Task (with a SELECT GETDATE() SQL query) or with a C# / VB method.
Does it solve your problem?
Regards,
Arnaud
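The underlying issue is general: an expression that reads the clock is re-evaluated on every use, so two uses can disagree. A minimal sketch of the fix, in Python for illustration only (this is not SSIS syntax; the prefix mirrors the question and the folder names are placeholders): read the clock once, keep the result, and build both paths from the stored value.

```python
import os
from datetime import datetime


def build_paths(dest_folder, archive_folder, prefix, now):
    """Derive both file paths from a single, already captured timestamp."""
    stamp = now.strftime("%Y%m%d%H%M%S")  # YYYYMMDDHHMISS
    file_name = f"{prefix}{stamp}.csv"
    return (os.path.join(dest_folder, file_name),
            os.path.join(archive_folder, file_name))


# Capture the time once, before any step uses it
run_time = datetime.now()
created, archived = build_paths("x", "archive", "cv99351_", run_time)

# Both paths carry the same file name, however long the steps in between take
print(created)
print(archived)
```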

I had a similar issue.
1. There was a specific named file in the source folder.
@[User::v_Orig_FileName] : @[$Project::v_FilePath]+ @[$Project::v_FileName]
2. It was renamed with the timestamp using GETDATE().
@[User::v_Archive_FileName] : @[$Project::v_FilePath] +"\\"+REPLACE( REPLACE(REPLACE( @[$Project::v_FileName] , ".csv", Substring((DT_WSTR,50) GETDATE(),1,19)+".csv" ),":","")," ","_")
Source Variable
@[User::v_Orig_FileName]
Destination Variable
@[User::v_Archive_FileName]
3. The file was moved into the archive folder. To get the source file name, I was using exactly the same variable as the destination variable in step 2.
@[User::v_Archive_Folder]: @[$Project::v_FilePath]+ "\\Archive"
@[User::v_Archive_ArchivedFileName] : @[User::v_Archive_Folder] +"\\"+REPLACE( REPLACE(REPLACE( @[$Project::v_FileName] , ".csv", Substring((DT_WSTR,50) GETDATE(),1,19)+".csv" ),":","")," ","_")
Source Variable
@[User::v_Archive_FileName]
Destination Variable
@[User::v_Archive_ArchivedFileName]
If the timestamps for step 2 and step 3 differ by even a second, there is an error because, as pointed out above, GETDATE() is evaluated each time it is requested.
So the solution I came up with was swapping step 2 and step 3.
1. There was a specific named file in the source folder.
@[User::v_Orig_FileName] : @[$Project::v_FilePath]+ @[$Project::v_FileName]
2. The file was moved into the archive folder.
@[User::v_Archive_Folder]: @[$Project::v_FilePath]+ "\\Archive"
@[User::v_Archive_OrigFileName] : @[User::v_Archive_Folder] +"\\"+ @[$Project::v_FileName]
Source Variable
@[User::v_Orig_FileName]
Destination Variable
@[User::v_Archive_OrigFileName]
3. It was renamed with the timestamp using GETDATE().
@[User::v_Archive_FileName] : @[User::v_Archive_Folder] +"\\"+REPLACE( REPLACE(REPLACE( @[$Project::v_FileName] , ".csv", Substring((DT_WSTR,50) GETDATE(),1,19)+".csv" ),":","")," ","_")
Source Variable
@[User::v_Archive_OrigFileName]
Destination Variable
@[User::v_Archive_FileName]
I hope this gives you a different angle on the issue.

MATLAB: Save figure with default name

I am running a MATLAB script that produces a figure. To save this figure I use:
print(h_f,'-dpng','-r600','filename.png')
This means that if I don't change the file name each time I run the script, the figure filename.png will be overwritten.
Is there a way to save a figure to a default name, e.g. untitled.png, so that when the script is run a second time it creates a new figure untitled(1).png instead of overwriting the original one?

You could create a new file name based on the number of existing files:
defaultName = 'untitled';
fileName = sprintf('%s_%d.png', defaultName, ...
    length(dir([defaultName '_*.png'])));
print(h_f,'-dpng','-r600', fileName)
Add a folder path to your dir search pattern if the files aren't located in your current working directory.
This will create a 0-indexed list of file names:
untitled_0.png
untitled_1.png
untitled_2.png
untitled_3.png
...
You could also use tempname to generate a long random name for each iteration. It is unique in most cases; see the Limitations section of the tempname documentation.
print(h_f,'-dpng','-r600', [tempname(pwd) '.png'])
The input argument (pwd in the example) is needed if you do not want to save the files in your TEMPDIR.
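Note that counting the existing files can reuse a name that is still taken if an earlier file has been deleted in the meantime. A sketch of a variant that probes for the first unused index instead, written in Python for illustration (this is not MATLAB; `next_free_name` is a hypothetical helper and empty files stand in for the saved figures):

```python
import os
import tempfile


def next_free_name(folder, base="untitled", ext=".png"):
    """Return the first base_N.ext path in folder that does not exist yet."""
    n = 0
    while os.path.exists(os.path.join(folder, f"{base}_{n}{ext}")):
        n += 1
    return os.path.join(folder, f"{base}_{n}{ext}")


folder = tempfile.mkdtemp()
for _ in range(3):
    # Create an empty file where the figure would be written
    open(next_free_name(folder), "w").close()

print(sorted(os.listdir(folder)))
```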

You can try something like this:
for jj=1:N
    name_image='filename';
    ext='.png';
    % do your stuff
    filename=strcat(name_image,num2str(jj),ext);
    print(h_f,'-dpng','-r600',filename)
end
If you want to run your script multiple times (because you don't want to use a for loop), just declare a variable (for example jj) that is incremented at the end of the script:
jj=jj+1;
Be careful not to delete this variable; when you run your script again, it will use the next value of jj to compose the name of the new image.
This is just an idea.

Store user input as wildcard

I am having some trouble with a data processing function in MATLAB. The function takes the name of the file to be processed as an input, finds the desired files, and reads in the data.
However, several of the desired files are variants, such as Data_00.dat, Data.dat, or Data_1_March.dat. Within my function, I would like to search for all files containing Data and condense them into one usable file for processing.
To solve this, I would like desiredfile to be converted into a wildcard.
Here is the statement I would like to use.
selectedfiles = dir *desiredfile*.dat % Search for file names containing desiredfile
This returns all files containing the variable name desiredfile, rather than the user input.
The only solution that I can think of is writing a separate function that manually condenses all the variants into one file before my function is run, but I am trying to keep the number of files used down and would like to avoid this.

You could concatenate strings for that, treating desiredfile as a variable:
desiredfile = input('Files: ', 's');  % 's' returns the typed text as a string
selectedfiles = dir(['*' desiredfile '*.dat']) % Search for file names containing desiredfile
Enclosing strings in square brackets, [string1 string2 ... stringN], concatenates them. MATLAB's dir function takes a string.

I believe you can achieve that using the dir command.
dataSets = dir('/path/to/dir/containing/Data*.dat');
dataSets = {dataSets.name};
Now simply loop over them.
To quote the MATLAB help:
dir lists the files and folders in the MATLAB® current folder. Results appear in the order returned by the operating system.
dir name lists the files and folders that match the string name. When name is a folder, dir lists the contents of the folder. Specify name using absolute or relative path names. You can use wildcards (*).
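The idea in both answers is the same: build the pattern string from the variable's value, then hand it to the directory listing. A sketch of that idea in Python, for illustration only (this is not MATLAB; `find_variants` is a hypothetical helper and the files are created in a temporary folder just for the example):

```python
import glob
import os
import tempfile


def find_variants(folder, desired, ext=".dat"):
    """List files in folder whose names contain the user-supplied text."""
    # glob.escape keeps characters such as '[' in the input from acting as wildcards
    pattern = os.path.join(glob.escape(folder), f"*{glob.escape(desired)}*{ext}")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))


folder = tempfile.mkdtemp()
for name in ["Data_00.dat", "Data.dat", "Data_1_March.dat", "Other.dat"]:
    open(os.path.join(folder, name), "w").close()

# The search text comes from a variable, not from a literal in the pattern
print(find_variants(folder, "Data"))
```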