I am stuck on a task: I need to create a CSV file with a name generated at run time, and then copy that same file to a different folder. I'm able to create the required file.
Here is what I've done so far:
In SSIS I'm using a Data Flow Task in the control flow, taking a view as my OLE DB source and pointing it to a Flat File destination, which creates a file in my desired location, say folder x, held in a variable I've created called My_dest_folder. Here are the variables I've set up:
My_dest_folder, of type String, with my folder's path as the value.
Filename, of type String, with a name, say cv99351_, as the value.
Timestamp, of type String, with an expression that generates a timestamp in YYYYMMDDHHMISS format (see the expression sketch after this list).
Archivefolder, of type String, with another path into which the generated file is to be copied from My_dest_folder.
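For reference, here is one common way to build such a timestamp in the variable's expression (a sketch; this exact expression is an assumption, not taken from the original post):

(DT_WSTR,4)YEAR(GETDATE())
+ RIGHT("0" + (DT_WSTR,2)MONTH(GETDATE()), 2)
+ RIGHT("0" + (DT_WSTR,2)DAY(GETDATE()), 2)
+ RIGHT("0" + (DT_WSTR,2)DATEPART("hh", GETDATE()), 2)
+ RIGHT("0" + (DT_WSTR,2)DATEPART("mi", GETDATE()), 2)
+ RIGHT("0" + (DT_WSTR,2)DATEPART("ss", GETDATE()), 2)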
In the connection string of my Flat File connection manager, I have combined the variables as @[User::My_dest_folder] + @[User::Filename] + @[User::Timestamp] + ".csv", which creates a file named cv99351_<timestamp>.csv in folder x.
After the file is created, I am trying to capture the filename from My_dest_folder, but since the timestamp also contains seconds, I am not able to capture it every time.
Can someone please help me out here? I would really appreciate it.
For anyone who wants to save files with SSIS, your description is already nice enough to be used as a tutorial :)
But if I understand correctly, you have a problem at the end of your process, when you try to get the generated filename.
To read it you use the same variable concatenation, but sometimes your Timestamp has changed in the meantime, and then you get an error (your file doesn't exist).
If so, I guess that you use some kind of GETDATE() function in the expression of your variable. It appears that SSIS will re-evaluate the value of your variable each time you request it.
I tested it: I ran 3 INSERT statements and waited with the debugger between each. It gave me 3 different values.
I recommend that you not use the GETDATE() function in the variable expression.
You can retrieve the timestamp once with a single Execute SQL Task (with a SELECT GETDATE() query) or with a C# / VB method.
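For example, a minimal Script Task sketch of the C# option (assuming the string variable User::Timestamp from the question is listed in the task's ReadWriteVariables):

// Inside the Script Task's Main() method: compute the timestamp once
// and store it, so all later reads of the variable see the same value.
Dts.Variables["User::Timestamp"].Value = DateTime.Now.ToString("yyyyMMddHHmmss");
Dts.TaskResult = (int)ScriptResults.Success;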
Does it solve your problem?
Regards,
Arnaud
I had a similar issue.
1. There was a file with a specific name in the source folder:
@[User::v_Orig_FileName] : @[$Project::v_FilePath] + @[$Project::v_FileName]
2. It was renamed with the timestamp using GETDATE():
@[User::v_Archive_FileName] : @[$Project::v_FilePath] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(), 1, 19) + ".csv"), ":", ""), " ", "_")
Source Variable: @[User::v_Orig_FileName]
Destination Variable: @[User::v_Archive_FileName]
3. The file was moved into the archive folder. To get the source file name, I was using exactly the same variable as the destination variable of step 2:
@[User::v_Archive_Folder] : @[$Project::v_FilePath] + "\\Archive"
@[User::v_Archive_ArchivedFileName] : @[User::v_Archive_Folder] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(), 1, 19) + ".csv"), ":", ""), " ", "_")
Source Variable: @[User::v_Archive_FileName]
Destination Variable: @[User::v_Archive_ArchivedFileName]
If the timestamps of step 2 and step 3 differ by even a second, there is an error because, as pointed out above, GETDATE() is evaluated each time it is requested.
So the solution I came up with was swapping step 2 and step 3:
1. There was a file with a specific name in the source folder:
@[User::v_Orig_FileName] : @[$Project::v_FilePath] + @[$Project::v_FileName]
2. The file was moved into the archive folder:
@[User::v_Archive_Folder] : @[$Project::v_FilePath] + "\\Archive"
@[User::v_Archive_OrigFileName] : @[User::v_Archive_Folder] + "\\" + @[$Project::v_FileName]
Source Variable: @[User::v_Orig_FileName]
Destination Variable: @[User::v_Archive_OrigFileName]
3. It was renamed with the timestamp using GETDATE():
@[User::v_Archive_FileName] : @[User::v_Archive_Folder] + "\\" + REPLACE(REPLACE(REPLACE(@[$Project::v_FileName], ".csv", SUBSTRING((DT_WSTR,50)GETDATE(), 1, 19) + ".csv"), ":", ""), " ", "_")
Source Variable: @[User::v_Archive_OrigFileName]
Destination Variable: @[User::v_Archive_FileName]
Hope this gives you a different spin on this issue.
I have to copy files from a source folder to a target folder; both are in the same storage account (ADL). The files in the source folder are in .txt format and have a date appended in the file name, e.g.:
RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
and
RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(20221201 and 20221202 are the dates in the file names; date format: yyyymmdd)
I have to create a pipeline that sorts and stores the files into folders in ADL with this hierarchy, e.g.:
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
Based on the date in the file name, the pipeline has to create a year (YYYY) folder, a month (MM) folder inside it, and a date (DD) folder inside that, as in the example above, even if there are multiple files with the same date in the file name. Each file should be copied into its respective yyyy/mm/dd folder.
What I have done:
In Get Metadata, I gave the argument to extract **childItems**.
A ForEach activity contains a Copy activity.
In the Copy activity source, the wildcard path is given as *.txt.
For the sink, I used a concat expression with split and substring functions.
Please check the screenshots of all activities and expressions below.
This pipeline is creating the folders based on the date in the file name (like adl/2022/12/01), but the problem is that it copies all files into every date (DD) folder, like:
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/01/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
1.[GET META to extract child items](https://i.stack.imgur.com/GVYgZ.png)
2.[Giving GET META output to FOREACH](https://i.stack.imgur.com/cbo30.png)
3.[Inside FOREACH using COPY ](https://i.stack.imgur.com/U5LK5.png)
4.[Source Data Set](https://i.stack.imgur.com/hyzuC.png)
5.[Sink Data Set](https://i.stack.imgur.com/aiYYm.png) Expression used in the Data Set Folder Path: @concat('adl', '/', dataset().FolderName)
6.[Took parameter for Sink](https://i.stack.imgur.com/QihZR.png)
7.[Sink in copy activity ](https://i.stack.imgur.com/4OzT5.png)
Expression used in the sink for dynamic folders, using split and substring functions (for RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt, split(item().name,'.')[3] is '20221201', so this yields '2022/12/01'):
@concat(substring(split(item().name,'.')[3],0,4),'/',
substring(split(item().name,'.')[3],4,2),'/',
substring(split(item().name,'.')[3],6,2))
**OUTPUT for this pipeline**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/01/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
**Required Output is**
adl/2022/12/01/RAINBOW.IND.EXPORT.20221201.WIFI.NETWORK.SCHOOL.txt
adl/2022/12/02/RAINBOW.IND.EXPORT.20221202.WIFI.NETWORK.SCHOOL.txt
(i.e. each file should be copied to its respective date folder only; even if there are multiple files with the same date, they should be copied to the date folders based on the date in the file name)
I have reproduced the above and got the same result when I followed the steps you have given.
The Copy activity behaved like this because you did not give @item().name (the file name for that particular iteration) in the source or sink, and you gave *.txt as the wildcard path of the source in the Copy activity.
That means on every iteration (for every file name) it copies all .txt files from the source into that particular target folder, which is what happened for you.
To avoid this:
Give @item().name as the source wildcard file name.
This means only that iteration's file name is given to the source of the copy.
(OR)
Keep the wildcard file name in the source as it is (*.txt), create a sink dataset parameter for the file name, and give @item().name to it in the Copy activity sink.
You can do either of the above, or both at the same time. I have checked all 3 scenarios:
1. @item().name as the source wildcard file name.
2. @item().name in the sink dataset file name parameter, keeping the wildcard path the same.
3. Combining both 1 and 2 (@item().name in the wildcard file name and in the sink dataset parameter).
All are working fine and give the desired result.
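For reference, a minimal sketch of what the sink side of scenario 2 could look like in the Copy activity's JSON (the dataset and parameter names here are assumptions, not taken from the screenshots):

"outputs": [
  {
    "referenceName": "SinkDataset",
    "type": "DatasetReference",
    "parameters": {
      "FileName": {
        "value": "@item().name",
        "type": "Expression"
      }
    }
  }
]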
There are two parts to my query:
1) How to save different fields of a structure as separate files (each file containing only the named field of the structure)?
2) Forcing the save command to create directories in the save path when intermediate directories do not exist?
For the first part:
data.a.name = 'a';
data.a.age = 5;
data.b.name = 'b';
data.b.age = 6;
data.c.name = 'c';
data.c.age = 7;
fields = fieldnames(data);
for i = 1:length(fields)
    save(['E:\data\' fields{i} '.mat'], '-struct', 'data');
end
I want to save each field of the struct data as a separate .mat file, so that after executing the loop I have 3 files inside E:\data, viz. a.mat, b.mat and c.mat, where a.mat contains only the data of field 'a', b.mat contains only the data of field 'b', and so on.
When I execute the above code, I get three files in my directory, but each file contains the identical content of all three variables a, b and c, instead of an individual variable in each file.
The following command does not work:
for i = 1:length(fields)
    save(['E:\data\' fields{i} '.mat'], '-struct', ['data.' fields{i}]);
end
Error using save
The argument to -STRUCT must be the name of a scalar structure variable.
Is there some way to use the save command to achieve my purpose without having to create temporary variables for saving each field?
For the second part:
I have a large number of files which need to be stored in a directory structure. I want the following to work:
test = 'abcdefgh';
save(['E:\data\' test(1:2) '\' test(3:4) '\' test(5:6) '\result.mat'])
But it shows the following error:
Error using save
Cannot create 'result.mat' because 'E:\data\ab\cd\ef' does not exist.
If any intermediate directories are not present, they should be created by the save command. I can get this part to work by checking whether the directory is present using the exist command and then creating it using mkdir. I am wondering if there is some way to force the save command to do the work using some argument I am not aware of.
Your field input argument to save is wrong. Per the documentation, the format is:
'-struct',structName,field1,...,fieldN
So the appropriate save syntax is:
data.a.name = 'a';
data.a.age = 5;
data.b.name = 'b';
data.b.age = 6;
data.c.name = 'c';
data.c.age = 7;
fields = fieldnames(data);
for ii = 1:length(fields)
    save(['E:\data\' fields{ii} '.mat'], '-struct', 'data', fields{ii});
end
And no, you cannot force save to generate the intermediate directories. Check for the existence of the save path first and create it if necessary.
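For the second part, a minimal sketch of that check, using the path from the question:

% Build the target folder and create it (including intermediate
% directories, which mkdir handles) before calling save.
test = 'abcdefgh';
saveDir = fullfile('E:\data', test(1:2), test(3:4), test(5:6));
if ~exist(saveDir, 'dir')
    mkdir(saveDir);
end
save(fullfile(saveDir, 'result.mat'))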
I'm using MATLAB R2017a and I would like to create a new binary file within the folder the script is running from.
I am running MATLAB as administrator, since otherwise it has no permission to create a file. The following returns a valid fileID:
fileID = fopen('mat.bin','w');
but the file is created in c:\windows\system32.
I then tried the following to create the file within the folder I have the script in:
filePath=fullfile(mfilename('fullpath'), 'mat.bin');
fileID = fopen(filePath,'w');
but I'm getting an invalid fileID (equal to -1).
The variable filePath is equal at run time to
'D:\Dropbox\Studies\CurrentSemester\ImageProcessing\Matlab Exercies\Chapter1\Ex4\mat.bin'
which seems valid to me.
I'd appreciate help figuring out what to do.
The problem is that mfilename returns the path including the file name (without the extension). From the documentation:
p = mfilename('fullpath') returns the full path and name of the file in which the call occurs, not including the filename extension.
So fullfile appends 'mat.bin' to the script's own name, producing a path inside a "folder" that does not exist. To keep the path to the folder only, use fileparts, whose first output is precisely that. So, in your code, you should use
filePath = fullfile(fileparts(mfilename('fullpath')), 'mat.bin');
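A quick sketch of the difference (the script name Ex4script.m is hypothetical):

% Suppose this code runs inside ...\Chapter1\Ex4\Ex4script.m
p = mfilename('fullpath');                        % '...\Chapter1\Ex4\Ex4script' (no extension)
folder = fileparts(p);                            % '...\Chapter1\Ex4'
fileID = fopen(fullfile(folder, 'mat.bin'), 'w'); % created in the script's folder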
We are processing multiple files using an external table. Is there any way I can get the name of the file being processed in the external table and store it in a database table?
The only workaround I can find is appending the file name to every record in the flat file, which isn't ideal with huge datasets and multiple files.
Can anyone help with this?
Thanks
No, the file name is simply never passed from the gpfdist daemon back to Greenplum. So you have to append the file name to each line; you can use a gpfdist transformation to do so.
I was struggling with this as well; here's my solution. Please note I'm not an expert in Linux, so there may be a one-liner solution.
So I wanted to add a filename column in front of my records.
That can be done with sed. I've created a transform.sh file with the following content:
#!/bin/sh
# gpfdist passes the name of the file being processed as the first argument
filename=$1
#echo $filename >> transform.txt
# prepend the file name plus a vertical-tab delimiter to every line
sed -e "s|^|$filename\v|" $filename
Please note that I was using a vertical tab, \v, as the delimiter. Also, the filename could contain /, hence | is used as the sed separator, and double quotes are needed so that sed sees the expanded value of $filename.
Test it; it looks good (the vertical-tab delimiter makes each field print on its own line):
./transform.sh countersamples-2016-03-02--11-51-10.csv
countersamples-2016-03-02--11-51-10.csv
timestamp
machine
category
instance
name
value
countersamples-2016-03-02--11-51-10.csv
2016-03-02 11:51:10.064
DESKTOP-4PLQKVL
Memory
% Committed Bytes In Use
74.8485488891602
This part is done; let's continue with gpfdist. We need a YAML file that can be passed to gpfdist. I named it transform.yaml, with the following content:
---
VERSION: 1.0.0.1
TRANSFORMATIONS:
  add_filename:
    TYPE: input
    CONTENT: data
    COMMAND: /bin/bash transform.sh %filename%
Please note the %filename% placeholder here. It seems that gpfdist pre-filters the files that need to be handled and passes them one by one to our transform.
Let's fire up gpfdist:
gpfdist -c transform.yaml -v
Now go into Greenplum and create an external table such as:
CREATE READABLE EXTERNAL TABLE "ext_transform"
(
"filename" text,
"timestamp" timestamp without time zone ,
"machine" text ,
"category" text ,
"instance" text ,
"name" text ,
"value" double precision
)
LOCATION ('gpfdist://localhost:8080/*/countersamples*.csv#transform=add_filename')
FORMAT 'TEXT'
( HEADER DELIMITER '\013' NULL AS '\\N' ESCAPE AS '\\' )
And when we select data from it:
select * from "ext_transform";
we see the file name prepended to every record. I've created 2 folders to see how it reacts when the files are not in the same folder as the transform. This way I can distinguish between the 2 files, even if their data is identical.
In my Talend job I have a context variable named context.TempFolder.
Now, while copying data from a SQL table to an Excel file, I need to create an Excel file named export.excel (fixed name) in the folder specified by the variable context.TempFolder.
How do I specify the 'File Name' of my tFileOutputExcel component?
Here the value of the context variable TempFolder might change, but I will always be creating the Excel file with the same name, export.excel.
You just need to concatenate context.TempFolder with your output file name.
So the file path for your tFileOutputExcel should look something like:
context.TempFolder + "export.excel.xls"
You can use variables and strings like this in a lot of places in Talend. To do something slightly more complicated, you might define the output file name in your job (i.e. calculate it at run time), put that file name in the globalMap, and then retrieve it when you output your file, so you might end up with something like:
context.OutputFolder + (String)globalMap.get("FileName") + ".xls"
This is useful for date-time stamping files, for example, or for defining the file name from some data in your input.
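As a minimal sketch of that globalMap pattern (the tJava placement and the "FileName" key are assumptions for illustration):

// In a tJava component that runs before the output component:
// compute the file name once and stash it in the globalMap.
String stamp = new java.text.SimpleDateFormat("yyyyMMddHHmmss").format(new java.util.Date());
globalMap.put("FileName", "export_" + stamp);
// Then set the tFileOutputExcel 'File Name' field to:
// context.OutputFolder + (String)globalMap.get("FileName") + ".xls"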