How to delete the contents of a folder older than x days in a Talend job?
I have thought about retrieving that attribute from tFileList (could not find it) or passing a Unix command to the system (the less preferable way, as you have less control).
Thank you!
P.S. The issue is solved.
You can use a "tfilelist" coupled with a "tfileproperties".
The variable mtime or mtime_string can help you.
Here is a page that explains it a little (it's in French, so you can run it through Google Translate if you want):
HERE
My solution (based on the link above):
tFileList->iterate->tFileProperties (reads the file from previous step, ((String) globalMap.get ( "tFileList_1_CURRENT_FILEPATH")) )-> tMap has 2 outputs, based on mtime condition:
Files to delete: (TalendDate.getCurrentDate().getTime() - row3.mtime) / (24 * 60 * 60 * 1000) > 2
Files to keep: (TalendDate.getCurrentDate().getTime() - row3.mtime) / (24 * 60 * 60 * 1000) <= 2
Finally, a tFileDelete deletes filesToDelete.filename.
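The tMap expressions above are plain Java, so the same age check can be sketched as a standalone program (the folder path and the 2-day threshold are assumptions for illustration):

import java.io.File;

public class DeleteOldFiles {
    public static void main(String[] args) {
        File dir = new File("/data/incoming"); // hypothetical folder
        long now = System.currentTimeMillis();
        File[] files = dir.listFiles(File::isFile);
        if (files == null) return; // folder missing or unreadable
        for (File f : files) {
            // same computation as the tMap condition: age in whole days
            long ageDays = (now - f.lastModified()) / (24L * 60 * 60 * 1000);
            if (ageDays > 2) {
                f.delete(); // what tFileDelete does for the filesToDelete flow
            }
        }
    }
}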
Write a script to remove files older than X days, and call the script from the tSystem component:
More about tSystem : https://help.talend.com/display/TalendOpenStudioComponentsReferenceGuide521EN/19.4+tSystem
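For example, the command given to tSystem could be a one-liner like this (a sketch; the folder path and the 2-day threshold are assumptions):

find /data/incoming -type f -mtime +2 -delete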
I have one folder containing 4 files: sales_jan, sales_feb, debt_jan, debt_feb. I created a specific job for each of sales and debt. The thing is, if I have already run the job for sales_jan and then sales_feb arrives afterwards, I don't want to read sales_jan again; I only want to read the newest file that hasn't been processed yet. To read the files, I pass the pattern of the specific file (e.g. sales_*), but used like that, the stage will reprocess sales_jan even though it has already been handled. I want to move a file that has already been read into another folder. How exactly do I do that in IBM DataStage? If there's no way to do it, what would you suggest for my problem? Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
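For instance, the ExecSH input value could be a command along these lines (both paths are hypothetical):

mv /data/landing/sales_jan.csv /data/processed/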
Since you're using wildcards for the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per-file.
You might set a flag for every file your job has already read. For example, add a max-date field for each file: when the first file's max date is less than that of the second (new) file, read the latest file. This can be done with a simple Linux command in a Sequence or a Transformer, as Ray mentioned above.
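For example, a simple command to pick the newest matching file (the path and pattern are assumptions) is:

ls -t /data/landing/sales_* | head -1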
I am working with a set of .dta files in Stata, each of which takes some time to create and each of which contains the date of creation in the file name (created at the point of saving using a macro with today's date).
At the moment my do-files identify the relevant .dta file to open based on the today's date macro, but this requires that I run the code to create the .dta files each day.
Is there a way of asking Stata to identify the most recently dated file from a set of files with same filename stem and different dates within a folder (and then open it), once I have run the "cd" command? I had a look on Statalist and SO but couldn't see an answer - any advice gratefully received.
e.g. In the folder, I have files 2020-08-23_datasetA.dta, 2020-08-22_datasetA.dta, 2020-08-22_datasetB.dta etc., and at different points I will want to select the most recently dated version of A, B, C etc. Hence I don't think a simple sort will work, as datasets A, B, and C are all at play.
(My question is essentially the Stata version of this one about R - Loading files with the most recent date in R)
[edited to clarify that there are multiple datasets, each of which is dated and each of which will need to be opened at different points]
Manifestly two or more files in a particular folder can't have the same name. But we know what you mean.
A utility like fs from SSC will return a list of filenames matching a pattern, alphanumerically sorted. With your dating convention the last named will be the latest as your dates follow (year, month, day) order.
Using another convention for the rest of the filename won't undermine that, but naturally you need to spell out which subset of files is of interest. So a pattern is
. ssc install fs
. fs *datasetA.dta
. local wanted = word(r(files), -1)
where the installation need only take place once. You can circumvent fs by using the calls to official Stata that it uses.
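For example, to then open the latest datasetA file, following straight on from the lines above:

. use "`wanted'", clear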
Perhaps you are seeking a program, but it's not clear to me that you need one.
Small detail: You're using the word macro in a way that doesn't match its Stata sense. Stata, for example, is not SAS! The terms code, routine and script are generic and don't conflict with any Stata use. Conversely, code, routine or script may have fixed meanings in other software you use. Either way, Stata questions are best phrased using Stata terms.
I am using the SetDateSave command in several places in my .nsi file.
However, once the installer has run, every file's date modified/created field has been updated to the current time.
Any ideas? I know that commands in .nsi files only affect the lines below them. I am guessing that some other command further down is overriding SetDateSave, but I need a second opinion please!
Update:
I think nsisunz is the culprit! Testing ZipDLL instead. I will update the answer here if I solve it!
It was the zip extraction that was resetting the date modified field!
I simply used ZipDLL instead of nsisunz!
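For reference, extraction with ZipDLL looks roughly like this (a sketch; the archive path is an assumption):

!include "ZipDLL.nsh"
Section "Install"
  ; ZipDLL preserves the archived files' modification dates on extraction
  !insertmacro ZIPDLL_EXTRACT "$INSTDIR\archive.zip" "$INSTDIR" "<ALL>"
SectionEnd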
Hope this helps someone in the future :)
I would like to write a WinSCP script to download a file that is placed onto the remote server every morning between 4-4:30am. Is there a way to do this with time-stamping?
What I want, in pseudocode:
get file.txt where timestamp < 1 hour from 4 am
First, I assume your file does not have a fixed name (contrary to the fixed name file.txt in your question). If it does have a fixed name, please explain why you need a timestamp-based solution.
Anyway, you can use a file mask with a time constraint:
get "*.txt>2014-07-19 4:00"
To dynamically inject today's date, use the %TIMESTAMP% syntax:
get "*.txt>%TIMESTAMP#yyyy-mm-ss% 4:00"
Simply put, the above means: get all files created later than 4:00 today (the %TIMESTAMP#yyyy-mm-dd% resolves to today's date in the format yyyy-mm-dd, as needed for the time constraint).
When passing the get on the WinSCP command line in a batch file (using the /command switch, as opposed to using the /script switch to specify a separate script file), you have to double the % to prevent the batch file from trying to interpret %TIMESTAMP%:
winscp.com /command ... "get ""*.txt>%%TIMESTAMP#yyyy-mm-dd%% 4:00"""
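For completeness, a standalone script file might look like this (a sketch; the session URL and local path are placeholders):

open sftp://user:password@example.com/
get "*.txt>%TIMESTAMP#yyyy-mm-dd% 4:00" C:\download\
exit

Run it with winscp.com /script=download.txt.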
Another solution is a static script that relies on a relative time: e.g. you know your script runs at 6am, so you let WinSCP download all files updated/created in the last 2 hours (6am – 4am):
get *.txt>2h
See also WinSCP article on downloading the most recent file.
Here's an open-ended question. I work on a lot of MSSQL files, and I like to have a date stamp on each. This is so I can tell, just by looking at the source of a stored procedure, whether it's up to date or not.
I'd like to have a shortcut autocomplete key, that, if i type say, d-tab-tab, I get the current date printed to the file. And yes, I am that lazy. :)
So the question is:
Is there any way of getting around this problem entirely?
If not, how would you suggest solving it?
Clever ideas welcome.
Are these files in source control? If so, see whether your source control provider allows templates within the source file which get filled in with the time and date when you check in.
If you use Notepad (and this is possibly the only argument for using it) then F5 does the trick.
What about using version control for your files and including automatic keyword expansion?
Using CVS keyword expansion, you could put $Date$ in the file and it will get replaced with the date of the last check-in. No typing or updating needed; it's "auto-magic".
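For example, near the top of a stored-procedure source file (the procedure name is hypothetical; the comment shows what CVS writes on check-in):

-- $Date$   -- expanded on check-in to something like: $Date: 2008/09/16 10:30:00 $
CREATE PROCEDURE dbo.usp_GetOrders
AS
BEGIN
    SELECT 1; -- placeholder body
END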