Talend tWaitForFile insufficiency

We have a producer process that writes files into a specific folder and runs continuously. We have to read the files one by one using Talend, and there are two issues:
The 1st: tWaitForFile only picks up files that existed before it started, so files created after the component starts are not visible to it.
The 2nd: there is no way to know whether a file has been released by the producer process, so it may be read while it is not yet completely written; the _wait_release_ parameter of tWaitForFile does not work on Linux systems!
So how can I make Talend read completely written files from a directory whose file count keeps increasing?

I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to create an OK or control file - a 0-byte touch file - when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the relevant completed file. If you name the two files the same but with different file extensions (the OK file typically gets a ".OK" extension) then this should be easy enough to look for. So you would set your tWaitForFile to look for "*.OK" files, connect it via an Iterate to a tFileInputDelimited (in the case you want to pick up a delimited text file), and then declare the file name as ((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0,((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length()-3) + ".txt"
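Written out more readably, that expression just strips the 3-character ".OK" suffix and appends the data file's extension (the component name tWaitForFile_1 and the ".txt" extension are whatever your own job uses):

    // Full path of the OK file that tWaitForFile just detected, e.g. /in/data.OK
    String okFile = (String) globalMap.get("tWaitForFile_1_CREATED_FILE");
    // Drop the 3-character ".OK" suffix and append the data file's extension
    String dataFile = okFile.substring(0, okFile.length() - 3) + ".txt";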


How to create a script that uses a path list as a reference for copying files, in PowerShell or in a .bat script

I'm looking for a way to automate archiving so that after I plug in my two external drives I can copy all my resources. The problem is that I have different file structures on my laptop and on both external drives, so I need to select specific folders to be copied; I can't just select one root folder and copy it straight over. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example structure:
/my_programming_stuff
/folder1
/folder2
/folder3
/folder4
I want to select only the first 3 folders and copy them onto external drive 1 and external drive 2. The idea is to create a .bat file that will copy everything at once (in the best-case scenario it will be copied to both external drives simultaneously, so it will be much faster). Another problem is that the script needs to bypass the NTFS long-path limitation (max. 260 characters).
Flags that I want to use:
- Copy the files and directories and all of their attributes, including ownerships and permissions.
- Recursively copy directories and their contents.
- When copying files from one directory to another, only copy files that either don't exist or are newer than the existing corresponding files in the destination directory.
- Data verification (so it's certain that the copy succeeded).
- A progress bar with an ETA.
Until now I was using Total Commander to do this, but every day I need to pick only a few folders to be copied, which takes time and is inefficient.
I have experience with Bash and PowerShell but I am not sure how to handle this topic.
Create a static batch file with robocopy commands. I think /COPYALL is the only switch you need to specify for all of this; the other defaults should satisfy your requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
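As a minimal sketch of such a batch file (the drive letters and folder names are assumptions; /COPYALL copies all file attributes including ownership and permissions, /E recurses into subdirectories, /XO skips files that are older than the destination copy, and /ETA shows per-file progress estimates):

    @echo off
    rem Sketch only - adjust the source path and drive letters to your setup.
    set SRC=C:\my_programming_stuff

    rem Copy the first three folders to both external drives in turn.
    for %%D in (E: F:) do (
      robocopy "%SRC%\folder1" "%%D\backup\folder1" /COPYALL /E /XO /ETA
      robocopy "%SRC%\folder2" "%%D\backup\folder2" /COPYALL /E /XO /ETA
      robocopy "%SRC%\folder3" "%%D\backup\folder3" /COPYALL /E /XO /ETA
    )

Note that robocopy handles paths longer than 260 characters on its own; as far as I know it has no built-in data-verification switch, though, so that requirement would need a separate tool.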
I think your time will be better spent learning how to use either FastCopy or FreeFileSync. I used FreeFileSync some years ago but got disgusted with the constantly changing format of the XML file it uses for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting its act together, and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long-filename issues, both can be executed from a batch file, and both seem to be of good quality, but FreeFileSync has more features - and is more bloated because of them. Speed-wise, though, I think FastCopy is probably one of the better products out there and very streamlined in use and design.

How to automatically delete Dymola's build files after simulation?

Every time I simulate in Dymola, a number of "useless" (for me) files are created in the working directory, e.g. dsfinal.txt, dsin.txt, dslog.txt, dsmodel.c, dymosim.exe. I find this annoying, as it messes up my directory.
Is there a way to keep only the desired output files after a simulation, without having to delete the undesired ones manually?
Those are temporary, but necessary, files for Dymola. As far as I know there is no option to delete them automatically. Of course you could script that, but I don't see a real point to it, and those files are used by some functionality - e.g. dsfinal.txt is used when a simulation is continued.
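If you do want to script the cleanup anyway, a minimal sketch (assuming a Windows working directory and that you keep none of these files) would be a batch line run after the simulation:

    rem Delete Dymola's build files; 2>nul hides errors for files that do not exist.
    del /q dsfinal.txt dsin.txt dslog.txt dsmodel.c dymosim.exe 2>nul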
Some notes: those files are created in the working directory, which should contain temporary files only. The working directory can be set via the GUI using File -> Options -> Settings.
A rather common problem is that Dymola has both an Open and a Load function:
As the description states, Load does not influence the working directory, whereas Open sets it to the directory from which a file is opened. The latter also happens when opening files e.g. via a double-click from the explorer. So usually it is better to go with Load.
My advice would be to keep the directories in which models/packages are stored separate from the working directory. This way the working directory's content can be fully deleted basically anytime...

How to move a file after reading it in IBM DataStage

I have one folder which has four files: sales_jan, sales_feb, debt_jan, debt_feb. I created a specific job for each of sales and debt. The thing is, if I have already run the sales job for sales_jan and sales_feb arrives after that, I don't want to read sales_jan again; I only want to read the newest file added, the one that hasn't been processed yet. For reading the files I pass a pattern for the specific file set (e.g. sales_*), but if I use it like that, the stage will reprocess sales_jan even though it has already been read. I want to move a file that has already been read into another folder. How exactly do I do that in IBM DataStage? If there's no way to do it, what would you suggest for my problem? Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
Since you're using wildcards for the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per-file.
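As a sketch, the after-job ExecSH command could be as simple as the following (the folder names are assumptions; in the per-file loop variant you would substitute the sequence's current-file parameter for the wildcard):

    mv /data/landing/sales_* /data/processed/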
You might keep a flag for every file already read by your job. For example, add a max-date field for each file: when the first file's max date is less than the second (new) file's, read the latest file. This can be done using a simple Linux command in a sequence or in a Transformer, just like Ray mentioned before.
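A hypothetical one-liner along those lines, picking the newest matching file for the sequence to process (paths are assumptions):

    ls -1t /data/landing/sales_* | head -1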

DSpace command line import: files not appearing in collection

I am running DSpace 6.1 on SUSE 12.2. I have used the command line import command to bring files into a collection, and the command prints out that it succeeded.
bin/dspace import --add --eperson=myemail@mycompany.com --collection=123456789/49 --source=/opt/dspace/import_dateien/test/ --mapfile=mapfile
Destination collections:
Owning Collection: 1982-11-10 Collection
Adding items from directory: /opt/dspace/import_dateien/test/
Generating mapfile: mapfile
Started: 1503332092481
Ended: 1503332095888
Elapsed time: 3 secs (3407 msecs)
I have added files without additional metadata, and no SAF zip file, because I want the extra files in the collection without their own metadata, only the existing set of metadata.
The name of the collection is correct in the success message. But I do not see the files in the collection. Are they there, but hidden? How can I get them to appear?
It appears that the command found zero items in the batch. You should see several lines of output for each item.
The first thing I would check is that the source directory "/opt/dspace/import_dateien/test/" conforms to SAF: it should contain numbered subdirectories, each containing one item. The importer seems to find no directories there.
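For reference, a Simple Archive Format (SAF) source tree typically looks roughly like this (the item and file names are illustrative):

    /opt/dspace/import_dateien/test/
        item_000/
            contents          (one line per file to be added)
            dublin_core.xml   (the item's metadata)
            file_1.pdf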
I found it! I was using SAFBuilder, where one has a CSV file with one line per file, each file with its metadata. The SAFBuilder script creates a folder as well as a zip file, and in this folder are the files, a text file named contents, and the metadata as XML. One can simply add the extra files one wants here, list them in the contents file, and re-zip it. I just tested it with bin/dspace import, and there are my extra files in the collection. As @Mark Wood said, my files were listed in the output from the script.

Talend: writing file names from a list to a tracking file on process completion

My job has following steps:
- Connect to ftp location
- Download compressed files
- Uncompress files to different folder
- Delete compressed files
- Write file names to a tracking file
ftpConnection -OnComponentOk--> ftpList -Iterate--> ftpGet -Iterate--> fileList -Iterate--> fileUnarchive -Iterate--> fileDelete
The question is where I can write the uncompressed file names to the tracking file. When I try to Iterate from fileUnarchive to fileOutputDelimited it does not allow me; similarly, if I want to add a map from fileDelete it does not allow me. Do I need a map, or can I make use of a global variable somehow?
One way I can do it is by getting the names right after ftpGet, but I would prefer to do it at a later stage (after unarchiving or deletion) so I don't update the tracking file if the process fails at one of these steps.
Thanks.
Try tFileDelete --> OnComponentOk --> tFixedFlowInput (here you can use the same global variable that contains the current file name from tFileList) --> main flow --> tFileOutputDelimited.
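As a sketch, the value in tFixedFlowInput would just read the tFileList global variable (the component name tFileList_1 is an assumption; adjust it to your job, or use CURRENT_FILEPATH if you want the full path):

    // Current file name from the surrounding tFileList iteration
    ((String)globalMap.get("tFileList_1_CURRENT_FILE"))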