I have a requirement to move files from SFTP to a local directory, and that part is simple.
I have used
tFTPConnection --> tFileExist --> (if) tFileInput --> tMap --> tFileOutput to move the files, and then I have other subjobs like tFileExist --> (if) tFileInput --> tMap --> tFileOutput.
I have ten subjobs in the same job, moving different files from different directories on the SFTP server.
I also have to capture all the file-related details, i.e. file size, number of rows, time of processing, source, and destination.
Now I guess I can achieve this if I use
tFileProperties ---> Iterate ---> tFileRowCount ---> OnComponentOk ---> tFixedFlowInput --> tFileOutput
However, I want just one subjob that captures the details of all the files. In the above flow I would have to hard-code each subjob's file details in the tFixedFlowInput, which is not what I am looking for. Is there any way I can make this happen in a single subjob by dynamically changing the file details in tFixedFlowInput?
Any guidance will be greatly appreciated.
Thanks.
You have two choices:
You can use the component tFileProperties (or tFTPFileProperties).
You can use the component's internal variables (Window -> Show view -> General -> Structure -> your component).
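For example, if a tFileList iterates over the downloaded files and a tFileRowCount counts each one, the tFixedFlowInput columns can be fed with expressions instead of hard-coded literals. A minimal sketch (component names, indexes, and the schema below are assumptions, not your exact job):

    // fileName (String): name of the file in the current iteration
    (String)globalMap.get("tFileList_1_CURRENT_FILE")
    // filePath (String): full path of the current file
    (String)globalMap.get("tFileList_1_CURRENT_FILEPATH")
    // fileSize (long): plain Java call on the current file path
    new java.io.File((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")).length()
    // rowCount (Integer): set by the preceding tFileRowCount
    (Integer)globalMap.get("tFileRowCount_1_COUNT")
    // processedAt (Date): timestamp of this run
    TalendDate.getCurrentDate()

Because every value comes from globalMap or a routine, the same tFixedFlowInput works for all ten file sets without hard-coding anything per subjob.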
PS: If you just want to move the files without making any modifications, you could simply use tFileCopy.
Related
I'm looking for a way to automate archiving: after I plug in my two external drives, I want to copy all my resources. The problem is that I have different file structures on my laptop and on both external drives, so I need to select specific folders to be copied. This means I can't just select one root folder and copy it straight over. I tried to find a way to declare more than one path in the cp command and in the copy command, without success. An example path:
/my_programming_stuff
    /folder1
    /folder2
    /folder3
    /folder4
I want to select only the first 3 folders and copy them onto external drive 1 and external drive 2. The idea is to create a .bat file that copies everything at once (in the best-case scenario it would be copied to both external drives simultaneously, which would be much faster). Another problem is that the NTFS long-path limitation (max. 260 characters) needs to be bypassed.
Flags that I want to use:
- Copy the files and directories and all of their attributes, including ownerships and permissions.
- Recursively copy directories and their contents.
- When copying files from one directory to another, only copy files that either don't exist or are newer than the existing corresponding files in the destination directory.
- Data verification (so it's certain that the copy was correct).
- A progress bar with an ETA.
Until now I have been using Total Commander for this, but every day I have to hand-pick only a few folders to be copied, which takes time and is inefficient.
I have experience with Bash and PowerShell, but I am not sure how to approach this.
Create a static batch file with robocopy commands. I think /COPYALL is the only switch you need to specify for all of this; the other defaults should satisfy the requirements.
https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/robocopy
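As a rough sketch of such a batch file (the drive letters, source path, and folder names are assumptions to adapt):

    @echo off
    rem Copy three source folders to both external drives with robocopy.
    rem /E       recurse into subdirectories, including empty ones
    rem /COPYALL copy all file info (data, attributes, timestamps, ACLs, owner, auditing)
    rem /XO      exclude older files, i.e. only copy new or newer files
    rem /ETA     show per-file progress with an estimated time of arrival
    for %%D in (E F) do (
      for %%F in (folder1 folder2 folder3) do (
        robocopy "C:\my_programming_stuff\%%F" "%%D:\my_programming_stuff\%%F" /E /COPYALL /XO /ETA
      )
    )

Note that robocopy handles paths longer than 260 characters by default, and /COPYALL needs an elevated prompt (it copies auditing info). It has no built-in post-copy verification, though, so that requirement would need a separate checksum pass or one of the tools mentioned below. To write to both drives simultaneously, each robocopy line could be launched with start instead.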
I think your time will be better spent learning how to use either FastCopy or FreeFileSync. I used FreeFileSync some years ago but got disgusted with the constantly changing format of the XML file it uses for starting a backup, so I switched to FastCopy. But it looks like FreeFileSync may be getting its act together, and I aim to do some experiments over the summer to see if I want to switch back to it.
Both can handle the long-filename issues, both can be executed from a batch file, and both seem to be high quality, but FreeFileSync has more features (and is more bloated because of them). Speed-wise, though, I think FastCopy is probably one of the better products out there, and it is very streamlined in use and design.
I have one folder which has 4 files: sales_jan, sales_feb, debt_jan, debt_feb. I created a specific job for each of sales and debt. The thing is, if I have already run the job for sales_jan and then sales_feb arrives, I don't want to read sales_jan again; I only want to read the newest file that hasn't been processed yet. For reading the files I pass the pattern of the specific file (e.g. sales_*), but if I use it like that, the stage will reprocess sales_jan even though it has already been read. I want to move files that have already been read into another folder. How exactly do I do that in IBM DataStage? If there's no way to do it, what would you suggest for my problem? Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
Since you're using wildcards in the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per file.
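As a sketch, the ExecSH command for that per-file move could be as simple as this (the directories and the #FileName# job parameter are placeholders):

    mv /data/landing/#FileName# /data/processed/#FileName#

With the loop in a sequence, the loop variable is passed to the job as #FileName#, so each file is archived as soon as its own run completes successfully.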
You might keep a flag for every file your job has already read. For example, add a max-date field for each file; when the first file's max date is less than that of the second (new) file, read the latest file. This can be done using a simple Linux command in a sequence or a Transformer, just like Ray mentioned before.
I am new to Talend and just trying to work my way through it.
Problem Statement
I need to process a positional file from a list of files: identify the latest file first and then process only that file. I was able to identify the most recently updated file, and I was able to create another flow which processes the positional file. The problem is combining these two flows so that I identify the most recent file and have just that one processed.
Tried so far
I have been trying to extract the most recent file from a list within a directory. I iterated through all the files and retained their properties in a buffer. After completing this sub-task, I read through the buffer, sorted it by descending mtime, extracted the top record, and was able to print it using tLogRow.
All seems to be fine, except I don't know how to use that filename for the next task.
I am certain this is very rudimentary but, I'll be honest, I've been scouring the internet and the help for quite some time now, with no success.
Any pointers would help.
The job flow is attached for your reference.
First of all, you can simplify your job by using tFileList's capabilities: it can sort files by their modified date.
Next, use tIterateToFlow to convert each iteration to a row, with the value
(String)globalMap.get("tFileList_1_CURRENT_FILEPATH")
and tSampleRow with a range of "1", to get the most recent file.
Then store the result in a global variable. In the next subjob, just use that global variable as your filename in tFileInputPositional.
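A minimal sketch of those last two steps (the variable name latestFile and the column name filepath are assumptions): in a tJavaRow placed right after the tSampleRow, store the value:

    // remember the newest file's path for the following subjob
    globalMap.put("latestFile", input_row.filepath);

and then set the File name of the tFileInputPositional to:

    (String)globalMap.get("latestFile")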
We have a producer process that writes files into a specific folder and runs continuously. We have to read the files one by one using Talend, and there are two issues:
First: tWaitForFile only sees files which existed before it started, so files created after the component starts are not visible to it.
Second: there is no way to know whether a file has been released by the producer process; it may be read while it is not yet completely written. The "wait release" parameter of tWaitForFile does not work on Linux systems!
So how can I make Talend read completely written files from a directory whose file count keeps increasing?
I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to create an OK or control file, a 0-byte touch file, when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the corresponding completed file. If you name the two files the same but with a different file extension (the OK file is typically called ".OK"), then this should be easy enough to look for. So you would set your tWaitForFile to look for "*.OK" files, connect it with an Iterate to a tFileInputDelimited (in the case where you want to pick up a delimited text file), and then declare the file name as
((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0, ((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length() - 3) + ".txt"
I've included some screenshots to help you below.
I am working on analyzing file differences between two views in ClearCase. I need to generate output so that I can do this task without an internet connection. What I would like is to run a command that recursively walks through each view and generates a merge/diff output file for each change from view A to view B. This can work like a merge, except that I don't actually want to make any changes.
How can I set this up so that I can keep looking at the diff output for all of these files while offline? I am using ClearCase, but if another tool can do the same work comparing two directory structures, that's fine too.
"diff output amidst all of these files while offline?": it sounds as of you are using two dynamic views here (which do need a network connection with the ClearCase Vob server to display any file)
For a task like this, I switch to two snapshot views (which at least download the file on the local disk), and then use any file comparison tools out there (WinMerge, BeyondCompare, KDiff3).
Or I will use git (created directly within the ClearCase views) to compare the two directories.
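As a sketch (the snapshot view paths are placeholders), git can diff two directory trees without any repository involved, and the output can be saved for offline review:

    git diff --no-index /views/snapshot_A/myvob /views/snapshot_B/myvob > ab_changes.patch

Each changed file shows up as a unified diff in the patch file, which any text editor or patch viewer can browse with no network connection.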