I have a problem with tFixedFlowInput and file names in Talend

I would like to generate one file for each row in my main file.
When my job finishes I have the right number of files, but each file name is offset by one compared with the actual code in my file, and the first file created is named "null0".
Screenshot of my problem:
The job:

I would recommend using this option in tFileOutputDelimited.


How to generate a 10000 lines test file from original file with 10 lines?

I want to test an application with a file containing 10000 lines of records (plus header and footer lines). I have a test file with 10 lines now, so I want to duplicate these lines 1000 times. I don't want to write C# code in my app to generate that file (it is only for testing), so I am looking for a different, simple way to do it.
What kind of tool can I use for this? CMD? A Visual Studio/VS Code extension? Any thoughts?
If your data is textual, load the 10 records from your test file into an editor. Select all, copy, and paste at the end of the file. Repeat until the file is 10000+ lines long.
This procedure requires ceil(log_2(1000)) cycles, 10 in your case; in general, ceil(log_2(<target_number_of_lines>/<base_number_of_lines>)).
Alternative (large files)
Modern editors should not have performance problems here. However, the same principle can be applied with the cat CLI command. Assuming that you copy the original file into a file named dup0.txt, proceed as follows:
cat dup0.txt dup0.txt >dup1.txt
cat dup1.txt dup1.txt >dup0.txt
leaving you with the quadrupled number of lines in dup0.txt.
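The doubling steps above can also be scripted end to end; here is a minimal sketch assuming a POSIX shell, where the `seq` line stands in for the real 10-line test file:

```shell
#!/bin/sh
# Create a 10-line seed file (stand-in for the real test data).
seq 1 10 > dup0.txt

# Double the file until it reaches at least 10000 lines.
while [ "$(wc -l < dup0.txt)" -lt 10000 ]; do
    cat dup0.txt dup0.txt > dup1.txt
    mv dup1.txt dup0.txt
done

wc -l < dup0.txt   # 10240 lines after 10 doublings
```

Since each pass doubles the line count, 10 passes turn 10 lines into 10 * 2^10 = 10240 lines, matching the ceil(log_2(1000)) = 10 cycle count above.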

Merging multiple PDF files into one per file name using PDFtk Pro

I have a situation that I need to merge files again by file names. Now, I have files in one folder like this -
A1.pdf,
A2.pdf,
B1.pdf,
C1.pdf,
C2.pdf,
C3.pdf.
The goal is to merge files by file names and I will get A.pdf, B.pdf, C.pdf. I tried different things in the batch file, but none worked so far. Can you please help?
The real file names are like the ones below.
115_11W0755_70258130_841618403_01.PDF
115_12W0332_70258122_202990692_01.PDF
115_12W0332_70258122_202990692_02.PDF
115_12W0332_70258122_202990692_03.PDF
115_14W0491_70258174_562605608_01.PDF
115_14W0491_70258174_562605608_02.PDF
115_14W0776_70258143_680477806_01.PDF
115_16W0061_70258083_942231888_01.PDF
115_16W0065_70258176_202990692_01.PDF
115_16W0065_70258176_202990692_02.PDF
The 3rd part (e.g. 70258083) is the element that is unique per batch. In other words, I want to merge files per this element. From the file names listed above, there would be 6 PDF files.
I am using the batch script below to merge two files into one. I don't know how to tweak this to merge more than 2 files, or to leave a single file alone.
Please help.
setlocal enabledelayedexpansion
for %%# in (115_*.pdf) do (
set n=%%~n#
set n=!n:~,-30!
pdftk A=!n!.pdf B=%%# cat B A output C:\IN\_fileNames\Merge\Files\!n!.pdf
)
Here is the error screen:
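For reference, the grouping logic described in the question can be sketched in a POSIX shell; the merged/ output folder is a hypothetical choice, pdftk is assumed to be on the PATH, and the file names are the ones from the question. A single-file group needs no special case, since pdftk with one input simply copies it:

```shell
#!/bin/sh
# Sketch: merge PDFs that share the same third underscore-delimited
# field (e.g. 70258122) into one PDF named after that field.
# The merged/ output folder is a hypothetical choice.
mkdir -p merged
for key in $(ls 115_*.PDF | cut -d_ -f3 | sort -u); do
    # pdftk concatenates all matching inputs in glob order; with a
    # single match it simply copies that file to the output.
    pdftk 115_*_"${key}"_*.PDF cat output "merged/${key}.pdf"
done
```

Applied to the 10 file names listed above, the `cut -d_ -f3 | sort -u` pipeline yields 6 distinct keys, so 6 merged PDFs would be produced.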

How to move a file after reading it in IBM DataStage

I have one folder which has 4 files: sales_jan, sales_feb, debt_jan, debt_feb. I created a specific job for each of sales and debt. The thing is, if I already ran the job for sales_jan only and then sales_feb arrives afterwards, I don't want to read sales_jan again; I only want to read the newest file that hasn't been processed yet. For reading the files I pass a pattern for the specific file (e.g. sales_*), but with that pattern the stage will reprocess sales_jan even though it has already been read. I want to move a file that has already been read into another folder. How exactly do I do that in IBM DataStage? If there's no way to do it, what would you suggest for my problem? Any ideas would be appreciated.
The easiest solution is to use an after-job subroutine (ExecSH on Linux/UNIX, ExecDOS on Windows) to move the file to a different location.
Since you're using wildcards for the Sequential File stage, you're going to have to be a bit more clever in handling a situation where your job processes only some of the files. I would prefer to write this using a loop in a sequence, processing one file at a time, so that the move can be handled per-file.
You might keep a flag for every file already read by your job. For example, add a max-date field for each file: when the first file's max date is less than that of the second (new) file, read the latest file. It can be done using simple Linux commands in a sequence or a Transformer, as Ray mentioned before.
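The after-job move suggested above (ExecSH on Linux/UNIX) can be sketched as a small shell snippet; the /data/landing and /data/processed paths are hypothetical examples:

```shell
#!/bin/sh
# After-job subroutine sketch: move every processed sales file out of
# the landing folder so the next run's sales_* pattern only matches
# new files. Paths are hypothetical.
mkdir -p /data/processed
for f in /data/landing/sales_*; do
    [ -e "$f" ] || continue   # skip if the glob matched nothing
    mv "$f" /data/processed/
done
```

If the job processes only some of the matched files, a per-file loop in a sequence is safer, so that each file is moved immediately after its own successful run.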

DSpace command-line import: files not appearing in collection

I am running DSpace 6.1 on SUSE 12.2. I have used the command-line import command to bring files into a collection, and the command prints that it succeeded.
bin/dspace import --add --eperson=myemail@mycompany.com \
    --collection=123456789/49 --source=/opt/dspace/import_dateien/test/ \
    --mapfile=mapfile
Destination collections:
Owning Collection: 1982-11-10 Collection
Adding items from directory: /opt/dspace/import_dateien/test/
Generating mapfile: mapfile
Started: 1503332092481
Ended: 1503332095888
Elapsed time: 3 secs (3407 msecs)
I added the files without additional metadata and without an SAF zip file, because I want the extra files in the collection without their own metadata, only with the existing set of metadata.
The name of the collection is correct in the success message. But I do not see the files in the collection. Are they there, but hidden? How can I get them to appear?
It appears that the command found zero items in the batch. You should see several lines of output for each item.
The first thing I would check is that the source directory "/opt/dspace/import_dateien/test/" conforms to SAF: it should contain numbered subdirectories, each containing one item. The importer seems to find no directories there.
I found it! I was using SAFBuilder, which takes a CSV file with one line per file, each file with its metadata. The SAFBuilder script creates a folder as well as a zip file, and in this folder are the files, a text file named contents, and the metadata as XML. One can simply add the extra files one wants here, list them in the contents file, and re-zip it. I just tested it with bin/dspace import, and my extra files are now in the collection. As @Mark Wood said, my files were listed in the output from the script.
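For illustration, a minimal Simple Archive Format (SAF) item directory looks like the sketch below; all file and directory names here are hypothetical. Adding an extra file to an item means dropping it into the item folder and listing it in the contents file:

```shell
#!/bin/sh
# Sketch of one SAF item; the importer expects numbered subdirectories,
# each holding the bitstreams, a contents file, and dublin_core.xml.
mkdir -p saf/item_000
cd saf/item_000
: > document.pdf          # the original bitstream (hypothetical name)
: > extra_file.txt        # the extra file to add to the item
# Dublin Core metadata for the item.
cat > dublin_core.xml <<'EOF'
<dublin_core>
  <dcvalue element="title" qualifier="none">Example item</dcvalue>
</dublin_core>
EOF
# The contents file lists every bitstream in the item, one per line.
printf 'document.pdf\nextra_file.txt\n' > contents
```

Any file present in the item folder but missing from contents will not be imported as a bitstream, which is why listing the extra files there is the key step.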

Talend tWaitForFile insufficiency

We have a producer process that writes files into a specific folder and runs continuously. We have to read the files one by one using Talend, and there are 2 issues:
The 1st: tWaitForFile reads only files which existed before it started, so files created after the component starts are not visible to it.
The 2nd: There is no way to know whether a file has been released by the producer process, so it may be read while it is not yet completely written; the wait-for-release parameter of tWaitForFile does not work on Linux systems!
So how can I make Talend read completely written files from a directory whose number of files keeps increasing?
I'm not sure what you mean by your first issue. tWaitForFile has options to trigger when files are created, modified or deleted in a folder.
As for the second issue, your best bet here is for the file producer to be creating an OK or control file which is a 0 byte touch when it has finished writing the file you want.
In this case you simply look for the appearance of the OK file and then pick up the relevant completed file. If you name the 2 files the same but with a different file extension (the OK file typically has the extension ".OK"), then this should be easy enough to look for. So you would set your tWaitForFile to look for "*.OK" files, connect it with an Iterate link to a tFileInputDelimited (in the case where you want to pick up a delimited text file), and declare the file name as:
((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).substring(0, ((String)globalMap.get("tWaitForFile_1_CREATED_FILE")).length() - 3) + ".txt"
I've included some screenshots to help you below:
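On the producer side, the control-file convention described above is simple to implement; here is a sketch in shell, with a hypothetical file name. Writing to a temporary name and renaming makes the data file appear atomically, and the zero-byte .OK file signals that it is released:

```shell
#!/bin/sh
# Producer-side sketch of the control-file convention: write the data
# file completely, then create a zero-byte .OK file with the same base
# name. The consumer only picks up data files whose .OK file exists.
out=data_20240101.txt            # hypothetical output file name
printf 'col1;col2\n1;2\n' > "${out}.tmp"
mv "${out}.tmp" "$out"           # atomic rename: file appears complete
touch "${out%.txt}.OK"           # signal that the file is released
```

The consumer's expression above then strips the 3-character ".OK" suffix from the created file name and appends ".txt" to find the matching data file.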