I'm looking for a way how to split job execution in talend studio according to actual file row - I'd like to process file rows starting with "DEBUG" in one job branch and another rows in another job branch. It that possible?
To do this, use a tMap component. Your job will look like this
t*Input--row-->tMap--out1--->tFileOutput*
--out2--->tFileOutput*
In the tMap component, you have input on the left and output on the right. In your output table, select "Activate expression filter" and use the text box to define your filter-- only rows that match that filter will be ouput from that connection. You can have as many output tables and filters as you need.
Using tMap is cool, but if number of output stream is not defined and fixed, tMap is not a good choice.
In this case using iterate link or tjavaflex can help you:
Have a look at this tutorial on "how to split a file into many files regarding a key on each record" which explains how to solve this kind of task. It is actually only available in french. The tutorial shows 3 different technics to achieve this task.
Finally I used tExctractRegeFields component - simply defined regex for matching lines. The most important (and I didn't know before) is that you can connect components with different types of connections. I did right click on used component a chose Row > Reject for new branch in job as described in question.
We can do it by using tfileoutputdelimited and tfileinputdelimited.
We have one option in tfileoutputdelimited in advanced settings and check option split out files in several files.
Related
I have three Excel files and one database connection which I need to append as a part of my flow. All four datasets in the pre-append stage have just one column.
When I try to use tUnite, I get the error for tFileInputExcel - see the screenshot. Moreover, I cannot join the database connection with tUnite.
What am I doing wrong?
I think the problem is with the tFileExist components (I think that's what these are on the left with the "if" links coming out) because each of them is trying to start a new flow. Once you're joining them with the unite, there can be only one start to the flow - and this goes to the start of the first branch of the merge order.
You can move the if logic elsewhere. Another idea is to put the output from each of the Excel into a tHashOutput (linked together), then use a tHashInput to write to your DB.
using Talend Open Studio, I have a data-processing component, for which I'd appreciate your advice on how to make this possible (a) in a single component and (b) without a dirty workaround - thanks.
Relating part (a):
I have two different inputs:
One Input (with exactly one row) defines some kind of metadata for my processing.
One Input (with 1...n rows) defines the core data to process.
Currently, I solved this first requirement using two components and passing my metadata to the second component using the globalMap. But it would be nice, if I could integrate both connections into one component.
Relating part (b): After I have read all my input rows, I need to process them all at once. So far, so easy, I could use the end-section - my problem comes here: After that processing, I need to create a number of output-rows for a single output connection. Problem is, that Output-rows can only be created in the main-part and there I don't know when the last row was read...
Currently, I solved this counting the input-rows in advance and then, after that number is reached, I create that output. But this seems a really dirty workaround to me, so maybe someone has a solution for that, too?
Thank you for any useful tips!
I have a number of excel files where there is a line of text (and blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because a. it is not dynamic, you can add more files to process but you need to know how many files in advance of building your job, and b. it shows the basic concept, but would require more work to suite your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
But here is how may sample works. Each Excel is setup as two inputs. For the header the tFileInput_Excel is configured to read only the first line while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A Excel and Machine B excels, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row the data is combined and includes the header info.
I tried to display some results from several files in a directory. I use TFileList, and 2 tFileInputDelimited which are both linked to TFileList. I don't know why but at the end of the processing my results are lugged from just one of the 6 files I want. It appears that there are results from the list file of the directory.
Each tFileInputDelimited has ((String)globalMap.get("tFileList_1_CURRENT_FILEPATH")) as name of the flow.
Here is my TMap:
Your job is set up so your lookup is iterative which causes some issues as Talend only seems to use the last iteration rather than doing what you might expect and iterating through every step for everything it needs (although this might be more complicated than you first think).
One option is to rework the job so you use your iterate part of the job as the main input to the tMap rather than the lookup.
Alternatively, you could iterate the data into a tBufferOutput component and then OnSubjobOk you could link the job as before but replace the iterative part with a tBufferInput component as it will store all of the data from all of the files iterated through.
i'm creating some BIRT-Reports with Eclipse. Now i got the following problem.
I've got two datasets (Set one named diag, set two named risk). In my report i produce fpr every data in diag a region with an diag_id. Now i tried to use this diag_id as input parameter for the second dataset (risk). Is this possible, and how is this possible?
To link one dataset to another in BIRT, you can either:
Create a subreport within your report that links one dataset to another via an input parameter - see this Eclipse tutorial.
or:
Create a joint dataset that explicitly links the two datasets together - see the answer to this StackOverflow question.
Alternatively, if both datasets come from the same relational database, you could simply combine the two queries into a single query.
If you are using scripted data sources, you could use variables.
Add a variable through the Eclipse UI called "diag_id".
In the fetch script of diag, set diag_id:
vars["diag_id"] = ...; // store value in Variable.
Then, in the open script of risk, use the diag_id however you need to.
diag_id = vars["diag_id"];
This implies that placement of risk report elements are nested inside the diag repeating element so that diag.fetch will happen before each risk.open.