I am working on the big data spark requirement where I have to use tCacheOut and tCacheIn. Attached job screen shot is working fine but in one scenario when tCacheOut has nothing to store i.e. filter is not allowing to flow any row to next component, it throws null pointer exception.
I know, there are other alternatives like write output in disk and read again at the next step but I don't want to do that because disk read and write is always an overhead.
How we can handle null pointer exception in this case?
Suppose in tFileInputDelimited having 2 columns i.e. name and age. in tFilterRow i have two conditions like name not equals to NULL with AND operator name equals "John". i have connected tFilterRow component to tCacheOut with 'filter' flow.
Now connect a tFilterRow to tDie or tWarn with reject flow.
Related
I have a tMap component in a Talend job. The objective is to get a row from an input table, perform a column lookup in another input table, and write an output table populating one of the columns with the retrieved value (see screenshot below).
If the lookup is unsuccessful, I generate a row in an "invalid rows" table. This works fine however is not the solution I'm looking for.
Instead, I want to stop the entire process and throw an error on the first unsuccessful lookup. Is this possible in Talend? The error that is thrown should contain the value that failed the lookup.
UPDATE
A tfileoutputdelimited componenent would do the staff .
So ,the flow would be as such tMap ->invalid_row->tfileoutputdelimited -> tdie
Note : that you have to go to advanced settings in the tfileoutputdelimited component aand tick split output into multiple files option and put 1 rather then 1000
For more flexibility , simply do two tmap order than one tMap
I have this simple flow in Talend DI 6 (simplified for posting on SO):
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default data values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which is set if the column value is null:
Talend schema view with Default column
Works also for other data types. Talend DI 6 should still be able to do this, although the field might be renamed.
i've got the following problem.
I have several tMaps, each has a lookup and at the end all the data is written in a db. The following mockup shall illustrate it:
There can be values in the main data stream which are not found in the lookup tables. For this values there is a rejected path which catches them from the specific tMap.
Requirements:
In case of a rejected inner join the looked up value shall be set to a default value (for example 0, which could be done in the schema of the tMap) and after that these "corrected" records should be added to the "normal" main data flow and process the next lookup.
The tUnite component is not able to handle this cases because it can not exist in a data flow loop.
Does anybody got an idea how to solve this problem?
Cheers.
The answer was so easy that i didn't got it in the first conception. I just have to change the join model from inner to left-join so all the formal rejected values will have a null value in it. Afterwards i can check the columns in the tmap and set them on a default value if they are null.
row1.id == null ? 0 : row1.id
Cheers.
If I understand correctly what you are trying to accomplish you will have to have staging files or staging tables on the database. Once you get the rejected rows, write them on a file or table. The accepted files will go also to a staging table(different than the rejected). Then you can union both tables or files by reading them. The key point is having a staging structure. I attach a picture what how would it be. In the picture the staging structure is a mysql table.
Let me know if it helps!
I want to execute some subjob if the previously processed number of rows are greater than N. To do this, i'm using the following configuration:
tFixedFlowInput have some rows.
tAggregateRow uses the count function and outputs one row with the number.
tSetGlobalVar then stores this value into a global var that I can check in the Run If connector (In this case, (Integer)globalMap.get("tSetGlobalVar_1") > 3 ).
tMsgBox then shows if the condition is true.
What I would like is to do the same, but in a more elegant way, using the minimum components required. I would like to connect tAggregateRow with the Run If connector directly (or even tFixedFlow) with tMsgBox, but I haven't found a way to refer to the number of rows previously processed without using the output row2.count variable.
How could I do something like this?
What should I put in the If condition to refer to the tAggregateRow operation result without connecting it to another meaningless component like exposed at the beginning?
for any talend component look under outline tab under the left side workspace pane at the bottom. this lists down the properties available via global variables for that component. Some properties like count of records inserted by output components are only available once the component is executed completely (After).
for your case you can try directly using ((Integer)globalMap.get("tFixedFlowInput_1_NB_LINE")) which gives number of lines (after) given by tFixedFlowInput.
I am currently working on a job something like this
The design is to,extract some data from customers,(say first name,last name) to one excel file,other data (say address) is to goto other excel file,i added a identity to tMap Numeric("s1",1,1) but it is starting from 1,3,5,7,9,11,13.... and on other excel it getting 2,4,6,8,10,12,...
but i need both excel to have same identity 1,2,3,4,5,6,....N
so that i can map the records
so can somebody guide me on this?
edit:
The autoincrement returns 1,2,3,4,5,6,... this is fine when thers only one tMap component in the job,but not similar when 2 tMaps are used ?
This is because the numeric sequence is static. Since you have only one sequence called "s1", it will be incremented twice at every iteration (one time for each tMap it's invoked in).
Just use some unique labels (ie. "s1" and "s2") to force the use of two independent sequences, thus the solution of your problem.