Throw error on invalid lookup in Talend job that populates an output table

I have a tMap component in a Talend job. The objective is to take a row from an input table, perform a column lookup in another input table, and write an output table, populating one of the columns with the retrieved value.
If the lookup is unsuccessful, I generate a row in an "invalid rows" table. This works fine, but it is not the solution I'm looking for.
Instead, I want to stop the entire process and throw an error on the first unsuccessful lookup. Is this possible in Talend? The error that is thrown should contain the value that failed the lookup.
UPDATE

A tFileOutputDelimited component would do the trick.
So the flow would be: tMap -> invalid_row -> tFileOutputDelimited -> tDie.
Note: you have to go to the advanced settings of the tFileOutputDelimited component, tick the option to split the output into multiple files, and set the row count to 1 rather than the default 1000.
For more flexibility, simply use two tMaps rather than one.
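Alternatively, the invalid-rows flow can fail the job directly from a tJavaRow, so the exception message carries the offending value. A minimal sketch, assuming the failed lookup key arrives in a column named lookupKey (hypothetical; substitute your actual schema column):

    // tJavaRow on the invalid_row flow: abort the job on the first
    // unsuccessful lookup and surface the value that failed.
    if (input_row.lookupKey != null) {
        throw new RuntimeException("Lookup failed for value: " + input_row.lookupKey);
    }

The null guard also keeps the generated job code compilable: an unconditional throw can make the statements Talend generates after your snippet unreachable and fail compilation.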

Related

Capturing Each Skipped Record - Copy Data Activity

I'm doing a large-scale project with multiple pipelines, millions of records per pipeline. I'm trying to develop a generic skipped row capture process.
What I need to do is: for every source row skipped due to any error encountered on the attempted load, I want to capture a key column value from the row and write it to a distinct log file (or separate DB table row). This can't be summary data: for each individual row that fails, I need to capture the row key from that row so we can review/re-load later (I will add in system variable values to identify pipeline, component, time stamp, etc). Pipeline must complete with all successful rows loaded, all unsuccessful rows logged.
This is no-brainer functionality in most ETL tools; I have to be overlooking something in ADF, because I can't find a way to do this. Appreciate any/all suggestions.
You can enable Fault tolerance and choose the Skip incompatible rows option. It will skip the incompatible rows between source and target store during the copy, e.g. rows with a type/field mismatch or a PK violation.
Then you can enable the session log and choose the Warning log level in the copy activity to log skipped rows. Finally, you can save your log file in Azure Storage or Azure Data Lake Storage Gen2.
Reference:
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-fault-tolerance
https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-log
With your first copy activity, check the fault tolerance option in 'Settings' to log skipped fault rows.
Make sure to place your row key column first in the mapping definition.
Get the copy activity logFilePath from the activity output into a variable (see the expression sketch after this list).
Add another copy activity to load the skipped rows into a relational table:
- Its source path will be the variable holding logFilePath.
- Set the file path type to 'Wildcard file path'.
- Keep the wildcard folder path empty; the variable will be the value in 'Wildcard file name'.
- Make sure the delimited file dataset's escape character is set to the quotation mark (").
The OperationItem field of the log file holds your record fields separated by commas; because we placed the row key first in the mapping, it will appear first in OperationItem as well.
Good luck.
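A minimal sketch of that variable assignment, assuming the first copy activity is named 'Copy data1' (hypothetical): in a Set Variable activity, assign

    @activity('Copy data1').output.logFilePath

to a string variable, then reference that variable as the 'Wildcard file name' in the second copy activity's source.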

tExtractRegexField unable to act as lookup to tMap in Talend DI

I have a tExtractRegexField which extracts a date from a string of text coming from an ExcelFileInput and outputs the dates to a tLogRow, but I can't connect that same output as a lookup column to a tMap that has a second ExcelFileInput as its main input.
If I connect the tExtractRegexField to the tMap first, I can't then connect the second ExcelFileInput, and vice versa.
I'm using Talend 6.3.1, and for testing I am able to connect two ExcelFileInputs to a tMap, so I don't think it's a problem with my system setup.
I have also tried tJoin instead of tMap, but I encounter the same issue (I can't connect both inputs together, but can connect "A" or "B" first).
(Screenshots omitted: "Overview of Process" and "Problem Area".)
The tExcelFileInput uses globalMap to get the path to the Excel file from the preceding tFlowToIterate.
Based on discussions on the Talend forum, the issue may come down to Talend DI avoiding circular references.
An alternative solution is to extract the regex field from the header row and store it in a global variable using a tJavaRow with globalMap.put("MyVal", row.Data);, then, on an OnComponentOk trigger, read the remaining data from the body rows and, in the tMap, recall the global variable MyVal and include it as needed in your tMap output.
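A minimal sketch of the two halves, assuming the extracted column is named Data, as in the answer above:

    // tJavaRow on the header-row flow: stash the extracted date
    // in globalMap so a later subjob can pick it up.
    globalMap.put("MyVal", input_row.Data);

Then, in the tMap that processes the body rows (started via OnComponentOk), an output expression such as (String) globalMap.get("MyVal") pulls the stored value back into the flow.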

talend - put logic before tOutput

I have a Talend job that has the following components:
- 1 database input component that fetches data from the database
- 1 output component that writes the fetched data into a flat file
Now I have a scenario wherein, of two fields in the fetched data, only one should be written to the output, based on some if/else logic.
Can anyone assist me in this matter? Is it done using tMap?
Your if/else logic can go in the tMap, as a ternary operator in the output of your tMap:
condition?resultifOK:resultifKO
For example, suppose you want to insert into your file the value of ColumnA when it equals "MY CONDITION", and the value of ColumnB otherwise.
You'll have, directly in the output row of tMap:
"MY CONDITION".equals(row1.ColumnA)?row1.ColumnA:row1.ColumnB
You can't directly use an if/else statement in tMap. If you have several conditions, consider using a Routine instead of multiple ternary operators.
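For instance, a minimal sketch of such a Routine, with hypothetical names MyRoutines and pickColumn, generalizing the example above:

    package routines;

    public class MyRoutines {

        // Returns columnA when it satisfies the condition, otherwise columnB.
        // Add further if/else branches here rather than chaining ternaries in tMap.
        public static String pickColumn(String columnA, String columnB) {
            if ("MY CONDITION".equals(columnA)) {
                return columnA;
            }
            return columnB;
        }
    }

The tMap output expression then becomes a single call: MyRoutines.pickColumn(row1.ColumnA, row1.ColumnB).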

How to get failed records in a CSV during insertion

I am inserting records from a CSV into Salesforce using Talend. I want the failed records in a separate CSV. Can anyone provide a solution?
Thanks!
Deactivate tSalesforceOutput / Advanced Settings / Extended Output (this might result in slower performance).
Add a rejects flow with a right-click: Row / Rejects.
Route this row into a CSV output component.
Try this:
Create another tFileOutputDelimited.
In the tMap, create a new output.
In the new output's options, set "Catch output reject" to true.
Join the new output to the tFileOutputDelimited.

How to assign a CSV field value to a SQL query written inside a Table input step in Pentaho Spoon

I am pretty new to Pentaho, so my question might sound very novice.
I have written a transformation in which I am using a CSV file input step and a Table input step.
Steps I followed:
Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.
I used this parameter in the PostgreSQL query in the Table input step in the following manner:
select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
I am reading the CSV file using the CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter I created in the transformation?
(OR)
Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?
TL;DR: I recommend using a Database join step, as in my third suggestion below.
First idea - Using Table Input as originally asked
Well, you don't need any parameter for that unless you are going to provide the value for that parameter when you launch the transformation. If you need to read the data from a CSV, you can do that with this approach.
First, read your CSV and make sure your rows are ok.
After that, use a Select values step to keep only the columns to be used as parameters.
In the Table input, use a placeholder (?) to determine where to place the data, and set the step to execute for each row it receives from the source step.
Just keep in mind that the order of the columns received by the Table input (the columns coming out of the Select values step) is the order in which they are bound to the placeholders (?). This should not be a problem in your case, which uses only one placeholder, but keep it in mind as you ramp up with Pentaho.
Second idea, using a Database Lookup
This is another approach where you can't customize the query sent to the database, but you may see better performance because you can set the "Enable cache" flag. If you don't need to use a function in your WHERE clause, this approach is really recommended.
Third idea, using a Database Join
That is my recommended approach if you need a function in your WHERE clause. It looks a lot like the Table input approach, but you can skip the Select values step: you choose which columns to use (repeating the same column several times if needed) and can enable an "Outer join" flag that also returns the rows for which the query produced no result.
Pro tip: if the transformation runs too slowly, try using multiple copies of the step, and obviously make sure the table has the appropriate indexes in place.
Yes, there's a way of assigning it directly without the use of a parameter. Do as follows.
Use a 'Block this step until steps finish' step to halt the Table input step until the CSV file input step completes.
Following is how you configure each step.
Note:
The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer
Check 'Execute for each row' and 'Replace variables' in the Table input step.
Select only the birthdate column in the CSV file input step.