How to handle exceptions in Spring Batch for CSV -> DB? - spring-batch

I came across this link - How to handle exception and skip the wrong csv line as well?
The question is marked as answered, but I don't understand the answer clearly.
I have a similar situation:
a. Processing a CSV file - let's say 100 records.
b. The first 10 are valid, the next 10 are invalid data, the next 10 are valid data but hit business errors, and the next 10 have no business errors but hit a DB size limit error.
How can I generate one single error CSV file covering all these cases, where the error file has each failed CSV row's data, the failure reason as one column, the file name as another column, the row number, etc.?
I can't find a good example for this context.
Can someone help me with one, or explain the approach in the above link for handling this case? (A rough sketch of one possible approach follows below.)
General Link for Reference: https://docs.spring.io/spring-batch/docs/1.1.x/spring-batch-docs/reference/html/execution.html
Thanks.
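One common way to approach this (a minimal sketch only, not necessarily the approach from the linked answer) is to make the step fault tolerant, skip the exception types you expect, and register a SkipListener that appends every skipped row to the error CSV. The class name, file names, and column layout below are made up for illustration:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

import org.springframework.batch.core.SkipListener;
import org.springframework.batch.item.file.FlatFileParseException;

// Hypothetical listener that appends every skipped record to an error CSV.
// "errors.csv", "input.csv" and the column layout are made-up names for illustration.
public class ErrorCsvSkipListener implements SkipListener<Object, Object> {

    private final String errorFile = "errors.csv";
    private final String sourceFile = "input.csv";

    @Override
    public void onSkipInRead(Throwable t) {
        // Parse failures: FlatFileParseException carries the raw line and its line number.
        if (t instanceof FlatFileParseException) {
            FlatFileParseException e = (FlatFileParseException) t;
            write(e.getInput(), e.getMessage(), e.getLineNumber());
        }
    }

    @Override
    public void onSkipInProcess(Object item, Throwable t) {
        // Business validation failures thrown from the ItemProcessor.
        write(String.valueOf(item), t.getMessage(), -1);
    }

    @Override
    public void onSkipInWrite(Object item, Throwable t) {
        // Write failures, e.g. a DB column size limit violation.
        write(String.valueOf(item), t.getMessage(), -1);
    }

    private void write(String rowData, String reason, int lineNumber) {
        try (PrintWriter out = new PrintWriter(new FileWriter(errorFile, true))) {
            out.printf("%s,%s,%s,%d%n", rowData, reason, sourceFile, lineNumber);
        } catch (IOException e) {
            throw new IllegalStateException("Could not write to error file", e);
        }
    }
}

The step itself would then be declared fault tolerant, e.g. .faultTolerant().skip(FlatFileParseException.class).skip(YourBusinessException.class).skipLimit(100) in Java config (YourBusinessException being a placeholder), with this listener registered via .listener(...); the DB limit case would typically surface in onSkipInWrite.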

Related

How to do a duplicate file check in DataStage?

For instance:
File A is loaded, then the next day
File B is loaded, then the next day
File A is received again - this time the sequence should abort.
Can anyone help me out with this?
Thanks
There are multiple ways to solve this, but please don't build in intentional aborts, as they're most likely to come back at you like a boomerang.
Keep track of file names and file hashes (e.g. an MD5 sum) in a table and compare against that list before loading. If the file is already known, handle or ignore it (see the sketch after this answer).
Just read the file again as if it were new or updated. Compare the old data with the new data using the Change Capture stage and handle it as needed, e.g. write changed and new data to the target. (recommended)
I would not recommend writing a sequence that "should abort", as that is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the given file is wrongly formatted. An abort of a job should indicate that something is wrong with the job; when you get a file twice, it's not the job that failed.
If an error is found in the data that needs to be fixed by others, write the information about it to a table. Have another independent process monitor that table and tell the data producer about it (via dashboard, email, ...).
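Outside of DataStage itself, the duplicate check from the first option boils down to hashing each incoming file and comparing it against the hashes recorded so far. A minimal Java sketch of that idea, assuming a hypothetical control table LOADED_FILES(FILE_NAME, FILE_HASH):

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DuplicateFileCheck {

    // Compute the MD5 sum of a file as a hex string.
    static String md5Of(Path file) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Returns true if a file with the same name and hash was loaded before.
    // LOADED_FILES(FILE_NAME, FILE_HASH) is a hypothetical control table.
    static boolean alreadyLoaded(Connection con, Path file) throws Exception {
        String sql = "SELECT 1 FROM LOADED_FILES WHERE FILE_NAME = ? AND FILE_HASH = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, file.getFileName().toString());
            ps.setString(2, md5Of(file));
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}

If the check returns true, the load can simply skip or log the file instead of aborting.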

Append datasets from xlsx and database - Talend

I have three Excel files and one database connection which I need to append as a part of my flow. All four datasets in the pre-append stage have just one column.
When I try to use tUnite, I get an error on tFileInputExcel - see the screenshot. Moreover, I cannot join the database connection with tUnite.
What am I doing wrong?
I think the problem is with the tFileExist components (I think that's what these are on the left with the "if" links coming out), because each of them is trying to start a new flow. Once you're joining them with tUnite, there can be only one start to the flow, and that goes to the start of the first branch of the merge order.
You can move the "if" logic elsewhere. Another idea is to put the output from each of the Excel inputs into a tHashOutput (linked together), then use a tHashInput to write to your DB.

Using Talend Open Studio DI to extract a value from a unique 1st row before continuing to process columns

I have a number of excel files where there is a line of text (and blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because (a) it is not dynamic: you can add more files to process, but you need to know how many files there are before building your job; and (b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
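For that parsing step, here is a rough Java sketch of the kind of logic you could drop into a tJavaRow or a tMap expression; the header text format and the class/variable names are assumptions based on the example above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeaderParser {

    // Matches a header line like:
    // "This file was created on machine A on 01/02/2013"
    private static final Pattern HEADER =
            Pattern.compile("created on (.+) on (\\d{2}/\\d{2}/\\d{4})");

    // Returns {machine, date}, or empty strings if the line does not match.
    public static String[] parse(String headerLine) {
        Matcher m = HEADER.matcher(headerLine);
        if (m.find()) {
            return new String[] { m.group(1), m.group(2) };
        }
        return new String[] { "", "" };
    }

    public static void main(String[] args) {
        String[] parts = parse("This file was created on machine A on 01/02/2013");
        System.out.println(parts[0] + " | " + parts[1]); // machine A | 01/02/2013
    }
}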
But here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInputExcel is configured to read only the first line, while the body tFileInputExcel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for both the Machine A and Machine B Excels; those tMap outputs are then combined with a tUnite for the final output.
As you can see in the log output, the data is combined and includes the header info.

DataStage is not reading the correct dataset

I have a DataStage job that compares a previous dataset with a current one and loads the values into another dataset. The problem is that it does not find the correct previous dataset and fails to see the physical file.
This happens to only some of the files, and it is not a parameter problem, as all the parameters are project-level and set up correctly.
Is there a possibility that it is being overwritten? What type of error do you get? Was the dataset created correctly? Is the associated file system full? There are many possibilities for what could be happening to you.

How to skip the header while processing a message in Mirth

I am using Mirth 3.0. I have a file containing thousands of records. The txt file has a 3-line header, which I have to skip. How can I do this?
I am not supposed to use the batch file option.
Thanks.
The answer is a really simple setting change.
I think your input source data type is delimited text.
Go to your channel -> Summary tab -> Set data type -> Source 1 Inbound Properties -> Number of header records, and set it to 3.
Mirth will then skip the first 3 lines of the file, as they will be considered headers.
If there is some method of identifying the header records in the file, you can add a source filter that uses a regular expression to identify and ignore those records.
Such a result can be achieved using the Attachment script on the Summary tab. There you deal with the message in its raw format. Thus, if your file contains three lines of comments and the first message then starts with the MSH segment, you may use regular JavaScript functions to strip everything up to MSH. The same is true of the Preprocessor script, and it is even more logical to do such a transformation there. The difference is that Mirth does not store the message before it hits the Attachment handler, but it does store it before the Preprocessor handles the message.
In contrast, the source filter deals with the message serialized to an E4X XML object, and that serialization may fail because of the header (it depends on the inbound message data type settings).
As a further reading I would recommend the "Unofficial Mirth Connect Developer's Guide".
(Disclaimer: I'm the author of this book.)
In my implementation the header content stays the same, so I know in advance how many lines the header will take, and inside the source filter I use the following code:
// delete the first header row; the remaining rows shift up by one
delete msg["row"][1];
// the former second header row is now row [1], so delete it as well
delete msg["row"][1];
return true;
I use the delete statement twice because after the first delete statement executes, msg will have one row fewer; if the header occupies more than a single row, the second delete statement is required.