How to skip the header while processing a message in Mirth - mirth

I am using Mirth 3.0. I have a file with thousands of records, and the txt file has a three-line header. I have to skip this header. How can I do this?
I am not supposed to use the batch file option.
Thanks.

The answer is a simple settings change.
Assuming your source inbound data type is Delimited Text,
go to your channel -> Summary tab -> Set Data Types -> Source 1 Inbound Properties -> Number of Header Records and set it to 3.
Mirth will then skip the first three lines of the file, treating them as header records.

If there is some method of identifying the header records in the file, you can add a source filter that uses a regular expression to identify and ignore those records.
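For example, a minimal JavaScript filter along these lines could work, assuming the header lines can be recognized by a pattern (the "HDR" prefix here is only an illustration; adjust the regular expression to whatever actually marks your header records):

    // Hypothetical filter step: reject records whose first line looks like a header.
    var firstLine = connectorMessage.getRawData().split(/\r?\n/)[0];
    if (/^HDR/.test(firstLine)) {
        return false;   // treat as header, filter it out
    }
    return true;        // real data record, let it pass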

Such a result can also be achieved using the Attachment script on the Summary tab. There you deal with the message in its raw format, so if your file contains three lines of comments and the first message then starts with the MSH segment, you can use ordinary JavaScript string functions to strip everything up to MSH. The same is true of the Preprocessor script, and it is arguably the more logical place for such a transformation. The difference is that Mirth does not store the message before it reaches the Attachment handler, but it does store it before the Preprocessor handles the message.
By contrast, the source filter deals with the message after it has been serialized to an E4X XML object, and that serialization may fail because of the header (it depends on the inbound message data type settings).
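As a rough sketch of the Preprocessor approach described above (the same idea works in the Attachment script): the Preprocessor receives the raw message in the variable message, and whatever string it returns replaces it, so everything before the first MSH segment can be dropped with plain string functions. The assumption here is that the real content starts with "MSH":

    // Preprocessor script sketch: strip the header lines preceding the first MSH segment.
    var start = message.indexOf('MSH');
    if (start > 0) {
        return message.substring(start);
    }
    return message;   // no header found, pass the message through unchanged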
As further reading I would recommend the "Unofficial Mirth Connect Developer's Guide".
(Disclaimer: I'm the author of this book.)

In my implementation the header content stays the same, so I know in advance how many lines the header takes. Inside the source filter I use the following code:
delete msg["row"][1];
delete msg["row"][1];
return true;
I use the delete statement twice because after the first delete executes, msg has one less row; when the header occupies more than a single row, the second delete statement removes the next header line, which has shifted into the same index.
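If the header size ever changes, the same technique can be written as a loop; this is only a sketch, with headerRows as an assumed value and the same msg["row"] indexing as above:

    // Sketch: remove a fixed number of header rows. Each delete shifts the
    // remaining rows down, so the same index is deleted repeatedly.
    var headerRows = 2;   // assumed number of header rows
    for (var i = 0; i < headerRows; i++) {
        delete msg["row"][1];
    }
    return true;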

Related

How to do duplicate file check in DataStage?

For instance:
File A is loaded, then the next day
File B is loaded, then the next day
File A is received again; this time the sequence should abort.
Can anyone help me out with this?
Thanks
There are multiple ways to solve this, but please don't build intentional aborts into the sequence; they are most likely to come back to bite you.
Keep track of file names and file hashes (e.g. an MD5 sum) in a table and compare against that list before loading. If the file is already known, handle or ignore it (see the sketch at the end of this answer).
Just read the file again as if it were new or updated. Compare the old data with the new data using the Change Capture stage and handle the data as needed, e.g. write changed and new data to the target. (recommended)
I would not recommend writing a sequence that "should abort", as this is not the goal of an ETL process. If the file contains the very same content that is already known, just ignore it. If it has updated data, handle it as needed. Only abort if there is a technical issue, e.g. the file is wrongly formatted. An abort of a job should indicate that something is wrong with the job; when you get a file twice, it is not the job that failed.
If an error is found in the data that needs to be fixed by others, write the information about it to a table and have another independent process monitor that table and notify the data producer (via dashboard, email, ...).
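As a sketch of the file-tracking option above (this is plain Java, not DataStage-specific; the control table lookup itself is left out and the class name is made up), computing the fingerprint to compare against the table could look like this:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    // Sketch: compute an MD5 fingerprint of an incoming file so it can be
    // compared with the names/hashes already recorded in a control table.
    public class FileFingerprint {
        public static String md5Of(Path file) throws Exception {
            byte[] digest = MessageDigest.getInstance("MD5").digest(Files.readAllBytes(file));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }

        public static void main(String[] args) throws Exception {
            // Hypothetical usage: if this hash is already in the table, skip or
            // ignore the file instead of aborting the sequence.
            System.out.println(md5Of(Paths.get(args[0])));
        }
    }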

How to tell what file a packet came from after files merged

I am dealing with a large number of pcap files from numerous collection sources. I need to filter them programmatically, and I am using tshark for that, so I first merge all the files together using mergecap. The problem with that is that I also need the collection point information, which is only available in the capture file name. I tried using editcap to add per-packet comments specifying the original file, however that is untenable (see below for the explanation). Any ideas how to track the original file after the pcap files are merged?
Why the editcap solution won't work
I considered using editcap to add per-packet comments on every packet before merging (How to add a comment to all packets in numerous pcap files before merging into a single file), however the problem with this approach is that editcap requires every packet comment to be individually specified on the command line (you can't specify a range of packets). That's hundreds of thousands of comments, and the command line won't support that. Additionally, if I try to run editcap with just a few comments at a time over and over, it rewrites the entire file every time, leading to thousands of file rewrites. Also not viable.
If your original capture files are in .pcapng format, then each one contains an Interface Description Block, or IDB. When you run mergecap to merge them, you can specify that IDBs not be merged using the -I none option. In this way, the interface number will be unique per original file, and you can add a column showing that information to easily differentiate the source of each packet by interface ID, or you can apply a display filter to isolate only those packets from a particular capture file.
The filter or column to use would be the frame.interface_id field. You could also filter by frame.interface_name or frame.interface_description if those values differ between files, but there is no guarantee they will be unique, as the interface name and/or description might contain the same information even when the capture files originate from different machines.
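For illustration, the merge and a per-source display filter might look like this (the file names are hypothetical, and with -I none the interface IDs are assumed to follow the order of the input files on the command line):

    mergecap -I none -w merged.pcapng siteA.pcapng siteB.pcapng
    tshark -r merged.pcapng -Y "frame.interface_id == 0"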

How to get the count of an Excel file in Unix or Linux in DataStage

Is there any possibility I can get the count of an Excel file in Unix or Linux?
I tried creating a server routine and I am able to get the output link count, but I am unable to get the return value into a file or variable.
Please suggest.
There is a lot of information (besides the data itself) that can be extracted from an Excel file - check out the documentation.
I could imagine two options for your goal:
check a certain cell which would be filled/exist if the file is not empty
check the file size of an empty file; if the file size is bigger than that, the file is not empty
The best approach will be:
Use the 'Unstructured Data' stage from the File section in the palette to read the xlsx (Excel) file.
Use an Aggregator stage to take the count of rows; consider any column that will always be present when data comes into the file.
Write this count into a file.
Use this as per your logical requirement.
After point 2 you can use a Transformer stage to handle the count logic, whichever you find suitable.
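Outside of DataStage, if all you need from the command line is a row count, a small Apache POI sketch like the one below would do it (this is not one of the stages described above; reading the first sheet and taking the file path from the command line are assumptions):

    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.ss.usermodel.Workbook;
    import org.apache.poi.ss.usermodel.WorkbookFactory;
    import java.io.File;

    // Sketch: print the number of physical rows on the first sheet of the
    // Excel file given as the first argument. Requires the Apache POI jars.
    public class ExcelRowCount {
        public static void main(String[] args) throws Exception {
            try (Workbook wb = WorkbookFactory.create(new File(args[0]))) {
                Sheet sheet = wb.getSheetAt(0);   // first sheet assumed
                System.out.println(sheet.getPhysicalNumberOfRows());
            }
        }
    }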

Using Talend Open Studio DI to extract a value from a unique 1st row before continuing to process columns

I have a number of excel files where there is a line of text (and blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because a) it is not dynamic: you can add more files to process, but you need to know how many files in advance of building your job; and b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line. You will need to parse that data out to obtain the machine name and the date.
But here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInputExcel is configured to read only the first line, while the body tFileInputExcel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A and Machine B Excel files, then those tMaps are combined with a tUnite for the final output.
As you can see in the tLogRow output, the data is combined and includes the header info.

ItemWriter output using the same order that the ItemReader used to read the file

We have a Spring Batch job that reads a file (FlatFileItemReader), processes it and writes data to a queue (JmsItemWriter).
We have another job that reads the queue (JmsItemReader) and writes a file (FlatFileItemWriter). It's an asynchronous process (between the execution of the two jobs, some manual processing must be performed).
The flat file content doesn't have a line identifier, and we use a multi-threaded approach when reading the file ("throttle-limit"). So the messages put on the queue do not maintain the same order that they had in the flat file.
The problem is that we should generate an output file respecting the original order, so line 33 of the incoming file should be line 33 of the outgoing file (it will have the contents of the original line, plus some data).
Does Spring Batch provide a "native" way to order the output, respecting the original read order? I say "native" because one solution we thought of is to create an additional step just to add a line number to the file and use it at the end, but I was wondering if this is "reinventing the wheel"...
We are using SB 3.0.3.
TIA,
Bob
The use case you are describing asks that you maintain order across multiple jobs, which is not supported. In theory (though not guaranteed) a single, single-threaded step would retain the order of the input file.
Since you are reading in a multithreaded manner, there really isn't a good way to guarantee the order of the items as they are being read. The best you could do is synchronize the read method and add an id as the items are being read. If the bottleneck you're attempting to address with multithreading is in the processor or writer, this may not be a bad option.
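A rough sketch of that idea follows; the SequencingItemReader and SequencedItem names are made up for illustration and are not Spring Batch API. The delegate's read() is synchronized and each item is stamped with its position, which the second job's writer could later use to restore the original order:

    import java.util.concurrent.atomic.AtomicLong;
    import org.springframework.batch.item.ItemReader;

    // Sketch: serialize reads from the delegate and attach a sequence number
    // reflecting each item's position in the input file.
    public class SequencingItemReader<T> implements ItemReader<SequencedItem<T>> {

        private final ItemReader<T> delegate;
        private final AtomicLong counter = new AtomicLong();

        public SequencingItemReader(ItemReader<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized SequencedItem<T> read() throws Exception {
            T item = delegate.read();
            return (item == null) ? null : new SequencedItem<T>(counter.incrementAndGet(), item);
        }
    }

    // Simple holder carrying the original line position alongside the item.
    class SequencedItem<T> {
        final long sequence;
        final T item;

        SequencedItem(long sequence, T item) {
            this.sequence = sequence;
            this.item = item;
        }
    }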