I am dealing with a large number of pcap files from numerous collection sources. I need to filter them programmatically and I am using tshark for that, so I am first merging all the files together with mergecap. The problem is that I also need the collection point information, which is only available in the capture file name. I tried using editcap to add per-packet comments specifying the original file, but that is untenable (see below for the explanation). Any ideas how to track the original file after the pcap files are merged?
Why the editcap solution won't work
I considered using editcap to add per-packet comments on every packet before merging (see "How to add a comment to all packets in numerous pcap files before merging into a single file"), however the problem with this approach is that editcap requires every packet comment to be individually specified on the command line (you can't specify a range of packets). That's hundreds of thousands of comments, and the command line won't support that. Additionally, if I try to run editcap with just a few comments at a time, over and over, it rewrites the entire file every time, leading to thousands of file rewrites. Also not viable.
If your original capture files are in .pcapng format, then each one contains an Interface Description Block, or IDB. When you run mergecap to merge them, you can specify that IDBs not be merged using the -I none option. That way, the interface number will be unique per original file, and you can add a column showing that information to easily differentiate the source of each packet by interface ID, or you can apply a display filter to isolate only those packets from a particular capture file.
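For example (the file names here are just placeholders for your per-collection-point captures), the merge could look like this:

    mergecap -I none -w merged.pcapng siteA.pcapng siteB.pcapng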
The field to filter on or add as a column is frame.interface_id. You could also filter by frame.interface_name or frame.interface_description if those fields all have distinct values, but there's no guarantee they will be unique, as the interface name and/or description might contain the same information even if the capture files originate from different machines.
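As a rough sketch of how the merged file could then be inspected (assuming the merged output is called merged.pcapng and the second input file ended up as interface 1):

    # print the interface ID next to each packet
    tshark -r merged.pcapng -T fields -e frame.interface_id -e frame.number -e ip.src -e ip.dst

    # or keep only the packets that came from the second original file
    tshark -r merged.pcapng -Y "frame.interface_id == 1"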
Related
Is there any method in Talend to load multiple multi-schema delimited files stored in the same directory?
I have tried using the tFileInputMSDelimited component before, but I was unable to link it with the tFileList component to loop through the files inside the directory.
Does anyone have an idea how to solve this problem?
To make it clearer, each file contains only one batch line, but multiple header lines, each of which comes with a bunch of transaction lines, as shown in the sample data below.
The component tFileOutputMSDelimited should suit your needs.
You will need multiple flows going into it.
You can either keep the files and read them or use tHashInput/tHashOutput to get the data directly.
Then you direct all the flows to the tFileOutputMSDelimited (example with tFixedFlowInput, adapt to your flows):
In it, you can configure which flow is the parent flow containing your ID.
Then you can add the children flows and define the parent and the ID used to recognize the rows in the parent flow:
I have a number of Excel files where there is a line of text (and a blank row) above the header row for the table.
What would be the best way to process the file so I can extract the text from that row AND include it as a column when appending multiple files? Is it possible without having to process each file twice?
Example
This file was created on machine A on 01/02/2013
Task|Quantity|ErrorRate
0102|4550|6 per minute
0103|4004|5 per minute
And end up with the data from multiple similar files
Task|Quantity|ErrorRate|Machine|Date
0102|4550|6 per minute|machine A|01/02/2013
0103|4004|5 per minute|machine A|01/02/2013
0467|1264|2 per minute|machine D|02/02/2013
I put together a small, crude sample of how it can be done. I call it crude because (a) it is not dynamic: you can add more files to process, but you need to know how many files in advance of building your job, and (b) it shows the basic concept but would require more work to suit your needs. For example, in my test files I simply have "MachineA" or "MachineB" in the first line; you will need to parse that data out to obtain the machine name and the date.
But here is how my sample works. Each Excel file is set up as two inputs. For the header, the tFileInput_Excel is configured to read only the first line, while the body tFileInput_Excel is configured to start reading at line 4.
In the tMap they are combined (not joined) into the output schema. This is done for the Machine A and Machine B Excel files, then those tMaps are combined with a tUnite for the final output.
As you can see in the log row, the data is combined and includes the header info.
Hello, I want to be able to compare values before and after form handling, so that I can process them before the flush.
What I do is collect the old values in an array before handleRequest.
I then compare new values to the old values in the array.
It works perfectly on simple variables, like strings for instance.
However, I want this to work on uploaded files. I am able to get their full paths and names before handling the form, but when I get the values after checking if the form is valid, I am still getting the same old values.
I tried both calling $entity->getVar() and $form->getData()->getVar(), and I get the same output...
Hello, I actually found a solution. It is a departure from the strategy announced in my question, which I realize was somewhat truncated regarding my objective. That objective was to compare the old file names and the new ones (those names actually include the full path) for changes, so that I could unlink the old names that were no longer in the new name list. Basically, to perform a cleanup after a file is uploaded to replace another one, without the first one being deleted first, and to save the webmaster the hassle of having to sort between uniqid-named files that are still used by the web site and those that are useless.
The problem is that my upload functions, which are very similar to the file upload examples shown on the official documentation pages, seem to take effect at flush time.
So, since what I wanted to do with those files had nothing to do with database operations, I resorted to having the step-two code run after the flush, which works fine.
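Roughly, that post-flush cleanup amounts to something like this sketch ($oldPaths and $newPaths are made-up names for the lists collected before handleRequest() and after the flush):

    // Remove files whose old path no longer appears in the new list.
    $stalePaths = array_diff($oldPaths, $newPaths);
    foreach ($stalePaths as $path) {
        if (is_file($path)) {
            unlink($path);   // delete the orphaned upload
        }
    }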
However, I am intrigued by your solutions, as they are both strategies I hadn't thought of. Thank you for the suggestions.
That said, I am not sure that cloning the whole object would be as straightforward as comparing two arrays of file names.
I am using Mirth 3.0. I have a file containing thousands of records. The txt file has a 3-line header, and I have to skip this header. How can I do this?
I am not supposed to use the batch file option.
Thanks.
The answer is a really simple setting change.
I think your input source data type is delimited text.
Go to your channel -> Summary tab -> Set data type -> Source 1 Inbound Properties -> Number of header records, and set it to 3.
What Mirth will do is skip the first 3 lines of the file, as they will be considered headers.
If there is some method of identifying the header records in the file, you can add a source filter that uses a regular expression to identify and ignore those records.
Such a result can be achieved using the Attachment script on the Summary tab. There you deal with the message in its raw format, so if your file contains three lines of comments and then the first message starts with the MSH segment, you may use regular JavaScript functions to strip everything up to the MSH. The same is true of the Preprocessor script, and it's even more logical to do such a transformation there. The difference is that Mirth does not store the message before it hits the Attachment handler, but it does store it before the Preprocessor handles the message.
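For illustration, a minimal sketch of that Preprocessor approach could look like this (assuming the useful content starts at the first MSH segment; in the Preprocessor script Mirth exposes the raw inbound text as the message variable and uses the return value as the new message):

    // Strip everything before the first MSH segment.
    var start = message.indexOf('MSH');
    if (start > 0) {
        message = message.substring(start);
    }
    return message;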
Contrary to that, the source filter deals with the message serialized to the E4X XML object, where the serialization process may fail because of the header (it depends on the inbound message data type settings).
As a further reading I would recommend the "Unofficial Mirth Connect Developer's Guide".
(Disclaimer: I'm the author of this book.)
In my implementation the header content remains the same, so I know in advance how many lines the header is going to take; inside the source filter I am using the following code.
delete msg["row"][1];
delete msg["row"][1];
return true;
I am using the delete statement twice because, after executing the first delete statement, msg will have one less row, and if the header occupies more than a single row then the second delete statement is required.
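A slightly more general sketch of the same idea, if the header is known to occupy a fixed number of rows (the index to delete is kept at 1 to match the code above; adjust it to however your rows actually come in):

    // Source filter sketch: drop a fixed number of header rows.
    var headerRows = 2;                    // rows occupied by the header (two deletes above)
    for (var i = 0; i < headerRows; i++) {
        delete msg['row'][1];              // rows shift up after each delete, so the index stays the same
    }
    return true;                           // accept the remaining message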
I have a directory containing a number of .sql files. I am able to execute them, but I want to execute them in a certain order. For example, if I have 4 files xy.sql, dy.sql, trim.sql and see.sql, I want to execute them in the order see.sql, dy.sql, trim.sql and xy.sql. What happens now is that I get a list of files using a DirectoryInfo object. Now I need to sort them using my order. I am using C# 3.5.
thanks
It might be better to rename your files so that they sort into the correct order natively. This prevents having to maintain a separate "execution order" list somewhere.
Using a common prefix for the sql file names is a bit self-documenting as well, e.g.
exec1_see.sql
exec2_dy.sql
exec3_trim.sql and
exec4_xy.sql
It's unclear what the ordering algorithm is for your filenames. It doesn't appear to be entirely dictated by the filename (such as alphabetical ordering).
If you have arbitrary order in which you must execute these scripts, I would recommend that you create an additional file which defines that order, and use that to drive the ordering.
If the list of filenames is known at compile time, you could just hard-code that order in your code. I'm assuming, from your question, however, that the set of files is likely to change, and new ones may be added. If that's the case, I refer you to my previous paragraph.
You can obtain a List<string> containing all your .sql file names and call the List<T>.Sort method (with a custom comparison, so the files land in your required order rather than alphabetical order), then operate on the files in the newly sorted sequence.
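A hedged sketch of that approach in C# 3.5 (the directory path and the desired order are assumptions for illustration; the order list could just as well be read from the extra file suggested in the previous answer):

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;

    class ScriptOrderer
    {
        static void Main()
        {
            // Desired execution order (hard-coded here for the sketch).
            var order = new List<string> { "see.sql", "dy.sql", "trim.sql", "xy.sql" };

            var dir = new DirectoryInfo(@"C:\scripts");          // assumed path
            List<string> files = dir.GetFiles("*.sql")
                                    .Select(f => f.Name)
                                    .ToList();

            // Sort by position in the order list; files not listed go last.
            files.Sort((a, b) =>
            {
                int ia = order.IndexOf(a); if (ia < 0) ia = int.MaxValue;
                int ib = order.IndexOf(b); if (ib < 0) ib = int.MaxValue;
                return ia.CompareTo(ib);
            });

            foreach (var file in files)
                Console.WriteLine(file);                         // execute each script here instead
        }
    }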
Number them and use a tool like my SimpleScriptRunner or the Tarantino DB change tool.