Extract packets from a large pcap file to create a small pcap file

I have a large pcap file (30G).
I also have a CSV file that contains a large number of flow IDs (more than 50000) in the format of
"source_address-destination_address-source_port-destination_port-protocol".
I want to extract packets from the pcap file according to the flow IDs in the CSV file, and create another pcap file.
I know Wireshark can filter packets based on IP address or port, but it can only filter one flow at a time.
I have to filter a large number of flows (more than 50000) according to the 5-tuples in the CSV file. It looks like Wireshark cannot do this.
Is there an efficient way to filter the packets according to the flow ID stored in a CSV file and create a new pcap file?
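One way to approach this (a sketch rather than a ready-made tool): load the flow IDs into a set, then stream the pcap once and keep only the packets whose 5-tuple is in the set. The file names below and the assumption that the protocol field is an IP protocol number (6 = TCP, 17 = UDP) are mine, not from the question; if your flow IDs are one-directional you may also need to add the reversed tuple for reply packets.
# Minimal sketch: filter a large pcap by a set of 5-tuple flow IDs.
from scapy.all import IP, TCP, UDP, PcapReader, PcapWriter

flows = set()
with open("flows.csv") as f:              # hypothetical file name, one flow ID per line
    for line in f:
        flows.add(line.strip())

writer = PcapWriter("filtered.pcap")      # hypothetical output name
for pkt in PcapReader("big.pcap"):        # streams the file, never loads all 30G at once
    if IP not in pkt:
        continue
    ip = pkt[IP]
    if TCP in pkt:
        l4 = pkt[TCP]
    elif UDP in pkt:
        l4 = pkt[UDP]
    else:
        continue
    key = f"{ip.src}-{ip.dst}-{l4.sport}-{l4.dport}-{ip.proto}"
    if key in flows:
        writer.write(pkt)
writer.close()
Scapy is not fast, so a 30G file will take a while; the same set-lookup idea can be reimplemented with dpkt or libpcap if throughput matters.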

Related

Improve the speed of writing/sending vibration data for a Raspberry Pi

I would like some comments on speeding up writing and sending data for my vibration measurement system.
I have a system of 10 Raspberry Pis (both RPi 3B+ and RPi 4), each with a triaxial accelerometer that acquires 3200 samples/second on each of the x, y, and z axes. Each minute a new CSV file is generated on each RPi; the recordings for the next minute are appended to this file, which is then closed and sent via FTP to a laptop where all the data are gathered and post-processed.
Data within the CSV files are integers and integers only. No headers or similar. Filenames consist of a sensor number and an epoch timestamp, so all filenames are unique.
Should I change format, e.g. to HDF5 (why? It seems only to be good for larger data tables), pandas/PyTables, the feather format (it appears to only handle binary data - should I consider converting my data?), Pickle, UFF58, ...?
Should I consider another approach instead of generating a new CSV file every minute - maybe not appending, but keeping the data in memory until it is time to write them to a file? (I need the data sent to my laptop every minute.) If so, how can this be done?
Any other considerations for improving I/O performance?
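Not an answer, just one direction to consider: since the samples are plain integers, buffering one minute in memory and writing it once as a binary array is both smaller and cheaper on I/O than appending CSV text throughout the minute. A rough sketch with numpy follows; the sensor-read call, sensor number, and file naming are placeholders, not your real code.
import time
import numpy as np

SAMPLE_RATE = 3200            # samples per second per axis
SECONDS = 60
SENSOR_ID = 1                 # hypothetical sensor number

def read_sample():
    # Placeholder for the real accelerometer driver call.
    return (0, 0, 0)

# Preallocate one minute of x, y, z samples (int16 is enough for most accelerometer ADCs).
buf = np.empty((SAMPLE_RATE * SECONDS, 3), dtype=np.int16)
for i in range(buf.shape[0]):
    buf[i] = read_sample()

# One write per minute: ~1.1MB of int16 instead of several MB of CSV text.
np.save(f"{SENSOR_ID}_{int(time.time())}.npy", buf)
The laptop side can read each file back with np.load, and the sensor-number + epoch-timestamp naming scheme stays the same.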

Extract over 250 fields from a pcap using tshark

I have captured wireless traffic using Wireshark and the captured pcap file is approximately 500MB. I'd like to extract more than 250 fields from that capture file. How can I do that with tshark?
You can use one of the following tshark commands to extract all fields from your capture file:
tshark -r input.pcap -T pdml
tshark -r input.pcap -T json
The number of fields you will have in the output really depends on the structure of your packets. You might have a large number of fields if you have several encapsulation layers, or a very small number if you don't have a recognized application layer.
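If you then want to work with that output programmatically rather than read it by eye, one possible sketch (the input file name and the -c packet limit are placeholders) is to load the JSON export and walk the nested layers:
import json
import subprocess

# Limit to the first 1000 packets; the full JSON export of a 500MB capture would be very large.
out = subprocess.run(
    ["tshark", "-r", "input.pcap", "-c", "1000", "-T", "json"],
    capture_output=True, text=True, check=True,
).stdout

fields = set()

def collect(node):
    # tshark's JSON nests layers as dicts (and sometimes lists of dicts).
    if isinstance(node, dict):
        for name, value in node.items():
            fields.add(name)
            collect(value)
    elif isinstance(node, list):
        for item in node:
            collect(item)

for pkt in json.loads(out):
    collect(pkt["_source"]["layers"])

print(len(fields), "distinct layer/field names seen")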

How to find my NetFlow data version number?

Is there any option to know the version number of my NetFlow data?
I have a pcap file generated using tcpdump. Then, using an open-source tool (which depends on tshark), I converted the pcap data into NetFlow.
I am not able to find out which version of NetFlow it is: NetFlow v5 or v7 ... or IPFIX.
Is there any way to tell the NetFlow version by looking at the data?
If you are using the PCAP file to generate and export NetFlow over the wire, then the version number is the 16-bit field at the very start of the UDP payload: the first byte is 0, and the second byte will be 5, 7, 9, or 0x0A (10, in the case of IPFIX).
However, if you have used a textual format to dump the records to disk, then they are technically not really versioned NetFlow until you export them somehow over the wire.
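If you do capture the export traffic itself (e.g. with tcpdump on the collector's port), checking that field is straightforward; here is a small sketch with Python/scapy, where the capture file name is an assumption:
import struct
from scapy.all import UDP, rdpcap

# Assumes the capture contains (only) the NetFlow/IPFIX export packets;
# otherwise filter on the collector's UDP port first.
for pkt in rdpcap("netflow_export.pcap"):
    if UDP in pkt:
        payload = bytes(pkt[UDP].payload)
        if len(payload) >= 2:
            (version,) = struct.unpack("!H", payload[:2])
            print("export version:", version)   # 5, 7, 9, or 10 (IPFIX)
            break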

Nfcapd to pcap conversion?

I've got a few NetFlow dumps captured by the nfcapd daemon. Is there any way to convert them to .pcap format so I can analyse them with my software?
Basically no; most of the information from the packets is lost, including the entire payloads. NetFlow summarizes the header information from all the packets in a given session, which could be a dozen packets or thousands of them. The NetFlow dumps do not (to my recollection) include partial updates either. So you can go one way (convert from pcap to NetFlow) but not the other way.
That said, if all you need for your analysis are the IP headers of the first packets, you might be able to fake something. But I don't know of any tool that does it.

How to import a large amount of data from a file into sqlite inside the application (in real time)

I have a big list of words (over 2 million) in a CSV file (about 35MB in size).
I wanted to import the CSV file into sqlite3 with an index (primary key).
So I imported it using the sqlite command-line tool. The DB has been created, and the size of the .sqlite file has grown to over 120MB! (50% of that is the primary key index.)
And here is the problem: if I add this 120MB .sqlite file to the resources, the .ipa file is still >60MB even after compression, and I'd like it to be less than 30MB (because of the download limit over EDGE/3G).
Also, because of the size, I cannot deliver it (the zipped sqlite file) via a web service (45MB * 1000 downloads = 45GB - that's my server's half-year limit).
So I thought I could do something like this:
compress the CSV file of words to a ZIP; then the file is only 7MB.
add the ZIP file to the resources.
in the application, unzip the file and import the data from the unzipped CSV file into sqlite.
But I don't know how to do this import. I've tried:
sqlite3_exec(sqlite3_database, ".import mydata.csv mytable", callback, 0, &errMsg);
but it doesn't work. The reason for the failure is that ".import" is part of the command-line interface, not of the C API.
So I need to know how to import it (the unzipped CSV file) into the SQLite database inside the app (not during development using the command line).
If the words that you are inserting are unique, you could make the text itself the primary key.
If you only want to test whether words exist in a set (say for a spell checker), you could use an alternative data structure such as a bloom filter, which only requires about 9.6 bits per word at a 1% false-positive rate - roughly 2.4MB for 2 million words.
http://en.wikipedia.org/wiki/Bloom_filter
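For illustration only (not from this thread): a bloom filter is just a bit array plus a handful of hash functions. A Python sketch of the idea follows; in the app itself you would need an Objective-C/C implementation, but the structure is the same.
import hashlib

class BloomFilter:
    def __init__(self, n_items, bits_per_item=10, n_hashes=7):
        # ~10 bits per item with 7 hashes gives roughly a 1% false-positive rate.
        self.size = n_items * bits_per_item
        self.n_hashes = n_hashes
        self.bits = bytearray(self.size // 8 + 1)

    def _positions(self, word):
        # Derive the hash positions from 4-byte slices of one SHA-256 digest.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        for i in range(self.n_hashes):
            yield int.from_bytes(digest[4 * i:4 * i + 4], "big") % self.size

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, word):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(word))

bf = BloomFilter(2_000_000)      # ~2.5MB of bits for 2 million words
bf.add("example")
print("example" in bf)           # True
print("missing" in bf)           # False (except ~1% of the time)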
As FlightOfStairs mentioned, depending on the requirements a bloom filter is one solution; if you need the full data, another option is a trie or radix tree data structure. You would preprocess your data, build one of these data structures, and then either put it in sqlite or some other external data format.
The simplest solution would be to write a CSV parser using NSScanner and insert the rows into the database one by one. That's actually a fairly easy job—you can find a complete CSV parser here.
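Here is a sketch of that row-by-row import, written in Python for brevity; inside the app the same pattern maps onto sqlite3_prepare_v2 / sqlite3_bind_text / sqlite3_step in the C API. The file and table names are placeholders.
import csv
import sqlite3

conn = sqlite3.connect("words.sqlite")
conn.execute("CREATE TABLE IF NOT EXISTS mytable (word TEXT PRIMARY KEY)")

with open("mydata.csv", newline="") as f:
    rows = ((row[0],) for row in csv.reader(f) if row)
    # Wrap all inserts in a single transaction; committing per row would be
    # orders of magnitude slower for 2 million rows.
    with conn:
        conn.executemany("INSERT OR IGNORE INTO mytable (word) VALUES (?)", rows)
conn.close()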