Netflow sample data sets - netflow

Does anyone know of an open netflow data set, I want to use it to run a little experiment on it, and analyse some of the flows. I looked around but there is nothing. Or if there is a good method to capture netflow data without actually having a cisco router.
Thanks!

You best/quickest option is to generate NetFlow through a software exporter that uses live capture (see for instance nProbe: http://www.ntop.org/products/netflow/nprobe/ or FlowTraq's free exporter: http://www.flowtraq.com/corporate/product/flow-exporter/).
Both these software exporters also have the capability to generate netflow from PCAP files. This can be convenient if you either have PCAP files, or download PCAP datasets, which are much more available than netflow datasets.

Related

How to append data to a compressed file?

I am getting a lot of data from websocket screem and I want to store them on disk. The amount of data received is ~300 MB per hour and I want to store this data long term (months, years).
In .NET there is a way how to read/write from/to zipped files using compressed streams. Is there a way to write directly to compressed file in Swift?
This is Mac OS (OSX) question.
Edit:
Stream compression here might be a solution but I am not used to work with unsafe pointers and don't even know whether it can be used to write to compressed file... I am stacked on this for few hours now. Code sample or directions how to approach it would help. Cocoapods wrapper for stream compression would be even better.
gzlog does what you're looking for. It is written in C and uses the zlib library. zlib is available on macOS, and you can link to C code from Swift.

how to find my netflow data version number?

Is there any option to know the version number of my netflow data.
I have pcap file generated using tcpdump. Then using some opensource (which depends on tshark) I converted the pcap data into netflow.
I am not able to find out which version of netflow it is in? netflow v5 or v7....or IPFIX.
Is there any way to tell netflow version by looking at the data?
If you are using the PCAP file to generate and export NetFlow over the wire, then the version number is in the second byte of the payload of the UDP packet. The value will be 5, 7, 9, or 'A' (in case of IPFIX).
However, if you have used a textual format to dump the records to disk, then they are technically not really versioned NetFlow until you export them somehow over the wire.

Nfcapd to pcap conversion?

I've got few NetFlow dumps captured by nfcapd deamon. Is there any possibility to convert them to .pcap format so I can analyse ones with my software?
Basically no; most of the information from the packets is lost, including the entire payloads. NetFlow summarizes the header information from all the packets in a given session: it could be a dozen or thousands. The NetFlow dumps do not (to my recollection) include partial updates either. So, you can go one way (convert from pcap to NetFlow) but not the other way.
That said, if all you need for your analysis are the IP headers of the first packets, you might be able to fake something. But I don't know of any tool that does it.

How can I find out what NetFlow version my nfcapd is capturing?

What version are my NetFlows?
I have an appliance that is exporting NetFlow to my NetFlow collector. My collector is collecting with nfcapd. The only information I can find is that nfcapd will capture different NetFlow versions "transparently".
My network appliance doesn't tell me in what version it is exporting flows. I need to explore a different NetFlow collector so I'm trying to get an idea of my requirements.
I could contact the vendor of the network appliance but I have several appliances exporting NetFlow so I would prefer to check on the collector end what version these flows are. Is there a way to do this with nfsen/nfcapd/nfdump tools? I'm not having any luck.
There are really only two versions that it's likely to be: NetFlow v5 or NetFlow v9 (IPFIX is essentially v9). The version number is included in the datagram, so the easiest way to find out which version it's exporting is to sniff the traffic in something like Wireshark, which will list the traffic as CFLOW. The first two bytes in each datagram will be the version number.

What are the real-time compute solutions that can take raw semistructured data as input?

Are there any technologies that can take raw semi-structured, schema-less big data input (say from HDFS or S3), perform near-real-time computation on it, and generate output that can be queried or plugged in to BI tools?
If not, is anyone at least working on it for release in the next year or two?
There are some solutions with big semistructured input and queried output, but they are usually
unique
expensive
secret enough
If you are able to avoid direct computations using neural networks or expert systems, you will be close enough to low latency system. All you need is a team of brilliant mathematicians to make a model of your problem, a team of programmers to realize it in code and some cash to buy servers and get needed input/output channels for them.
Have you taken a look at Splunk? We use it to analyze Windows Event Logs and Splunk does an excellent job indexing this information to allow for fast querying of any string that appears in the data.