How to parse/decode raw Netflow data to text in Java? - netflow

I want to convert raw NetFlow traffic data to a human-readable format in Java. Does anyone have any clue how to achieve this?
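One possible starting point, as a minimal sketch only: the code below assumes the raw data is a NetFlow version 5 export datagram (a 24-byte header followed by fixed 48-byte flow records, all fields big-endian). NetFlow v9 and IPFIX are template-based and are not handled here. The class name `NetflowV5Decoder` and the output layout are made up for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: turns one NetFlow v5 export datagram into text lines.
// Assumes the v5 layout: 24-byte header, then 48-byte flow records.
class NetflowV5Decoder {
    static final int HEADER_LEN = 24;
    static final int RECORD_LEN = 48;

    // Formats a 32-bit IPv4 address as dotted decimal.
    static String ip(int addr) {
        return ((addr >>> 24) & 0xFF) + "." + ((addr >>> 16) & 0xFF) + "."
             + ((addr >>> 8) & 0xFF) + "." + (addr & 0xFF);
    }

    static String decode(byte[] packet) {
        if (packet.length < HEADER_LEN) {
            throw new IllegalArgumentException("packet shorter than v5 header");
        }
        ByteBuffer buf = ByteBuffer.wrap(packet); // ByteBuffer is big-endian by default
        int version = buf.getShort(0) & 0xFFFF;
        if (version != 5) {
            throw new IllegalArgumentException("not NetFlow v5: version " + version);
        }
        int count = buf.getShort(2) & 0xFFFF;     // number of flow records
        if (packet.length < HEADER_LEN + count * RECORD_LEN) {
            throw new IllegalArgumentException("truncated packet");
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < count; i++) {
            int base = HEADER_LEN + i * RECORD_LEN;
            String src = ip(buf.getInt(base));                    // srcaddr
            String dst = ip(buf.getInt(base + 4));                // dstaddr
            long packets = buf.getInt(base + 16) & 0xFFFFFFFFL;   // dPkts
            long octets = buf.getInt(base + 20) & 0xFFFFFFFFL;    // dOctets
            int srcPort = buf.getShort(base + 32) & 0xFFFF;       // srcport
            int dstPort = buf.getShort(base + 34) & 0xFFFF;       // dstport
            int proto = buf.get(base + 38) & 0xFF;                // IP protocol number
            out.append(src).append(':').append(srcPort)
               .append(" -> ").append(dst).append(':').append(dstPort)
               .append(" proto=").append(proto)
               .append(" packets=").append(packets)
               .append(" bytes=").append(octets).append('\n');
        }
        return out.toString();
    }
}
```

The byte array would typically be the payload of a UDP datagram received from the exporter; how it is received is left out here.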

Related

How best to store HTML alongside strings in Cloud Storage

I have a collection of data, and in each case there is a chunk of HTML and a few strings, for example
html: <div>html...</div>, name string: html chunk 1, date string: 01-01-1999, location string: London, UK. I would like to store this information together as a single cloud storage object. Specifically, I am using Google Cloud Storage. There are two ways I can think of doing this. One is to store the strings as custom metadata, and the HTML as the actual file contents. The other is to store all the information as a JSON file, with the HTML as a base64-encoded string.
I want to avoid a situation where after having stored a lot of data, I find there is some limitation to the approach I am using. What is the proper way to do this - is either of these approaches bad practice? Assuming there is no problem with either, I would go with the JSON approach because it is easier to pass around all the data together as a file.
There isn't a single right way to do what you're describing. There are potential pitfalls and performance criteria, but they depend on what you're doing with the data and why. Do you ever need access to the metadata for queries? You won't be able to do that efficiently if you pack everything into one JSON object. What are you parsing the data with later? Does it have built-in support for JSON? Does it support something else? Is speed a consideration? Is cloud storage space a consideration? Can a user input the HTML, and could they potentially perform some sort of attack? How do you use the data when you retrieve it? How stable is the format of the data? You could use JSON, Protocol Buffers, packed binary blobs in a length | value format, base64 with a delimiter, or zip files turned into binary blobs. Do what suits your application and allows a clean, structured design that you can test and maintain.
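To make the "length | value" option concrete, here is a minimal sketch in Java. It assumes every field is a UTF-8 string prefixed by a 4-byte big-endian length and that the field order is agreed on in advance; the class name `LengthValueCodec` is hypothetical, and nothing here is specific to Google Cloud Storage.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec: each field is stored as [4-byte length][UTF-8 bytes].
class LengthValueCodec {
    static byte[] pack(List<String> fields) {
        List<byte[]> encoded = new ArrayList<>();
        int total = 0;
        for (String field : fields) {
            byte[] value = field.getBytes(StandardCharsets.UTF_8);
            encoded.add(value);
            total += 4 + value.length;   // 4 bytes for the length prefix
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        for (byte[] value : encoded) {
            buf.putInt(value.length);
            buf.put(value);
        }
        return buf.array();
    }

    static List<String> unpack(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob);
        List<String> fields = new ArrayList<>();
        while (buf.hasRemaining()) {
            byte[] value = new byte[buf.getInt()];
            buf.get(value);
            fields.add(new String(value, StandardCharsets.UTF_8));
        }
        return fields;
    }
}
```

Unlike the JSON approach, the HTML needs no base64 step here, but the blob is not self-describing, so the reader has to know which field is which.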

Using the CBOR format instead of JSON in elasticsearch ingest plugin

In the documentation of the Ingest Attachment Processor Plugin in Elasticsearch, it is mentioned, "If you do not want to incur the overhead of converting back and forth between base64, you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. The processor will skip the base64 decoding then." Could anyone please shed some light on this or maybe share an example of how to achieve this? I need to index a very large number of documents of significant size, so I need to minimise the latency.

Heterogeneous Data Over TCP/IP

I need to send and receive heterogeneous data from a Matlab client to a server. The data includes 32-bit integers and 64-bit IEEE floats. Remember that TCP/IP only understands characters, so I need to pack this data together into a contiguous array to be clocked out. Then after receiving the response, I need to extract the byte data from the incoming character array and form it into Matlab types. Does anyone have any idea how to do this?
The generic term for turning heterogeneous data into a stream of bytes or characters is serializing (and the reverse, deserializing).
Two widely-used formats for serializing data into text characters are XML and JSON.
If you search the Mathworks site for any of those terms, or search this site for any of those terms together with [matlab] you'll find plenty of libraries and code examples.
Or since R2016b, MATLAB actually has built-in functions for serializing to / deserializing from JSON: jsonencode and jsondecode.
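If a compact binary layout is preferred over JSON, the other end has to agree on the byte order and field order. As a rough sketch of what a Java peer might do, assuming a message made of one 32-bit integer followed by one 64-bit IEEE double, both big-endian (the layout and the class name `MixedRecord` are assumptions, not something from the question):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical fixed layout: [int32 id][float64 value], big-endian, 12 bytes total.
class MixedRecord {
    static byte[] pack(int id, double value) {
        return ByteBuffer.allocate(12)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(id)
                .putDouble(value)
                .array();
    }

    static int readId(byte[] message) {
        return ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN).getInt(0);
    }

    static double readValue(byte[] message) {
        return ByteBuffer.wrap(message).order(ByteOrder.BIG_ENDIAN).getDouble(4);
    }
}
```

The MATLAB side would need to produce and consume the same 12 bytes in the same order.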

PostgreSQL protocol data representation format specification?

I am reading the PostgreSQL protocol document. The document specifies the message flow and containment format, but doesn't mention how the actual data fields are encoded in text/binary.
For the text format, there's no mention at all. What does this mean? Should I just use SQL value expressions? Or is there some extra documentation for this? If it's just an SQL value expression, does this mean the server will parse it again?
And which part of the source code should I investigate to see how binary data is encoded?
Update
I read the manual again and found a mention of the text format. So there actually is a description of the text representation, and it was my fault for missing this paragraph.
The text representation of values is whatever strings are produced and
accepted by the input/output conversion functions for the particular
data type.
There are two possible data formats: text and binary. Text is the default, which means the only transformation is the server <-> client encoding conversion (or none when client and server use the same encoding). The text format is very simple: all result data is converted to human-readable text and sent to the client. Binary data such as bytea is converted to human-readable text too, using the hex or escape format. Output is simple, so there is little to describe in the docs.
postgres=# select current_date;
date
────────────
2013-10-27
(1 row)
In this case the server sends the string "2013-10-27" to the client. The first four bytes are the length; the remaining bytes are the data.
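As an illustration of that length-then-data layout, here is a minimal Java sketch that reads one text-format column value from a buffer. It assumes the buffer is already positioned at the column's 4-byte length, that a length of -1 marks SQL NULL, and that the client encoding is UTF-8; the class name `TextColumnReader` is made up.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical reader for one column value: [int32 length][length bytes of text].
class TextColumnReader {
    static String readColumn(ByteBuffer buf) {
        int length = buf.getInt();        // big-endian, ByteBuffer's default order
        if (length == -1) {
            return null;                  // -1 means NULL, no data bytes follow
        }
        byte[] data = new byte[length];
        buf.get(data);
        // Assumption: client_encoding is UTF8
        return new String(data, StandardCharsets.UTF_8);
    }
}
```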
Input is a little more difficult, because you can separate the data from the query, depending on which API you use. With the simplest API, Postgres expects an SQL statement with the data embedded in it. More complex APIs expect the SQL statement and the data separately.
On the other hand, using the binary format is significantly more difficult because each data type has its own specific format. Every PostgreSQL data type has two functions, send and recv. These functions are used for writing data to the output message stream and reading data from the input message stream. Similar functions exist for casting to/from plain text (the out/in functions). Some client drivers are able to convert from the PostgreSQL binary format to host binary formats.
Some information:
libpq API http://www.postgresql.org/docs/9.3/static/libpq.html
You can look in the PostgreSQL source for the send/recv and out/in functions, for example the bytea or date implementation (src/backend/utils/adt/date.c). The implementation of libpq in src/interfaces/libpq is interesting too.
-
The things closest to a spec of a PostgreSQL binary format I could find were the documentation and the source code of the "libpqtypes" library. I know, a terrible state of the documentation for such a huge product.
The text representation of values is whatever strings are produced and
accepted by the input/output conversion functions for the particular
data type. In the transmitted representation, there is no trailing
null character; the frontend must add one to received values if it
wants to process them as C strings. (The text format does not allow
embedded nulls, by the way.)
Binary representations for integers use network byte order (most
significant byte first). For other data types consult the
documentation or source code to learn about the binary representation.
Keep in mind that binary representations for complex data types might
change across server versions; the text format is usually the more
portable choice.
(quoted from the documentation, link)
So the binary representation is not stable across versions; you should probably treat it as an implementation detail and not use it. The text representation is AFAICT just the format of literals in SQL queries.
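To show the difference for the one case the quoted documentation spells out (integers in network byte order), here is a small Java sketch decoding a 4-byte integer value from each format. It assumes the value bytes have already been extracted from the message; the class name `Int4Formats` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical decoders for a 32-bit integer column value.
class Int4Formats {
    // Text format: the value arrives as decimal digits, e.g. "258".
    static int fromText(byte[] value) {
        return Integer.parseInt(new String(value, StandardCharsets.US_ASCII));
    }

    // Binary format: 4 bytes, most significant byte first.
    static int fromBinary(byte[] value) {
        return ByteBuffer.wrap(value).getInt();
    }
}
```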

how to convert binary data into string format in j2me

I am using the gzip algorithm in J2ME. After compressing the string I tried to send the compressed string as a text message, but the size increased drastically. So I used base64 encoding to convert the compressed binary to text. But the size still increases when encoding. Please help me with an encoding technique that keeps the data size the same.
I tried sending a binary SMS, but as its limit is 134 characters I want to compress the data before sending the SMS.
You have some competing requirements here.
The fact you're considering using SMS as a transport mechanism makes me suspect that the data you have to send is quite short to start with.
Compression algorithms (in general) work best with large amounts of data and can end up creating a longer output than you started with if you start with something very short.
There are very few useful encoding changes that will leave you with output the same length as when you started. (I'm struggling to think of anything really useful right now.)
You may want to consider alternative transmission methods or alternatives to the compression techniques you have tried.
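The size effects are easy to measure before committing to an approach. The sketch below uses desktop Java SE classes (`java.util.zip.GZIPOutputStream`, `java.util.Base64`) purely for illustration; these are not assumed to be available on J2ME, and the class name `SmsSizeCheck` is made up. The gzip container alone adds a fixed header and trailer, and base64 turns every 3 input bytes into 4 output characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for comparing payload sizes on a desktop JVM.
class SmsSizeCheck {
    // Returns the gzip-compressed form of the input.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Returns how many characters the base64 encoding of the input takes.
    static int base64Length(byte[] input) {
        return Base64.getEncoder().encodeToString(input).length();
    }
}
```

Comparing `gzip(data).length` and `base64Length(gzip(data))` against the original length for representative messages shows whether compression pays off at all for payloads of this size.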