How to get the endianness when reading a message from socket? - sockets

I am using Go and protocol buffers. A Go program sends encoded protocol buffer messages to clients connected to a socket. Since protocol buffer messages are not self-delimiting, clients don't know how much data to read from the socket.
I am planning to prefix each message with its length as a 32-bit integer, so clients can read 4 bytes, get the length, and then read the full message.
I can put an integer value into a byte buffer using the binary package, something like:
binary.Write(buf, binary.LittleEndian, value)
Now the question: Write needs a byte order, so how will the receiving end know which byte order was used? Is there a way to deal with this without specifying an explicit byte order?

The convention is that network byte order is big endian, but your client and server must agree in any case. Specify the byte order explicitly.
Edit: Reading the Protobuf docs, it might be better to use proto.EncodeVarint and proto.DecodeVarint to write the length before the message, for the sake of consistency.

You should always explicitly define and document the byte order (and layout) on such things. In general communications, big-endian direct layout seems to be the norm (so 47 would be 4 bytes with hex values 00 00 00 2F). However, you might want to take specific context into consideration. For example, if you are already talking protobuf, you could also use the "varint" encoding format, in which case 47 is a single byte: 2F. Many protobuf implementations provide pre-rolled methods to consume or produce a varint (otherwise, it is documented here, but can be summarised as a 7-bit payload with a continuation bit, with the least-significant 7-bit group first).
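As a rough illustration of the varint layout, Go's standard encoding/binary package provides PutUvarint and Uvarint, which use the same unsigned varint format as protocol buffers. The helper names encodeLen and decodeLen below are made up for this sketch:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLen returns n as an unsigned varint: 7 payload bits per byte,
// least-significant group first, high bit set on every byte but the last.
func encodeLen(n uint64) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	used := binary.PutUvarint(buf, n)
	return buf[:used]
}

// decodeLen returns the decoded value and the number of bytes consumed.
// A count of zero or less means the input was truncated or overflowed.
func decodeLen(b []byte) (uint64, int) {
	return binary.Uvarint(b)
}

func main() {
	fmt.Printf("% X\n", encodeLen(47))  // 2F
	fmt.Printf("% X\n", encodeLen(300)) // AC 02

	v, used := decodeLen([]byte{0xAC, 0x02})
	fmt.Println(v, used) // 300 2
}
```

A varint has no byte-order question to settle, since the group order is part of the format itself.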

Related

MQTT Encoding for Remaining Length Field. What is the need for Encoding this?

In MQTT Packets there is something called the remaining length field. This signifies the actual length of an MQTT Message.
The MQTT documentation says that the remaining length is encoded using something called a "continuation bit" encoding scheme.
My question is this:
What is the need for encoding the remaining length? Can't we just state the length of the packet directly?
Wouldn't that transmit the same length with fewer bytes? MQTT was designed to be lightweight, right? Then why use this encoding scheme?
Please enlighten me. Or am I thinking about this in the wrong point of view?
Any help would be nice.
Thanks In Advance for your time.
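For reference, a sketch of the continuation-bit scheme the question refers to, as it is described for MQTT 3.1.1: each byte carries 7 bits of the length, and the top bit signals that another byte follows. The function names are made up for illustration. A length up to 127 takes one byte, whereas a fixed 32-bit field would always take four.

```go
package main

import (
	"errors"
	"fmt"
)

// maxRemainingLength is the largest value that fits in four
// continuation-bit bytes (4 x 7 = 28 bits).
const maxRemainingLength = 268435455

// encodeRemainingLength encodes n with 7 data bits per byte,
// least-significant group first.
func encodeRemainingLength(n int) ([]byte, error) {
	if n < 0 || n > maxRemainingLength {
		return nil, errors.New("remaining length out of range")
	}
	var out []byte
	for {
		b := byte(n % 128)
		n /= 128
		if n > 0 {
			b |= 0x80 // more bytes follow
		}
		out = append(out, b)
		if n == 0 {
			return out, nil
		}
	}
}

// decodeRemainingLength reverses encodeRemainingLength and reports
// how many bytes were consumed.
func decodeRemainingLength(b []byte) (value, used int, err error) {
	multiplier := 1
	for i := 0; i < len(b) && i < 4; i++ {
		value += int(b[i]&0x7F) * multiplier
		if b[i]&0x80 == 0 {
			return value, i + 1, nil
		}
		multiplier *= 128
	}
	return 0, 0, errors.New("malformed or truncated remaining length")
}

func main() {
	for _, n := range []int{0, 127, 128, 321, maxRemainingLength} {
		enc, _ := encodeRemainingLength(n)
		fmt.Printf("%d -> % X\n", n, enc)
	}
}
```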

What data type to store raw IMAP fetched email messages in Postgresql?

I need to store email messages as soon as they are fetched from IMAP in the database for later processing. I extract the message using a FETCH request and data is returned using BODY.PEEK[].
From my understanding, all IMAP messages are returned as US-ASCII (the mail servers accept only that), but I could be wrong.
My options (in order of what I think is right) are:
US-ASCII text column
Bytea
BLOB
I was thinking about using US-ASCII, but I'm afraid of running into encoding problems; I don't know if there are "faulty" IMAP servers that return non-US-ASCII mail.
The alternative is Bytea, but I read that you have to deal with encoding, so I'm not sure what the advantage or disadvantage is over US-ASCII.
BLOB is raw, and I'm not sure what problems it brings in this case. I assume I would have to deal with the bytes-to-string conversion.
What's the recommended data type?
For small objects such as emails, I think you're going to be better off with Bytea. The storage and handling is different and since your objects are going to be small, it seems like it would be handled better as Bytea. See here for a comparison of the two by Microolap. That's not a full answer to your question but might take one option off the list.
You're making the very much unwarranted assumption that you can avoid dealing with encodings.
You can't.
Whether you use lob, bytea, or a text column that you assume contains 7-bit mail only... the mail is just arbitrary binary data. You do not know its text encoding. In practice mail clients have used 8-bit encoding forever; either standards-compliant via MIME quoted-printable, or often simply raw 8-bit text.
Some clients have even been known to include full 8-bit MIME segments that include null (zero) bytes. PostgreSQL won't tolerate that in a text column.
But even for clients using compliant MIME, quoted-printable escaping text bodies, etc... the mail may contain non-ASCII chars, they're just escaped. Indexing these and ignoring the escapes will yield weird and wrong results. Also, attachments will usually be arbitrary base64 data. Indexing this as text is utterly meaningless. Then there's all the HTML bodies, multi-part/alternative segments, CSS, etc...
When dealing with email, assume that anything a client or server can do wrong, it will do wrong. For storage, treat the email as raw bytes of unknown encoding. That's exactly what bytea is for.
If you want to do anything with the mail you'll need a defensive MIME parser that can extract the MIME parts, cope with broken parts, etc. It'll need to check the declared encoding (if any) against the actual mime-part body, and guess encodings if none are declared or the declared encoding is obviously wrong. It'll have to deal with all sorts of bogus MIME structure and contents; quoted-printable bodies that aren't really quoted-printable, and all that.
So if you plan to index this email, it's definitely not as simple as "create a fulltext index and merrily carry on". The question with that is not if it will fail but when.
Personally, if I had to do this (and given the choice I wouldn't) I'd store the raw email as bytea. Then for search I'd decompose it into MIME parts, detect text-like parts, do encoding detection and dequoting, etc, and inject the decoded and cleaned up text bodies into a separate table for text indexing.
There are some useful Perl modules for this that you can possibly use via plperlu, but I'd likely do it in an outside script/tool instead. Then you have your choice of MIME processors, languages, etc.
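A rough sketch of the "decompose into MIME parts and pull out the text-like ones" step, done in an outside tool using only Go's standard library. The function name textParts is made up, and the sketch is deliberately naive: it looks only at top-level parts, does no charset detection, and does not recover from broken MIME the way the defensive parser described above would have to. The raw bytes passed in are what would be stored unchanged in the bytea column.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/mail"
	"strings"
)

// textParts returns the bodies of the top-level text/* parts of a raw
// message. Non-multipart mail is returned as a single body (a
// simplification: it is not checked for being text).
func textParts(raw []byte) ([]string, error) {
	msg, err := mail.ReadMessage(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	mediaType, params, err := mime.ParseMediaType(msg.Header.Get("Content-Type"))
	if err != nil || !strings.HasPrefix(mediaType, "multipart/") {
		body, rerr := io.ReadAll(msg.Body)
		if rerr != nil {
			return nil, rerr
		}
		return []string{string(body)}, nil
	}

	var out []string
	mr := multipart.NewReader(msg.Body, params["boundary"])
	for {
		part, err := mr.NextPart()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		partType, _, err := mime.ParseMediaType(part.Header.Get("Content-Type"))
		if err != nil || !strings.HasPrefix(partType, "text/") {
			continue // skip attachments and parts with unusable headers
		}
		body, err := io.ReadAll(part)
		if err != nil {
			return out, err
		}
		out = append(out, string(body))
	}
}

func main() {
	raw := []byte("From: a@example.com\r\n" +
		"Content-Type: multipart/mixed; boundary=XYZ\r\n\r\n" +
		"--XYZ\r\nContent-Type: text/plain\r\n\r\nhello\r\n" +
		"--XYZ\r\nContent-Type: application/octet-stream\r\n\r\nAAAA\r\n" +
		"--XYZ--\r\n")
	parts, err := textParts(raw)
	fmt.Println(parts, err)
}
```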

Determine Remaining Bytes

I'm working on a project where I need to send a value between two pieces of hardware using CoDeSys. The comms system in use is CAN and is only capable of transmitting in Bytes, making the maximum value 255.
I need to send a value higher than 255, I'm capable of splitting this over more than one byte and reconstructing it on the receiving machine to get the original value.
I'm thinking I can divide the REAL value by 255 and if the result is over 1 then deconstruct the value in to one byte holding the remainders and one byte holding the amount of 255's in the whole number.
For example 355 would amount to one byte of 100 and another of 1.
Whilst I can describe this, I'm having a really hard time figuring out how to actually write this in logic.
Can anyone help here?
This is all handled for you in CoDeSys if I understand you correctly.
1. CAN - Yes, it works in bytes, but presumably you are not using CANopen and are instead using the low-level FB that asks you to send a CAN frame as an 8-byte array?
If these are your own two custom controllers (you are programming both of them in CoDeSys), just use netvariables. Netvariables allow you to transfer any type of variable, and you can take the variable list from one controller and import it into another controller and all the data will show up. You don't have to do any variable manipulation; it's handled under the hood for you. But I don't know the specifics of your system and what you are trying to do.
If you are trying to deconstruct and reconstruct variables from one size to another, that is easy and I can share that code with you.
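The deconstruct/reconstruct step can be sketched as follows. This is Go for illustration only (a CoDeSys project would express the same arithmetic in its own language), it assumes the value is a whole number that fits in 16 bits, and the function names are made up. Note that a byte has 256 distinct values (0 to 255), so the usual split divides by 256 rather than 255: 355 becomes a high byte of 1 and a low byte of 99.

```go
package main

import "fmt"

// splitU16 breaks a 16-bit value into its high and low bytes.
// It is equivalent to hi = v / 256 and lo = v % 256.
func splitU16(v uint16) (hi, lo byte) {
	return byte(v >> 8), byte(v)
}

// joinU16 rebuilds the value on the receiving side.
func joinU16(hi, lo byte) uint16 {
	return uint16(hi)<<8 | uint16(lo)
}

func main() {
	hi, lo := splitU16(355)
	fmt.Println(hi, lo)          // 1 99
	fmt.Println(joinU16(hi, lo)) // 355
}
```

Both controllers have to agree on which byte of the frame carries the high part, which is the same byte-order agreement as in any other wire format.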

Encode binary or ASCII (at least 7000 bytes) into image/barcode?

I'm wondering if there's some way to encode data (either binary or ASCII) into a printable image or data pattern that can easily be rescanned again and interpreted back into a file. The problem with QR codes is that they won't handle file sizes of 7-10KB. Any suggestions?
EDIT: One catch: Can't store said data on the server. Security reasons. The data must not exist anywhere except on a printed piece of paper.
7 kilobytes is 57344 bits, so the graphical code needs a lot of bars or squares (in the case of QR) to represent the data, and that's before accounting for error correction, format information, positioning, alignment, etc.
I think a sound solution would be to put the data on a server, map it to an index, and create a service to retrieve the data by index.
The QR code or barcode then only encodes the index, and the scanner gets the data from the service.

Encrypt numeric data while preserving inequalities among ciphertexts

How can I encrypt numeric data such that the ciphertext produced by the encryption function is also numeric, and Enc[m1] < Enc[m2] whenever m1 < m2?
I have gone through a number of references, all pointing to Format Preserving Encryption. However, no open source implementation is available for it.
Is there a way (encryption or encoding) to conceal the data with the aforementioned properties using Java or C#?
I want to encrypt numeric data within the range of [1 – 50] to cipher text within the range of [1000 - 5000]. I am trying to implement Secure Inverted Index mentioned in Enabling Search over Encrypted Multimedia Databases.
I think you've got a basic contradiction here. If you encrypt a number of values and somehow maintain the sort order among them, then someone, knowing that "abc" encrypts to 567 and "abe" encrypts to 569, will know that 568 => "abd". (Not that your encryption algorithm would be that naive, but you're seriously weakening anything you do manage to devise.)
Encrypting to a number is not difficult, if you allow the number to be longer than your cleartext. (After all, characters themselves are just numbers with special meaning.) A simple approach is to just write the ciphertext out in octal, but other techniques will produce slightly more compact representations in decimal digits.
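A small sketch of the "render the ciphertext as a number" idea only, assuming the ciphertext bytes come from whatever cipher is in use. The function names are made up. It writes each byte as three octal digits so the result consists solely of the digits 0 to 7 and can be reversed. It does nothing to preserve ordering between plaintexts.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toOctalDigits renders each ciphertext byte as exactly three octal
// digits, so the output is purely numeric and three times as long.
func toOctalDigits(ct []byte) string {
	var sb strings.Builder
	for _, b := range ct {
		fmt.Fprintf(&sb, "%03o", b)
	}
	return sb.String()
}

// fromOctalDigits reverses toOctalDigits.
func fromOctalDigits(s string) ([]byte, error) {
	if len(s)%3 != 0 {
		return nil, fmt.Errorf("length %d is not a multiple of 3", len(s))
	}
	out := make([]byte, 0, len(s)/3)
	for i := 0; i < len(s); i += 3 {
		v, err := strconv.ParseUint(s[i:i+3], 8, 8)
		if err != nil {
			return nil, err
		}
		out = append(out, byte(v))
	}
	return out, nil
}

func main() {
	ct := []byte{0x00, 0x01, 0xFF} // placeholder bytes, not real ciphertext
	digits := toOctalDigits(ct)
	fmt.Println(digits) // 000001377
	back, err := fromOctalDigits(digits)
	fmt.Println(back, err)
}
```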