Read contents of binary file (hex), knowing the format - matlab

On a laptop running Win10 x64, I used a piece of software that saves histogram data to a file with the '.dat' extension, and I am trying to open that file on Win7 x64.
In the software documentation, the file has the following format:
1. bytes 0 to 7: a 64-bit floating-point number (upper bound of the diagram)
2. bytes 8 to 15: a 64-bit floating-point number (bottom bound of the diagram)
3. bytes 16 to 19: an unsigned 32-bit integer set to 0 or 1
4. bytes 20 to 23: an unsigned 32-bit integer set to 0 or 1
5. bytes 24 to 27: an unsigned 32-bit integer set to the width of the diagram, in pixels
6. bytes 28 to 31: an unsigned 32-bit integer set to the height of the diagram, in pixels
7. bytes 32 to 800031: one unsigned 32-bit integer for each point in the diagram, given the width and height (so 4*width*height = 800 000 bytes)
Now, if I open the file with a hex editor, I can see these values corresponding to the above numbered points:
1. '11 EA 2D 81 99 97 71 3D'
2. '49 AF BC 9A F2 D7 7A 3E'
3. '01 00 00 00'
4. '00 00 00 00'
5. 'F4 01 00 00'
6. '90 01 00 00'
7. the rest of the file
Looking at the value for 3., which should be either 1 or 0, I can see that it should actually read '00 00 00 01', i.e. the bytes are stored in reverse order. From what I read online, I think this is called little-endian.
Running [cinfo, maxsize, ordering] = computer in Matlab on my Win7 laptop returns ordering = 'L' (little-endian), and the character encoding in Matlab is 'windows-1252'.
Testing the resulting values with various tools online I got the following:
Using http://babbage.cs.qc.cuny.edu/IEEE-754.old/64bit.html for 1. and inputting the bytes as hexadecimal in little-endian order (so, the value 3D719799812DEA11 instead of 11EA2D819997713D), I get 1.000e-12.
Using the same URL for 2. (so, the value 3E7AD7F29ABCAF49 instead of 49AFBC9AF2D77A3E), I get 1.000e-7.
For 3., 4., 5. and 6., I reversed the byte order in the same way, converted the resulting unsigned integer to binary with one site and the binary to decimal with another, and got 1, 0, 500 and 400 respectively.
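The manual byte reversal described above can be double-checked with a short script. The sketch below uses Python (chosen only for illustration; the question itself uses Matlab) and its standard struct module to decode the six hex strings from the hex editor once as little-endian ('<') and once as big-endian ('>'). Only the hex strings come from the question.

```python
import struct

# Header fields exactly as shown in the hex editor (file order),
# with the struct code for each: d = float64, I = uint32.
HEX_FIELDS = [
    ("11 EA 2D 81 99 97 71 3D", "d"),  # 1. upper bound
    ("49 AF BC 9A F2 D7 7A 3E", "d"),  # 2. bottom bound
    ("01 00 00 00", "I"),              # 3. flag
    ("00 00 00 00", "I"),              # 4. flag
    ("F4 01 00 00", "I"),              # 5. width
    ("90 01 00 00", "I"),              # 6. height
]


def decode(hex_text, code, byte_order):
    """Interpret the bytes using the given struct byte-order prefix."""
    raw = bytes.fromhex(hex_text)  # the spaces between bytes are ignored
    return struct.unpack(byte_order + code, raw)[0]


little = [decode(h, c, "<") for h, c in HEX_FIELDS]
big = [decode(h, c, ">") for h, c in HEX_FIELDS]
print("little-endian:", little)
print("big-endian:   ", big)
```

With '<' the fields come out as 1e-12, approximately 1e-7, 1, 0, 500 and 400. With '>' field 3 becomes 16777216, which cannot be a 0/1 flag, so the file is indeed little-endian.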
This means that the values for the above numbered points are:
1. 1e-12
2. 1e-7
3. 1
4. 0
5. 500 (width, in pixels)
6. 400 (height, in pixels)
4*500*400 = 800 000, so it seems correct.
So, I know what I should find if I try to open the file in Matlab.
This is what I did so far:
name = 'histogram.dat';
fid = fopen(name,'r');
v = fread(fid);
This gives v as an 800032x1 double. If I open the v vector, I see the following:
Elements 1 to 8 (for 1.): [17;234;45;129;153;151;113;61]
Elements 9 to 16 (for 2.): [73;175;188;154;242;215;122;62]
Elements 17 to 20 (for 3.): [1;0;0;0]
Elements 21 to 24 (for 4.): [0;0;0;0]
Elements 25 to 28 (for 5.): [244;1;0;0]
Elements 29 to 32 (for 6.): [144;1;0;0]
Elements 33 to 800032 (for 7.): the rest
Next, I deleted everything and then I tried to read the binary file in the following way:
name = 'histogram.dat';
fid = fopen(name,'r');
c1 = fread(fid,8,'float64');
c2 = fread(fid,8,'float64');
c3 = fread(fid,4,'uint32');
c4 = fread(fid,4,'uint32');
c5 = fread(fid,4,'uint32');
c6 = fread(fid,4,'uint32');
And now I get something totally different.
c1 = [1.00e-12; 1.00e-07; 4.90e-324; 8.48e-312; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314 ];
c2 = [-1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314];
c3 to c6 = [2.1475e09; 2.1475e09; 2.1475e09; 2.1475e09];
So it does not work as expected. How can I open the file in Matlab and get the values I expect? Moreover, how can I read bytes 32 to the end? I suppose I read every 4 bytes up to the end and get the value?
Also, if I take the hex values for each field and apply hex2num (for the first two, the float64s) or hex2dec (for the next ones, the uint32s), I get the correct results. Should I first convert the binary data with hex2num or hex2dec as well?
Another possibility is to open the file in a normal hex editor, import the big vector into Matlab, swap the byte order with swapbytes and then apply hex2num/hex2dec. Then I get the correct values.

The second argument to fread is the number of elements to read, not the number of bytes. It will adjust the number of bytes according to the datatype you specify. So for example, to read in a single float64 to c1 you actually want
c1 = fread(fid,1,'float64');
You need to adjust the other values accordingly. To read in the rest of the data to the end of the file, replace the second argument with Inf.
No need to convert to or from hex values; hex values are just about how the values are displayed on screen.
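The same "count elements, not bytes" idea can be sketched in Python for readers working outside Matlab. This is a minimal sketch under two assumptions: the file contains exactly the 32-byte header from the question followed by width*height little-endian uint32 counts, and read_histogram is a hypothetical helper name. struct reads the whole header in one call (d = float64, I = uint32, '<' = little-endian), and array.fromfile, like fread, takes a number of elements rather than a number of bytes.

```python
import array
import struct
import sys

# 2 x float64 + 4 x uint32, little-endian, no padding: 32 bytes in total.
HEADER = struct.Struct("<ddIIII")

# Pick an array typecode whose items are exactly 4 bytes (uint32).
UINT32 = next(code for code in "IL" if array.array(code).itemsize == 4)


def read_histogram(path):
    """Return the six header values and the width*height uint32 counts."""
    with open(path, "rb") as f:
        header = HEADER.unpack(f.read(HEADER.size))
        width, height = header[4], header[5]
        counts = array.array(UINT32)
        # Element count, not byte count, just like fread's size argument;
        # raises EOFError if the file holds fewer values than expected.
        counts.fromfile(f, width * height)
    if sys.byteorder != "little":
        counts.byteswap()  # the file stores little-endian values
    return header, counts
```

The Matlab counterpart follows the answer directly: a count of 1 for each header field, then Inf (or width*height) for the data.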

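Alternatively, you can read the whole file into a byte vector. The 'uint8=>uint8' precision keeps the bytes as uint8 instead of converting them to double, which is what typecast needs here:
bytevec = fread(fid,inf,'uint8=>uint8');
Then pick out the bytes of each field by their indices (or use reshape) and reinterpret them without changing the underlying data, where type is e.g. 'double' for fields 1-2 or 'uint32' for fields 3-7 (typecast uses the machine's native byte order, which is little-endian on x86 and therefore matches this file):
value = typecast(bytevec(i1:i2), type);
And then convert the result to Matlab's default double type:
double_value = double(value);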
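Slicing bytevec(i1:i2) by hand is where off-by-one mistakes creep in, because the format is described with 0-based byte offsets while Matlab indices start at 1. The Python sketch below (hypothetical helper names, little-endian assumed as in the question) keeps Matlab-style 1-based inclusive ranges in a table and converts them to Python's 0-based half-open slices in one place, then reinterprets the bytes with struct.unpack, which plays the role of typecast here.

```python
import struct

# Byte ranges of each header field as Matlab-style 1-based inclusive
# indices (i1, i2), plus the struct code used to reinterpret them.
FIELDS = {
    "upper": (1, 8, "d"),
    "lower": (9, 16, "d"),
    "flag1": (17, 20, "I"),
    "flag2": (21, 24, "I"),
    "width": (25, 28, "I"),
    "height": (29, 32, "I"),
}


def field(bytevec, i1, i2, code):
    """Reinterpret bytevec(i1:i2) as one little-endian value of type code."""
    chunk = bytevec[i1 - 1:i2]  # 1-based inclusive -> 0-based half-open
    return struct.unpack("<" + code, chunk)[0]


def parse_header(bytevec):
    """Decode all header fields from the raw file bytes."""
    return {name: field(bytevec, i1, i2, code)
            for name, (i1, i2, code) in FIELDS.items()}
```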

Related

Looking for a Better Alternative to String Format for MQTT

I am publishing a string over MQTT from a Raspberry Pi (sitting in a field, connected via LTE internet that costs me $10 per 500 MB per month) to an MQTT broker. I am using the paho-mqtt client in Python to do this. The string looks something like "MM:DD:YYYY HH:MM:SS, X1, X2, X3, ..., X24", and I send a new string every 30 seconds. X1 to X24 are floating-point numbers from 0 to 700 with two decimal places. I think this will use a lot of data once I deploy it 24/7. Is my data format good? What other data formats should I look at?
You can represent the Unix time as a 4-byte unsigned integer (a 4-byte float only has a 24-bit mantissa, so a current Unix time stored in one would be rounded to the nearest multiple of 128 seconds). And you can represent each of your values as a 4-byte IEEE 754 float. So your time and 24 floats can be packed into 100 bytes with Python's struct.pack(). That looks like this:
import struct
import time
import random
# Synthesize some sample data - a whole-second Unix time and 24 floats 0..700
data = [int(time.time())] + [ random.uniform(0, 700) for _ in range(24)]
# Pack as one 4-byte unsigned int and 24 IEEE754 floats of 4 bytes each (network byte order)
payload = struct.pack('!I24f', *data)
print(len(payload)) # prints 100 (bytes)
Currently, you seem to be using:
19 bytes for your time and
around 7 bytes for each float including separators
So, that's around 180 bytes as you currently have it.
If you multiplied your floats by 100 and converted them to integers, you could encode them as 16-bit unsigned values (half the space of a 4-byte float). Those range over 0..65535, which would represent 0..655.35, just short of your data range of 0..700. So that would be 4 bytes for the time plus 24 samples of 2 bytes each, for a total of 52 bytes.
So, rather than 100, use a scale factor of 65535/700, or about 93.62:
# Scale the data to the range 0..65535 and make it integers (time as a 4-byte unsigned int)
smallerData = [int(data[0])] + [ int(93.62*data[i]) for i in range(1,25)]
payload = struct.pack('!I24H', *smallerData)
print(len(payload)) # prints 52 (bytes)
Obviously all the numbers above exclude MQTT protocol overhead.
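On the receiving side the payload has to be unpacked with the same format string and the scaling undone. A minimal round-trip sketch, assuming the 52-byte layout described above with the timestamp sent as a 4-byte unsigned integer ('I'); encode and decode are hypothetical names, and unlike the snippet above it uses round() rather than int() when scaling, which halves the worst-case quantisation error.

```python
import struct

SCALE = 65535 / 700   # about 93.62: maps 0..700 onto 0..65535
FORMAT = "!I24H"      # network byte order: uint32 time + 24 x uint16 = 52 bytes


def encode(timestamp, values):
    """Pack a Unix time and 24 readings in 0..700 into 52 bytes."""
    scaled = [round(v * SCALE) for v in values]
    return struct.pack(FORMAT, int(timestamp), *scaled)


def decode(payload):
    """Unpack a 52-byte payload back into (timestamp, readings)."""
    timestamp, *scaled = struct.unpack(FORMAT, payload)
    return timestamp, [s / SCALE for s in scaled]
```

Each decoded reading is within 0.5/93.62 (about 0.0053) of the value that was sent, which is below the 0.01 step of two-decimal data.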

Sum in range until value change

I'm trying to make this formula work:
=ARRAYFORMULA(IF(ISDATE_STRICT(S2:S) ; (MATCH(MAX(AB2:AB),AB2:AB;0)-1) ; "" ))
If there is a date in column S, I want it to display the count of the blank rows that follow it (the rows where column S contains text), up to the next date.
=ARRAYFORMULA(IF(ISDATE_STRICT(S2:S) ; ArrayFormula(MATCH(FALSE ; ISBLANK(AB2:AB) ; 0)-1) ; "" ))
I've tried this one as well but I only get 0's as a result.
Any idea how I can make it work?
Here is the sample sheet.
https://docs.google.com/spreadsheets/d/19f5phXeAwXwrKbWz7njgbznmurOav72GUuo_5IGcbls/edit?usp=sharing
in Q2 use:
=ARRAYFORMULA(IF(ISBLANK(
I1:INDEX(I:I; ROWS(I:I)-1));
{N2:INDEX(N:N; ROWS(N:N))\
I1:INDEX(N:N; ROWS(N:N)-1)};
I1:INDEX(O:O; ROWS(O:O)-1)))
in X2 use:
=INDEX(LAMBDA(x; IFNA(VLOOKUP(x; QUERY(VLOOKUP(ROW(x);
IF(ISDATE_STRICT(x); {ROW(x)\x}); 2; 1);
"select Col1,count(Col1) group by Col1"); 2; 0)-1))
(Q2:INDEX(Q:Q; MAX((Q:Q<>"")*ROW(Q:Q)))))
UPDATE:
We start with column Q. We could take the range Q2:Q, but that range contains a lot of empty rows. The next best thing is to find the last non-empty row and use it as the end of the range, giving Q2:Q73. A static 73 won't do if the dataset grows or shrinks, so to get 73 dynamically we take the MAX of (Q:Q not empty) multiplied by the row number. Q:Q<>"" outputs only TRUE or FALSE, so what we get is:
...
TRUE * 72 = 1 * 72 = 72
TRUE * 73 = 1 * 73 = 73
FALSE * 74 = 0 * 74 = 0
...
so the formula for getting Q2:Q73 is:
=Q2:INDEX(Q:Q; MAX((Q:Q<>"")*ROW(Q:Q)))
it could also be:
=INDEX(INDIRECT("Q2:Q"&MAX((Q:Q<>"")*ROW(Q:Q))))
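The MAX((Q:Q<>"")*ROW(Q:Q)) trick relies on TRUE and FALSE behaving as 1 and 0 in multiplication. Python booleans behave the same way, so the idea can be mirrored in a few lines. This is only an illustration, assuming empty cells are represented as empty strings; last_nonempty_row is a hypothetical name.

```python
def last_nonempty_row(column):
    """1-based row number of the last non-empty cell, or 0 if all are empty.

    Each row contributes (cell != "") * row: True * row == row for a filled
    cell and False * row == 0 for an empty one, like
    MAX((Q:Q<>"")*ROW(Q:Q)) in the sheet.
    """
    return max(
        ((cell != "") * row for row, cell in enumerate(column, start=1)),
        default=0,
    )
```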
The INDIRECT version is just longer to type. Next, we use the new LAMBDA function, which lets us reference a cell, range or formula through a placeholder. The simple LAMBDA syntax is:
=LAMBDA(x; x)(A1)
where x is A1, and in the 2nd argument of LAMBDA (the formula body) we can do whatever we want with x, for example:
=LAMBDA(a; a+a*120-a/a)(A1)
You can think of it as:
LAMBDA(A1; A1+A1*120-A1/A1)(A1)
or as just:
=A1+A1*120-A1/A1
The point is that without LAMBDA we repeat A1 four times, while with LAMBDA we write it only once. Imagine that instead of A1 we had a 100-character formula: the LAMBDA version would be about 300 characters shorter than the "old way" formula.
Back to our formula: x represents Q2:Q73. Now let's focus on the VLOOKUP. The idea is that if column Q contains a date we return that date, otherwise we return the last date from above. Simply put:
=ARRAYFORMULA(VLOOKUP(ROW(Q2:Q73);
IF(ISDATE_STRICT(Q2:Q73); {ROW(Q2:Q73)\Q2:Q73}); 2; 1))
As you can see, Y2, Y3 and Y4 are the same, so all we need to do is count them and then subtract one, so that Q2 itself is excluded and only Q3 and Q4 are counted, e.g. 3-1=2. For that we use a simple QUERY whose output is:
date count
30.06.2022 3
Now we need to pair up the dates in column Q with the QUERY output. For that we use the outer VLOOKUP, whose output is:
3
#N/A
#N/A
9
#N/A
#N/A
...
Now is the right time for the -1 correction, while we still have these errors, because ERROR-1=ERROR and 3-1=2. After the -1 correction the output is:
2
#N/A
#N/A
8
#N/A
#N/A
...
All that is left is to hide the errors with IFNA, and the output is column X.
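The whole pipeline (fill each row with the last date above it, count the rows per date, look the count up again on the date rows, subtract one, blank out the rest) can be mirrored outside the spreadsheet to check expected results. A Python sketch, assuming dates arrive as datetime.date objects and everything else is text; rows_until_next_date is a hypothetical name. Like the QUERY's group by, it groups by date value, so a date that appears twice is counted as one group.

```python
from collections import Counter
from datetime import date


def rows_until_next_date(cells):
    """For each date cell, count the rows that follow it before the next date.

    Non-date cells get "" (the IFNA step). The counts mirror the formula:
    fill down the last date, count rows per date, subtract 1 for the date row.
    """
    filled = []
    last = None
    for cell in cells:
        if isinstance(cell, date):
            last = cell            # a new date starts a new group
        filled.append(last)        # the inner VLOOKUP's fill-down
    counts = Counter(d for d in filled if d is not None)   # the QUERY
    return [counts[c] - 1 if isinstance(c, date) else "" for c in cells]
```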

Unambiguous binary encoding scheme for the alphabet

An old British Informatics Olympiad question (3c) asks what the smallest unambiguous encoding scheme for the alphabet (using only two symbols - hence binary) is. As far as I can see, the answer is 130 - 5 bits are required to store each letter, as 2^4 < 26. The alphabet has 26 characters, so the encoding scheme is 5*26 bits long. However, the mark scheme states that 124 bits can be used. What is the encoding scheme that is that long?
I think this works:
a - 0010
b - 0011
c - 0100
d - 0101
e - 0110
f - 0111
g - 10000
h - 10001
i - 10010
j - 10011
k - 10100
l - 10101
m - 10110
n - 10111
o - 11000
p - 11001
q - 11010
r - 11011
s - 11100
t - 11101
u - 11110
v - 11111
w - 00000
x - 00001
y - 00010
z - 00011
It is unambiguous. If a code starts with 01 or 001 (one or two zeros followed by a one), it has length 4. If it starts with 1, it has length 5. If it starts with 000, it also has length 5.
I got the idea by starting with a through h having length 4, all beginning with 0. However, a scheme like that is two symbols short (if the length is determined entirely by the first symbol), so I looked for a way to reduce the number of four-bit codes by two, and noticed that 0000 and 0001 were the only two that started with three zeros. Dropping them frees the prefix 000, two more bits after it give four five-bit codes, and the rest is an unambiguous encoding scheme :)
6 * 4 + 20 * 5 = 124
or alternatively
4 + 16 + 6 = 26
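Both claims (the table is unambiguous and it totals 124 bits) can be checked mechanically. The Python sketch below copies the table above; it checks that no code is a prefix of another, computes the Kraft sum (the sum of 2^-length over all codes, which is at most 1 for any prefix-free code and exactly 1 when no code space is left unused), and decodes a bit string greedily, which works precisely because the code is prefix-free. The function names are illustrative.

```python
from fractions import Fraction

CODES = {
    "a": "0010", "b": "0011", "c": "0100", "d": "0101", "e": "0110",
    "f": "0111", "g": "10000", "h": "10001", "i": "10010", "j": "10011",
    "k": "10100", "l": "10101", "m": "10110", "n": "10111", "o": "11000",
    "p": "11001", "q": "11010", "r": "11011", "s": "11100", "t": "11101",
    "u": "11110", "v": "11111", "w": "00000", "x": "00001", "y": "00010",
    "z": "00011",
}


def is_prefix_free(codes):
    """True if no code word is a prefix of another one."""
    words = sorted(codes)
    # In sorted order, a word that prefixes others is immediately
    # followed by one of them, so comparing neighbours is enough.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def kraft_sum(codes):
    return sum(Fraction(1, 2 ** len(c)) for c in codes)


def encode(text, table):
    return "".join(table[ch] for ch in text)


def decode(bits, table):
    """Read bits until they match a code word, emit its letter, repeat."""
    lookup = {code: letter for letter, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)
```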
The trick here is to not use a fixed-length encoding (as you have pointed out, log2(26) is somewhere between 4 and 5, so a 5-bit encoding scheme leaves some combinations unused), but to vary the length of the code words so we get an optimized number of bits for each letter.
When creating a table of the 32 combinations, we can assign the letters A-Z to each value, with A starting at 00000, B = 00001 and so on. Z will be 11001 – the rest (11010…11111) will be unused.
Now it gets a bit trickier. We have six combinations at the end which are not used, but we cannot simply drop them, as there is no such thing as "half a bit of information". Therefore, we need to distribute six combinations so that we can drop the last bit of each of them. Example:
10100 = U, 10101 = V
becomes
10100 = U, 10110 = V
The other combinations are moved accordingly so the last bit of each of the last six letters is a "0". Then this bit can be dropped, so we end with these letters:
00000 = A, 00001 = B, …, 10011 = T, 1010 = U, 1011 = V, 1100 = W, 1101 = X, 1110 = Y, 1111 = Z
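The procedure (number the letters in k-bit binary, move the last few onto even values, then drop their final 0) can be written for any alphabet size n. A Python sketch under that reading; drop_bit_scheme is a hypothetical name. It uses k = bit length of n-1 (5 for 26 letters), and the number of shortened letters is the number of unused k-bit combinations, 2^k - n. The result is closely related to truncated binary encoding, which gives the shorter codes to the first letters instead of the last.

```python
import string


def drop_bit_scheme(letters):
    """Build the variable-length code described above for the given letters."""
    n = len(letters)
    k = (n - 1).bit_length()        # bits of a fixed-length code (5 for 26)
    spare = 2 ** k - n              # unused k-bit combinations (6 for 26)
    long_count = n - spare          # letters that keep all k bits (A..T)
    codes = {}
    for j, letter in enumerate(letters[:long_count]):
        codes[letter] = format(j, f"0{k}b")
    for i, letter in enumerate(letters[long_count:]):
        # Move the letter onto an even k-bit value, then drop its final 0 bit.
        even_value = long_count + 2 * i
        codes[letter] = format(even_value >> 1, f"0{k - 1}b")
    return codes


CODES = drop_bit_scheme(string.ascii_uppercase)
```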
Important: While this scheme is prefix-free (i.e. no combination is the start of another, longer combination) and thus unambiguous, it is not self-synchronizing, so we cannot just sneak into a stream of encoded characters and definitely get a correct output. This would require having a synchronization "character" that is not contained in any other letter - but that is not possible as this is a no-redundancy scheme.

Transforming ciphertext from digital format to alphabetic format

Consider a message "STOP" which we are to encrypt using the RSA algorithm. The values given are p = 43, q = 59, n = pq, e = 13. First I transformed "STOP" into 4-digit blocks, 1819 (S = 18, T = 19) and 1415 (O = 14, P = 15), with the letters numbered from 00 to 25.
After the calculation I got 20812182 as the encrypted message (combining 2081 and 2182). Is there any way to transform this numeric ciphertext into alphabetic form?
If we take 2 digits at a time, then 20 = U, 81 = ?, 21 = V, 82 = ?. What are the letters for 81 and 82? In other words, what is the ciphertext for the plaintext "STOP" in the above case?
RSA works with numbers, not with binary data or letters. You can of course convert one to another; that is what you did when you wrote 20812182. A number with that value has countless other representations.
Now creating an alphabetical representation that has a minimum size is pretty tricky to do. Basically you can divide by powers of 26. This is however not easy to implement. Instead you can take a subset of your alphabet and use that to represent your number.
To do this use your original number representation and replace 0 with A, 1 with B ... and 9 with J. This would result in CAIBCBIC for your ciphertext.
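Both the question's numbers and the answer's conversions can be reproduced in a few lines of Python, whose integers have arbitrary precision, so dividing by powers of 26 becomes a short divmod loop. A sketch: the letter numbering A=00 to Z=25 and the 4-digit blocks come from the question; the helper names are hypothetical. Note that a base-26 string drops leading A's (zeros), just as decimal notation drops leading zeros, so a fixed length would have to be agreed if that matters.

```python
import string

ALPHABET = string.ascii_uppercase  # A=0 ... Z=25

# RSA parameters from the question.
p, q, e = 43, 59, 13
n = p * q  # 2537


def to_blocks(text):
    """Number letters A=00..Z=25 and group the digits into 4-digit blocks."""
    digits = "".join(f"{ALPHABET.index(c):02d}" for c in text)
    return [int(digits[i:i + 4]) for i in range(0, len(digits), 4)]


def to_base26(number):
    """Write a non-negative integer with the digits A..Z (A=0)."""
    letters = []
    while True:
        number, digit = divmod(number, 26)
        letters.append(ALPHABET[digit])
        if number == 0:
            return "".join(reversed(letters))


def from_base26(text):
    value = 0
    for letter in text:
        value = value * 26 + ALPHABET.index(letter)
    return value


def digits_to_letters(number):
    """Replace each decimal digit 0..9 with A..J, as in the answer."""
    return "".join(ALPHABET[int(d)] for d in str(number))


plain = to_blocks("STOP")                 # [1819, 1415]
cipher = [pow(m, e, n) for m in plain]    # RSA encryption of each block
combined = int("".join(f"{c:04d}" for c in cipher))
```

For 20812182 this gives CAIBCBIC with the digit mapping (8 letters) and BTODGO in base 26 (6 letters).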
Note that plaintext and ciphertext are used as names for the input and output of cryptographic ciphers. Both names seem to indicate some kind of human readable text - and maybe they once did - but in cryptography they can be thought of as any kind of data.

how to create a unique integer from 3 different integers (1 Oracle Long, 1 Date Field, 1 Short)

The thing is that the 1st number is already an Oracle LONG, the second one is a date (SQL DATE, no extra timestamp information), and the last one is a short value in the range 1000-100'000.
How can I create a sort of hash value that is unique for each combination, optimally?
String concatenation followed by conversion to a long is what I don't want, for example:
Day Month
12 1 --> 121
1 12 --> 121
When you have a few numeric values and need to have a single "unique" (that is, statistically improbable duplicate) value out of them you can usually use a formula like:
h = (a*P1 + b)*P2 + c
where P1 and P2 are either well-chosen numbers (e.g. if you know 'b' is always in the 1-31 range, you can use P1=32, since P1 has to be larger than any value of b; likewise P2 has to be larger than any value of c) or, when you know nothing particular about the allowable ranges of a, b and c, big prime numbers, which have the least chance of generating values that collide.
For an optimal solution the math is a bit more complex than that, but using prime numbers you can usually have a decent solution.
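When the ranges are known exactly, choosing P1 and P2 as the sizes of the ranges of b and c turns the formula into a mixed-radix number, which is collision-free and can be inverted with divmod. A Python sketch under stated assumptions: the date is converted to days since 1970-01-01 and assumed to fit in 0..65535 days, the third value is in 1000..100000 as in the question, the first value is non-negative; combine and split are hypothetical names.

```python
from datetime import date

# Assumed ranges (adjust to the real data):
EPOCH = date(1970, 1, 1)
DAY_RANGE = 2 ** 16                   # days since EPOCH in 0..65535 (~179 years)
SHORT_MIN, SHORT_MAX = 1000, 100_000  # range of the third value, from the question
SHORT_RANGE = SHORT_MAX - SHORT_MIN + 1


def combine(a, day, s):
    """(a * P1 + b) * P2 + c with P1, P2 equal to the exact sizes of the b and c ranges."""
    b = (day - EPOCH).days
    c = s - SHORT_MIN
    if a < 0 or not 0 <= b < DAY_RANGE or not 0 <= c < SHORT_RANGE:
        raise ValueError("input outside the assumed ranges")
    return (a * DAY_RANGE + b) * SHORT_RANGE + c


def split(key):
    """Invert combine(): peel off c, then b, with divmod."""
    rest, c = divmod(key, SHORT_RANGE)
    a, b = divmod(rest, DAY_RANGE)
    return a, date.fromordinal(EPOCH.toordinal() + b), c + SHORT_MIN
```

Python integers never overflow, but a Java or Oracle 64-bit signed long only holds the result while a stays below about 1.4 × 10^9 (2^63 divided by 65536 × 99001).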
For example, Java's implementation of .hashCode() for an array (or a String) is something like:
h = 0;
for (int i = 0; i < a.length; ++i)
h = h * 31 + a[i];
Personally, though, I would have chosen a prime bigger than 31, because values inside a String can easily collide: lowering one character by 1 and raising the next one by 31 leaves the hash unchanged, and such pairs are quite common, e.g.:
"BB".hashCode() == "Aa".hashCode() // both are 2112
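The collision is easy to confirm by hand: 66*31 + 66 = 2112 for "BB" and 65*31 + 97 = 2112 for "Aa". The Python sketch below mirrors Java's String.hashCode() loop, including the wrap-around to a signed 32-bit int, under the assumption that the strings contain only characters from the Basic Multilingual Plane (one UTF-16 unit each); java_string_hash is a hypothetical name. Because the hash of a concatenation depends only on the hashes and lengths of its parts, every string built from "Aa" and "BB" blocks of the same length collides too.

```python
def java_string_hash(s):
    """Replicate Java's String.hashCode(): h = 31*h + char, as a signed 32-bit int.

    Assumes s contains only BMP characters, so each character is one UTF-16 unit.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF   # keep 32 bits, like int overflow
    return h - (1 << 32) if h >= (1 << 31) else h
```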
Your
12 1 --> 121
1 12 --> 121
problem is easily fixed by zero-padding your input numbers to the maximum width expected for each input field.
For example, if the first field can range from 0 to 10000 and the second field can range from 0 to 100, your example becomes:
00012 001 --> 00012001
00001 012 --> 00001012
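Zero-padding to fixed widths and then reading the result as a number is the decimal case of the (a*P1 + b) formula with P1 = 10^width. A small Python sketch, assuming the field widths from the example (5 digits for 0-10000, 3 digits for 0-100); padded_key and unpack_key are hypothetical names.

```python
def padded_key(first, second, first_width=5, second_width=3):
    """Zero-pad both fields to fixed widths and concatenate them into one integer."""
    if not (0 <= first < 10 ** first_width and 0 <= second < 10 ** second_width):
        raise ValueError("value does not fit its field width")
    return int(f"{first:0{first_width}d}{second:0{second_width}d}")


def unpack_key(key, second_width=3):
    """Split the integer back into (first, second)."""
    return divmod(key, 10 ** second_width)
```

Converting to an integer drops the leading zeros (00012001 becomes 12001), but the value stays unique and unpack_key recovers both fields.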
In Python, you can use the pairing package:
#pip install pairing
import pairing as pf
n = [12,6,20,19]
print(n)
key = pf.pair(pf.pair(n[0],n[1]),
pf.pair(n[2], n[3]))
print(key)
m = [pf.depair(pf.depair(key)[0]),
pf.depair(pf.depair(key)[1])]
print(m)
Output is:
[12, 6, 20, 19]
477575
[(12, 6), (20, 19)]
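The key in the output matches the Cantor pairing function, pair(x, y) = (x + y)(x + y + 1)/2 + y: pair(12, 6) = 177, pair(20, 19) = 799 and pair(177, 799) = 477575. If you would rather not depend on the package, a standard-library sketch of the same function and its inverse is below; the function names only mirror the snippet above. For the question's three fields, nesting pair(pair(a, b), c) gives a unique key, though the result grows roughly with the square of its inputs at each level, so it can quickly exceed 64 bits.

```python
from math import isqrt


def pair(x, y):
    """Cantor pairing: a bijection from pairs of non-negative ints to ints."""
    s = x + y
    return s * (s + 1) // 2 + y


def depair(z):
    """Inverse of pair()."""
    w = (isqrt(8 * z + 1) - 1) // 2   # largest w with w*(w+1)/2 <= z
    t = w * (w + 1) // 2
    y = z - t
    return w - y, y
```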