On a laptop running Win10 x64, I have used software that saves histogram data to a file with the '.dat' extension, and I am trying to open that file on Win7 x64.
In the software documentation, the file has the following format:
bytes 0 to 7 are a 64 bit floating point describing a number (upper bound of the diagram)
bytes 8 to 15 are a 64 bit floating point describing a number (bottom bound of the diagram)
bytes 16 to 19 are an unsigned 32 bit integer set to 0 or 1
bytes 20 to 23 are an unsigned 32 bit integer set to 0 or 1
bytes 24 to 27 are an unsigned 32 bit integer set to the width of the diagram, in pixels
bytes 28 to 31 are an unsigned 32 bit integer set to the height of the diagram, in pixels
bytes 32 to 800031 have an unsigned 32 bit integer for each point in the diagram, given the width and height (so it is 4*width*height = 800 000 bytes)
Now, if I open the file with a hex editor, I can see these values corresponding to the above numbered points:
'11 EA 2D 81 99 97 71 3D'
'49 AF BC 9A F2 D7 7A 3E'
'01 00 00 00'
'00 00 00 00'
'F4 01 00 00'
'90 01 00 00'
the rest of the file
From what I can see, looking at the value for 3. and knowing it should be either 1 or 0, it should actually read '00 00 00 01'. Reading a bit of info online, I think this is 'little-endian'.
Running [cinfo, maxsize, ordering] = computer in Matlab on my Win7 laptop, I get 'L' for the byte ordering, and the character encoding in Matlab is 'windows-1252'.
Testing the resulting values with various tools online I got the following:
Using http://babbage.cs.qc.cuny.edu/IEEE-754.old/64bit.html for 1., and inputting the bytes as hexadecimal, little-endian (so the value 3D719799812DEA11 instead of 11EA2D819997713D), I get the result 1.000e-12.
Using the same url for 2. (so the value 3E7AD7F29ABCAF49 instead of 49AFBC9AF2D77A3E), I get the result 1.0001e-7.
Using the same url to get the little-endian value, then inputting it into a site that converts an unsigned int to binary, and another one that converts binary to decimal, for 3., 4., 5. and 6., I got the values 1, 0, 500 and 400 respectively.
This means that the values for the above numbered points are:
1e-12
1e-7
1
0
500 (width, in pixels)
400 (height, in pixels)
4*500*400 = 800 000, so it seems correct.
So, I know what I should find if I try to open the file in Matlab.
This is what I did so far:
name = 'histogram.dat';
fid = fopen(name,'r');
v = fread(fid);
This gives v as an 800032x1 double. If I open the v vector I see the following:
Rows 1 to 8 (for 1.): [17;234;45;129;153;151;113;61]
Rows 9 to 16 (for 2.): [73;175;188;154;242;215;122;62]
Rows 17 to 20 (for 3.): [1;0;0;0]
Rows 21 to 24 (for 4.): [0;0;0;0]
Rows 25 to 28 (for 5.): [244;1;0;0]
Rows 29 to 32 (for 6.): [144;1;0;0]
Rows 33 to 800032 (for 7.): the rest
Next, I deleted everything and then I tried to read the binary file in the following way:
name = 'histogram.dat';
fid = fopen(name,'r');
c1 = fread(fid,8,'float64');
c2 = fread(fid,8,'float64');
c3 = fread(fid,4,'uint32');
c4 = fread(fid,4,'uint32');
c5 = fread(fid,4,'uint32');
c6 = fread(fid,4,'uint32');
And now I get something totally different.
c1 = [1.00e-12; 1.00e-07; 4.90e-324; 8.48e-312; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314 ];
c2 = [-1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314; -1.06e-314];
c3 to c6 = [2.1475e09; 2.1475e09; 2.1475e09; 2.1475e09];
So, it does not work as expected. How can I open the file in Matlab and get the values I expect? Moreover, how can I read the bytes from 32 to the end? I suppose I read 4 bytes at a time up to the end and get each value?
Also, if I take the HEX values for each vector and do a hex2num (for the first two, the float64s) or a hex2dec (for the next ones, the uint32s), I get the correct results. Should I first apply hex2num or hex2dec to the binary numbers as well?
Another possibility is to open the file in a normal hex editor, import the big vector into Matlab, swap the byte ordering with swapbytes and then do the hex2num/hex2dec. Then I'll get the correct values.
The second argument to fread is the number of elements to read, not the number of bytes. It will adjust the number of bytes according to the datatype you specify. So for example, to read in a single float64 to c1 you actually want
c1 = fread(fid,1,'float64');
You need to adjust the other values accordingly. To read in the rest of the data to the end of the file, replace the second argument with Inf.
No need to convert to or from hex values; hex values are just about how the values are displayed on screen.
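Putting that together for the layout described in the question, reading the whole file might look roughly like this (a sketch; the variable names are mine, and the explicit 'ieee-le' machine format is optional on a machine that is already little-endian):
fid = fopen('histogram.dat', 'r', 'ieee-le');   % open explicitly as little-endian
upperBound  = fread(fid, 1, 'float64');         % bytes 0-7
bottomBound = fread(fid, 1, 'float64');         % bytes 8-15
flag1       = fread(fid, 1, 'uint32');          % bytes 16-19 (0 or 1)
flag2       = fread(fid, 1, 'uint32');          % bytes 20-23 (0 or 1)
width       = fread(fid, 1, 'uint32');          % bytes 24-27
height      = fread(fid, 1, 'uint32');          % bytes 28-31
counts      = fread(fid, Inf, 'uint32');        % the remaining width*height values
fclose(fid);
histo = reshape(counts, width, height);         % assuming the points are stored column-first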
Also, you can read the whole file into a byte vector with
bytevec = fread(fid, inf, 'uint8=>uint8');
(the '=>uint8' part keeps the uint8 class, which typecast needs). Then you can pick out elements of bytevec by their indices (or use reshape) and reinterpret the raw bytes with
value = typecast(bytevec(i1:i2), type);
and finally convert the result to Matlab's default double type without changing its value:
double_value = double(value);
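For the header in this question, that would look something like the following (index ranges taken from the byte layout above; this relies on the machine being little-endian, which the 'L' from computer confirms):
upperBound  = typecast(bytevec(1:8),  'double');            % bytes 0-7
bottomBound = typecast(bytevec(9:16), 'double');            % bytes 8-15
width       = double(typecast(bytevec(25:28), 'uint32'));   % bytes 24-27
height      = double(typecast(bytevec(29:32), 'uint32'));   % bytes 28-31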
I am receiving EEG data from a 24 bit ADC over serial. The ADC data is transmitted in 3 bytes, from MSB to LSB. The full packet is 21 bytes:
The first byte is the start byte - 0xFF (255 in decimal)
Then a packet number byte.
Then the next 3 bytes are the 24 bit ADC value, broken into MSB, LSB2, LSB1.
I can parse the data fine, but reconstructing a two's-complement signed int32 number is causing issues. The values I am getting out certainly don't reflect what the ADC should be giving out.
Below are the lines to read and parse the 504 samples (which give me 24 ADC values, since 504 samples / 21 bytes = 24 packets). I have tried uint8 instead of uchar with similar results (when I try int8 I get an 'invalid specified precision' error).
comEEGSMT = serial(com,'BaudRate',3000000);
fopen(comEEGSMT);
rawData(1:504) = fread(comEEGSMT, 504, 'uchar');
fclose(comEEGSMT);
startPackets = find(rawData == 255);
startPackets = startPackets(:).';                                       % make sure it is a row vector
bytes = rawData([startPackets+2; startPackets+3; startPackets+4]).';    % 24x3 matrix of [MSB LSB2 LSB1]
I have tried the following method to reconstruct the value:
ADC_value = bytes(:,1)*256^2 + bytes(:,2)*256 + bytes(:,3);
and the following line is the formula to convert the above number to volts:
ADC_value_volts = ADC_value*(5/3)*(1/(2^32));
The values are in the range of 4000 - 8000 microvolts with large jumps in value. The values SHOULD be in the range of 200 - 600 microvolts with small changes.
I have found other questions relating to similar issues, but have had no success trying the proposed solutions such as in the link below:
https://uk.mathworks.com/matlabcentral/answers/137965-concatenate-3-bytes-array-of-real-time-serial-data-into-single-precision
Any help would be very much appreciated, as I've been stuck on this for quite a while.
Thanks, Mark
Starting with ADC_value as int32 with value 0, then:
ADC_value |= MSB << 16;
ADC_value |= LSB2 << 8;
ADC_value |= LSB1;
And then, to find out the corresponding volts value, supposing your ADC has a reference voltage VREF, in volts (e.g. 5.0V):
ADC_value_volts = (ADC_value * VREF)/2^24
since your converter is 24 bits, not 32.
Note the above expressions are in C language equivalent, not Matlab.
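In Matlab, a rough equivalent of the reconstruction might look like the sketch below (the variable names are mine; it also includes the sign extension needed when the 24 bit two's-complement value is negative):
% bytes is an N-by-3 matrix of [MSB LSB2 LSB1] values in the range 0-255
raw = bytes(:,1)*2^16 + bytes(:,2)*2^8 + bytes(:,3);    % 0 .. 2^24-1
neg = raw >= 2^23;                                      % sign bit of the 24 bit word set?
ADC_value = raw - neg*2^24;                             % two's-complement sign extension
VREF = 5.0;                                             % reference voltage (assumed)
ADC_value_volts = ADC_value * VREF / 2^24;              % scaling per the expression above
The gain/FSR correction discussed in the edit below would replace the last line.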
EDIT:
The ADC data sheet tells us the PGA gain can be set to the following values:
1, 2, 4, 6, 8, 12, 24 - one value at a time for each channel.
The FSR (full scale range) of the measurement is (2*VREF)/Gain = 5/3 for Gain=6 (eq.(5), page 23), so this must be accounted for in the expression computing the volts values. (This can be verified if you have access to the hardware and can make some measurements.)
The data coming out of the ADC is already in two's complement, binary form, 24 bits. The odd thing is that the data sheet counts bits starting with 1, not 0, which is why it talks about shifting by "17" instead of 16 - in code this is in fact a shift by 16 (revealed in fig 47, page 42).
So the formula for ADC_value_volts should be:
ADC_value_volts = (ADC_value * FSR/(2^23))/3 (1 LSB = FSR/(2^23), page 37)
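In Matlab terms the numbers from this edit work out roughly as follows (a sketch of the formula above, assuming VREF=5V and Gain=6 as discussed):
VREF = 5;  Gain = 6;
FSR  = 2*VREF/Gain;                        % full scale range = 5/3 V (eq.(5), page 23)
LSB  = FSR/2^23;                           % 1 LSB = FSR/(2^23), per page 37
ADC_value_volts = ADC_value * LSB / 3;     % the formula given above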
If other calculations/modifications are applied to the original data, these must be explained by the provider. If the provider is not forthcoming, it may be worth changing providers...
I am working with the OpenH264 codec. OpenH264 uses Exp-Golomb coding for header-related information. I have studied several websites and gathered a little information about Exp-Golomb coding. OpenH264 uses 4 types of Exp-Golomb coding methods. They are:
Ue [when the values are only non-negative quantities]
Te [when the values are only 1 or 0]
Se [when the values are both negative and positive quantities]
Me [when a standard code map is defined for the values]
I have learnt how to construct and parse with the Ue method.
Syntax Format for Exp-Golomb(Ue) = [M-Zeros][1][INFO].
Construction: Suppose We have a Code_Num = 226.
Now,
M = floor(log2(Code_Num)) = floor(log2(226)) = 7
INFO = Code_Num + 1 - pow(2,M) = 226 + 1 - 128 = 99 = (1100011) in Binary
So,
CodeWord = 0000000 1 1100011 [M-zeros, 1 ignoring bit, INFO]
Parsing:
Suppose We have a CodeWord = 000000011100011
Code_Num = pow(2,M) + INFO - 1 = 128 + 99 - 1 = 226
Now I can calculate Exp-Golomb(Ue). But I also want to learn the theory behind Se, Te and Me, and I am unable to find any resources for those methods. Please help me.
OpenH264 is an implementation of the H.264/AVC video codec.
AVC uses Exp-Golomb coding in its various headers, so all compatible encoders have to as well.
Also, te(v) stands for Truncated Exponential-Golomb encoding.
Anyway, you can find information about reading signed Exponential-Golomb codes on the wiki page:
but a real quick tl;dr is: 0 = 1, 1 = 010, -1 = 011, etc.
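For reference, the mapping between the unsigned code number k (which you already know how to parse as Ue) and the signed value used by Se can be written as a small Matlab sketch (the helper names are mine, based on the mapping described above):
% se(v): code number k = 0,1,2,3,4,... maps to signed value 0, 1, -1, 2, -2, ...
se_from_k = @(k) (-1).^(k+1) .* ceil(k/2);   % decode: signed value from code number
k_from_se = @(v) 2*abs(v) - (v > 0);         % encode: code number from signed value
se_from_k(0:4)          % 0  1 -1  2 -2
k_from_se([0 1 -1 2])   % 0  1  2  3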
as for this mess:
M = floor(log2(Code_Num)) = floor(log2(226)) = 7
INFO = Code_Num + 1 - pow(2,M) = 226 + 1 - 128 = 99 = (1100011) in Binary
So,
CodeWord = 0000000 1 1100011 [M-zeros, 1 ignoring bit, INFO]
That's not at all accurate: you're supposed to add 1 during encoding and subtract 1 during decoding (for unsigned Exp-Golomb only); signed Exp-Golomb uses a completely different system.
Edit:
Mapped Exp-Golomb is exactly the same as Unsigned Exp-Golomb, plus a table lookup.
Truncated Exp-Golomb is the same as standard RICE aka Unary coding, except the stop bit is 0.
If you don't feel like creating your own decoders/encoders, take a look at my project BitIO, because I've already written them - especially the ReadRICE/WriteRICE and ReadExpGolomb/WriteExpGolomb functions: BitIO on Github
The thing is that the 1st number is already an ORACLE LONG,
the second one is a Date (SQL DATE, no extra timestamp info), and the last one is a Short value in the range 1000-100'000.
How can I create a sort of hash value that will be unique for each combination, optimally?
String concatenation and converting to a long later: I don't want this, because for example:
Day Month
12 1 --> 121
1 12 --> 121
When you have a few numeric values and need to derive a single "unique" value (that is, one with a statistically improbable chance of duplicates) from them, you can usually use a formula like:
h = (a*P1 + b)*P2 + c
where P1 and P2 are either well-chosen numbers (e.g. if you know 'a' is always in the 1-31 range, you can use P1=32) or, when you know nothing particular about the allowable ranges of a, b and c, the best approach is to make P1 and P2 big prime numbers (they have the least chance of generating values that collide).
For an optimal solution the math is a bit more complex than that, but using prime numbers you can usually get a decent solution.
For example, the Java implementation of .hashCode() for an array (or a String) is something like:
h = 0;
for (int i = 0; i < a.length; ++i)
    h = h * 31 + a[i];
Personally, though, I would have chosen a prime bigger than 31, as values inside a String can easily collide, since a delta of 31 places can be quite common, e.g.:
"BB".hashCode() == "Aa".hashCode() == 2112
Your
12 1 --> 121
1 12 --> 121
problem is easily fixed by zero-padding your input numbers to the maximum width expected for each input field.
For example, if the first field can range from 0 to 10000 and the second field can range from 0 to 100, your example becomes:
00012 001 --> 00012001
00001 012 --> 00001012
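In Matlab, for instance, that padding can be done with sprintf (a small sketch using the field widths from the example above):
key1 = sprintf('%05d%03d', 12, 1)    % '00012001'
key2 = sprintf('%05d%03d', 1, 12)    % '00001012'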
In python, you can use this:
#pip install pairing
import pairing as pf
n = [12,6,20,19]
print(n)
key = pf.pair(pf.pair(n[0], n[1]),
              pf.pair(n[2], n[3]))
print(key)
m = [pf.depair(pf.depair(key)[0]),
     pf.depair(pf.depair(key)[1])]
print(m)
Output is:
[12, 6, 20, 19]
477575
[(12, 6), (20, 19)]
I am facing the problem of having several integers, and I have to generate one from them. For example:
Int 1: 14
Int 2: 4
Int 3: 8
Int 4: 4
Hash Sum: 43
I have some restrictions on the values: the maximum value an attribute can have is 30, the sum of all of them is always 30, and the attributes are always positive.
The key is that I want to generate the same hash sum for similar integers. For example, if I have the integers 14, 4, 10, 2 then I want to get the same hash sum as in the case above, 43. But of course if the integers are very different (4, 4, 2, 20) then I should get a different hash sum. Also, it needs to be fast.
Ideally I would like the output of the hash sum to be between 0 and 512, and it should be evenly distributed. With my restrictions I can have around 5K different possibilities, so I would like to have around 10 per bucket.
I am sure there are many algorithms that do this, but I could not find a way of googling this thing. Can anyone please post an algorithm to do this?
Some more information
The whole point of this is that those integers are attributes for a function. I want to store the values of the function in a table, but I do not have enough memory to store all the different options. That is why I want to generalize between similar attributes.
The reason why 10, 5, 15 is totally different from 5, 10, 15 is that if you imagine this in 3D, the two points are totally different points.
Some more information 2
Some answers try to solve the problem using hashing, but I do not think it is that complex. Thanks to one of the comments I have realized that this is a clustering problem. If we have only 3 attributes and we imagine the problem in 3D, what I need is just to divide the space into blocks.
In fact this can be solved with rules of this type:
if (att[0] < 5 && att[1] < 5 && att[2] < 5 && att[3] < 5)
    Block = 21
if ((5 < att[0] < 10) && (5 < att[1] < 10) && (5 < att[2] < 10) && (5 < att[3] < 10))
    Block = 45
The problem is that I need a fast and general way to generate those ifs; I cannot write out all the possibilities.
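One general way to generate those "ifs" is to quantize each attribute into a bin index and combine the indices, for example (a minimal Matlab sketch, assuming a fixed block width of 5 along every axis; the variable names are mine):
att     = [14 4 8 4];                              % example attribute vector from above
binSize = 5;                                       % block width along each axis (assumption)
bins    = floor(att / binSize);                    % per-attribute bin index, 0..6 for values 0..30
nBins   = ceil(31 / binSize);                      % 7 bins per axis
block   = sum(bins .* nBins.^(0:numel(att)-1))     % one block id per combination of bins
With four attributes and a block width of 5 this gives 7^4 = 2401 possible blocks; a wider block (e.g. 10) brings the count under 512.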
The simple solution:
Convert the integers to strings separated by commas, and hash the resulting string using a common hashing algorithm (md5, sha, etc).
If you really want to roll-your-own, I would do something like:
Generate large prime P
Generate random numbers 0 < a[i] < P (for each dimension you have)
To generate hash, calculate: sum(a[i] * x[i]) mod P
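A minimal Matlab sketch of that roll-your-own variant (P and the coefficients below are illustrative choices, not values from this answer):
P = 2^31 - 1;                % a large prime (Mersenne prime, illustrative)
a = randi(P - 1, 1, 4);      % one random coefficient per dimension, 0 < a(i) < P
x = [14 4 8 4];              % the input integers
h = mod(sum(a .* x), P)      % hash = sum(a(i)*x(i)) mod P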
Given the inputs a, b, c, and d, each ranging in value from 0 to 30 (5 bits), the following will produce a number in the range of 0 to 255 (8 bits).
bucket = ((a & 0x18) << 3) | ((b & 0x18) << 1) | ((c & 0x18) >> 1) | ((d & 0x18) >> 3)
Whether the general approach is appropriate depends on how the question is interpreted. The 3 least significant bits are dropped, grouping 0-7 in the same set, 8-15 in the next, and so forth.
0-7,0-7,0-7,0-7 -> bucket 0
0-7,0-7,0-7,8-15 -> bucket 1
0-7,0-7,0-7,16-23 -> bucket 2
...
24-30,24-30,24-30,24-30 -> bucket 255
Trivially tested with:
for (int a = 0; a <= 30; a++)
    for (int b = 0; b <= 30; b++)
        for (int c = 0; c <= 30; c++)
            for (int d = 0; d <= 30; d++) {
                int bucket = ((a & 0x18) << 3) |
                             ((b & 0x18) << 1) |
                             ((c & 0x18) >> 1) |
                             ((d & 0x18) >> 3);
                printf("%d, %d, %d, %d -> %d\n",
                       a, b, c, d, bucket);
            }
You want a hash function that depends on the order of inputs and where similar sets of numbers will generate the same hash? That is, you want 50 5 5 10 and 5 5 10 50 to generate different values, but you want 52 7 4 12 to generate the same hash as 50 5 5 10? A simple way to do something like this is:
long hash = 13;
for (int i = 0; i < array.length; i++) {
    hash = hash * 37 + array[i] / 5;
}
This is imperfect, but should give you an idea of one way to implement what you want. It will treat the values 50 - 54 as the same value, but it will treat 49 and 50 as different values.
If you want the hash to be independent of the order of the inputs (so the hash of 5 10 20 and 20 10 5 are the same) then one way to do this is to sort the array of integers into ascending order before applying the hash. Another way would be to replace
hash = hash * 37 + array[i] / 5;
with
hash += array[i] / 5;
EDIT: Taking into account your comments in response to this answer, it sounds like my attempt above may serve your needs well enough. It won't be ideal, nor perfect. If you need high performance you have some research and experimentation to do.
To summarize, order is important, so 5 10 20 differs from 20 10 5. Also, you would ideally store each "vector" separately in your hash table, but to handle space limitations you want to store some groups of values in one table entry.
An ideal hash function would return a number evenly spread across the possible values based on your table size. Doing this right depends on the expected size of your table and on the number of and expected maximum value of the input vector values. If you can have negative values as "coordinate" values then this may affect how you compute your hash. If, given your range of input values and the hash function chosen, your maximum hash value is less than your hash table size, then you need to change the hash function to generate a larger hash value.
You might want to try using vectors to describe each number set as the hash value.
EDIT:
Since you don't describe why you want to avoid running the function itself, I'm guessing it is long-running, and you haven't described the breadth of the argument set either.
If every value is expected, then a full lookup table in a database might be faster.
If you're expecting repeated calls with the same arguments and little overall variation, then you could look at memoizing, so that only the first run for an argument set is expensive and each additional request is fast, with less memory usage.
You would need to define what you mean by "similar". Hashes are generally designed to create unique results from unique input.
One approach would be to normalize your input and then generate a hash from the results.
Generating the same hash sum is called a collision, and is a bad thing for a hash to have. It makes it less useful.
If you want similar values to give the same output, you can divide the input by however close you want them to count. If the order makes a difference, use a different divisor for each number. The following function does what you describe:
int SqueezedSum( int a, int b, int c, int d )
{
    return (a/11) + (b/7) + (c/5) + (d/3);
}
This is not a hash, but does what you describe.
You want to look into geometric hashing. In "standard" hashing you want
a short key
inverse resistance
collision resistance
With geometric hashing you substitute number 3 with something which is almost the opposite; namely, close initial values give close hash values.
Another way to view my problem is using multidimensional scaling (MDS). In MDS we start with a matrix of items, and what we want is to assign each item a location in an N-dimensional space, reducing in this way the number of dimensions.
http://en.wikipedia.org/wiki/Multidimensional_scaling