How are bits arranged in a reduced-range variable in Ada?

Let's say I've created a type in Ada:
type Coord_Type is range -32 .. 31;
What can I expect the bits to look like in memory, or specifically when transmitting this value to another system?
I can think of two options.
One is that the full (default integer?) space is used for all variables of "Coord_Type", but only values within the range are possible. Assuming two's complement, the values 25 and -25 would then be representable, but not 50 or -50:
0000 0000 0001 1001 ( 25)
1111 1111 1110 0111 (-25)
0000 0000 0011 0010 ( 50) X Not allowed
1111 1111 1100 1110 (-50) X Not allowed
The other option is that the space is compressed to only what is needed. (I chose a byte, but maybe even only 6 bits?) So with the above values, the bits might be arranged as such:
0000 0000 0001 1001 ( 25)
0000 0000 1110 0111 (-25)
0000 0000 0011 0010 ( 50) X Not allowed
0000 0000 1100 1110 (-50) X Not allowed
Essentially, does Ada further influence the storage of values beyond limiting what range is allowed in a variable? Are things like endianness and two's complement even controlled by Ada?

When you declare the type like that, you leave it up to the compiler to choose an optimal representation for each architecture. You might even get binary-coded decimal (BCD) instead of two's complement on some architectures. If you need a guaranteed layout, e.g. for transmitting values to another system, you have to pin it down yourself with representation items such as a 'Size clause.

Related

How to generate hash of arbitrary length with MurmurHash3 32 bit

I am currently trying to hash a set of strings using MurmurHash3, but a 32-bit hash seems to be too large for me to handle. I want to reduce the number of bits used for the hashes to around 24. I have already found questions explaining how to reduce a hash to 16, 8, 4 or 2 bits using XOR folding, but those are too few bits for my application.
Can somebody help me?
When you have a 32-bit hash, it's something like (with spaces for readability):
1101 0101 0101 0010 1010 0101 1110 1000
To get a 24-bit hash, you want to keep the 24 low-order bits. The notation for that varies by language, but many languages use "x & 0xFFFFFF" for a bit-wise AND operation with the hex constant 0xFFFFFF. That effectively does this (with the AND logic applied to each vertical column of digits, so 1 AND 1 is 1, and 0 AND 1 is 0):
1101 0101 0101 0010 1010 0101 1110 1000 AND <-- hash value from above
0000 0000 1111 1111 1111 1111 1111 1111 <-- 0xFFFFFF in binary
==========================================
0000 0000 0101 0010 1010 0101 1110 1000
You do waste a little of your hash value's randomness that way, which doesn't matter much with a fairly decent hash like Murmur3, but you can expect slightly fewer collisions if you instead mix the high-order bits you'd otherwise chop off into the low-order bits. To do that, right-shift the high-order bits and XOR them with some of the lower-order bits (it doesn't really matter which). Again, a common notation for that is:
((x & 0xFF000000) >> 8) ^ x
...which can be read as: do a bitwise AND to retain only the most significant byte of x, then shift that right by 8 bits, then bitwise exclusive-OR that with the original value of x. The result of the above expression then has bit 23 (counting from 0 as the least significant bit) set if and only if one or the other (but not both) of bits 23 and 31 were set in the value of x. Similarly, bit 22 is the XOR of bits 22 and 30, and so on down to bit 16, which is the XOR of bits 16 and 24. Bits 0..15 remain the same as in the original value of x.
Yet another approach is to pick a prime number ever-so-slightly lower than 2^24 and mod (%) your 32-bit murmur hash value by that, which mixes in the high-order bits even more effectively than the XOR above; but of course you'll only get values up to the prime minus 1, and not all the way to 2^24-1.

Two RFID readers yield different IDs (not byte order difference)

I have a GEZE door reader for RFID tags. The web app shows for one RFID tag the number "0552717541244". When I read the same tag with a USB reader connected to my computer, it shows "0219281982".
In hex, those values are 80b0885f7c and d11fa3e respectively, so it does not seem to be the byte-order difference discussed in other similar questions.
Is there a way of finding out the longer number when only the shorter one is known?
How come one single tag can have two different identifiers?
Looking at only a single value pair makes it impossible to verify if there actually is some systematic translation scheme between the two values. However, looking at the binary representation of the two values gives the following:
decimal          binary
0552717541244 -> 1000 0000 1011 0000 1000 1000 0101 1111 0111 1100
0219281982    ->           0000 1101 0001 0001 1111 1010 0011 1110
So it looks as if the web app reverses the bit order of each byte when compared to the reading of the USB reader and adds an additional byte 0x80 as the MSB:
decimal          binary
0552717541244 -> 1000 0000 1011 0000 1000 1000 0101 1111 0111 1100
                 (added)   <-------- <-------- <-------- <--------
0219281982    ->           0000 1101 0001 0001 1111 1010 0011 1110

Address canonical form and pointer arithmetic

On AMD64 compliant architectures, addresses need to be in canonical form before being dereferenced.
From the Intel manual, section 3.3.7.1:
In 64-bit mode, an address is considered to be in canonical form if
address bits 63 through to the most-significant implemented bit by the
microarchitecture are set to either all ones or all zeros.
Now, the most significant implemented bit on current operating systems and architectures is bit 47. This leaves us with a 48-bit address space.
Especially when ASLR is enabled, user programs can expect to receive an address with the 47th bit set.
If optimizations such as pointer tagging are used and the upper bits store extra information, the program must make sure bits 48 through 63 are set back to copies of bit 47 before dereferencing the address.
But consider this code:
int main()
{
    int* intArray = new int[100];
    int* it = intArray;

    // Fill the array with any value.
    for (int i = 0; i < 100; i++)
    {
        *it = 20;
        it++;
    }

    delete [] intArray;
    return 0;
}
Now consider that intArray is, say:
0000 0000 0000 0000 0111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1111 1100
After setting it to intArray and increasing it once, and considering sizeof(int) == 4, it will become:
0000 0000 0000 0000 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
Bit 47 has flipped from 0 to 1. What happens here is that the second pointer produced by the pointer arithmetic is invalid because it is not in canonical form. The correct address would be:
1111 1111 1111 1111 1000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
How do programs deal with this? Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?
The canonical address rules mean there is a giant hole in the 64-bit virtual address space. 2^47-1 is not contiguous with the next valid address above it, so a single mmap won't include any of the unusable range of 64-bit addresses.
+----------+
| 2^64-1 | 0xffffffffffffffff
| ... |
| 2^64-2^47| 0xffff800000000000
+----------+
| |
| unusable | not to scale: this part is 2^16 times as large
| |
+----------+
| 2^47-1 | 0x00007fffffffffff
| ... |
| 0 | 0x0000000000000000
+----------+
Also most kernels reserve the high half of the canonical range for their own use. e.g. x86-64 Linux's memory map. User-space can only allocate in the contiguous low range anyway so the existence of the gap is irrelevant.
Is there a guarantee by the OS that you will never be allocated memory whose address range does not vary by the 47th bit?
Not exactly. The 48-bit address space supported by current hardware is an implementation detail. The canonical-address rules ensure that future systems can support more virtual address bits without breaking backwards compatibility to any significant degree.
At most, you'd just need a compat flag to have the OS not give the process any memory regions with high bits not all the same. (Like Linux's current MAP_32BIT flag for mmap, or a process-wide setting). That could support programs that used the high bits for tags and manually redid sign-extension.
Future hardware won't need to support any kind of flag to ignore high address bits or not, because junk in the high bits is currently an error. Intel 5-level paging adds another 9 virtual address bits, widening the canonical high and low halves (see Intel's 5-level paging white paper).
See also Why in 64bit the virtual address are 4 bits short (48bit long) compared with the physical address (52 bit long)?
Fun fact: Linux defaults to mapping the stack at the top of the lower range of valid addresses. (Related: Why does Linux favor 0x7f mappings?)
$ gdb /bin/ls
...
(gdb) b _start
Function "_start" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (_start) pending.
(gdb) r
Starting program: /bin/ls
Breakpoint 1, 0x00007ffff7dd9cd0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) p $rsp
$1 = (void *) 0x7fffffffd850
(gdb) exit
$ calc
2^47-1
0x7fffffffffff
(Modern GDB can use starti to break before the first user-space instruction executes instead of messing around with breakpoint commands.)

What kind of data can a Wiegand 26 reader read from an NFC card?

I can read a 13.56 MHz NFC card with my phone's NFC reader and I get a hexadecimal value like:
1BF52327
This represents the card UID or serial number.
What data can I expect from a Wiegand reader? Will it be able to read the same serial number?
As the Wiegand reader can read only 26 bits, what exact data will it read?
Update
I was able to test the above. I have used a HID SE r10 reader and a non-brand reader.
So here are the results.
This is the value of the above card (1BF52327) in binary as read by my phone's NFC reader:
11011111101010010001100100111
Next this is the value I get from a HID reader for the same card:
1101100011011100000010101110010000000000
This is the value I get from a non-brand reader for the same card:
1101110000001010111001000
I can quickly find the correlation between the HID and the non-brand reader; in the end they are almost the same.
But I cannot relate the values read by the Wiegand readers to the original value read by the NFC.
Any ideas on what I am doing wrong? I have used several libraries (Joan, Wiegand-Protocol-Library-for-Arduino) on an RPi and an Arduino, and I get the same values from the Wiegand readers.
Will the Wiegand reader be able to read the same serial number as the phone?
Wiegand readers for 13.56 MHz (more specifically ISO/IEC 14443 Type A) typically read the anti-collision identifier of cards/tags. The phone also seems to display the anti-collision identifier (UID) to you. So, yes, both devices read the same data element.
However, as you correctly found out, the reader only transmits a 26-bit value over the Wiegand interface (actually only 24 data bits, since two of them are parity bits). As a UID is either 4, 7 or 10 bytes long, the reader needs to truncate the UID to a 3-byte value to transmit it over the Wiegand interface.
What data can I expect from a Wiegand reader?
Frames on the Wiegand interface look like this:
b0 b1  b2  b3  b4  b5  b6  b7  b8  b9  b10 b11 b12 b13 b14 b15 b16 b17 b18 b19 b20 b21 b22 b23 b24 b25
PE D23 D22 D21 D20 D19 D18 D17 D16 D15 D14 D13 D12 D11 D10 D9  D8  D7  D6  D5  D4  D3  D2  D1  D0  PO
The first line being the bits numbered as they arrive over the Wiegand wires. The second line being the same bits as they are interpreted by the receiver, where PE (b0) is an even parity bit over D23..D12 (b1..b12), PO (b25) is an odd parity bit over D11..D0 (b13..b24), and D23..D0 are the data bits representing an unsigned integer number (actually two, since the upper 8 bits are the site code and the lower 16 bits the tag ID).
Even though there is a logical split into site code and tag ID, these readers typically just use a truncated form of the tag ID as the 24 bit value.
How this value would map to the hexadecimal value you received on the phone strongly depends on how that hexadecimal representation was created (specifically its byte order). It might be as easy as just taking the last 3 bytes (F52327), but it could just as well be that it's 1BF523 (or any byte-reversed (or even bit-reversed) variation of that).
UPDATE: Regarding the values that you get for your readers...
First of all, you seem to have dropped leading zeros from the values. For instance, 1BF52327 is a 4-byte value and, consequently, has 32 bits:
1 B F 5 2 3 2 7
0001 1011 1111 0101 0010 0011 0010 0111
The same seems to be the case for the values received from the readers (either that or the library automatically dropped the leading parity bit or dropped both parity bits and added an arbitrary number(?) of zeros at the end of the values).
So your values are:
1101 1000 1101 1100 0000 1010 1110 0100 0000 0000
1101 1100 0000 1010 1110 0100 0
As you found out yourself, these clearly correlate in that one byte is missing at the beginning and that the value from the HID reader is filled with more zeros in the end.
Looking more closely, these values also correlate with the first binary value. The trick is to invert the values first. Thus, the values
1101 1000 1101 1100 0000 1010 1110 0100 0000 0000
1101 1100 0000 1010 1110 0100 0
become
0010 0111 0010 0011 1111 0101 0001 1011 1111 1111
0010 0011 1111 0101 0001 1011 1
For the value from the Wiegand reader, this would also fix the trailing odd parity bit (PO), since there are now 7 '1' bits in the lower half (including PO), though this could just be coincidence.
You can now see that these values represent exactly the first value in reversed byte-order. If you reverse the byte-order of
1 B F 5 2 3 2 7
0001 1011 1111 0101 0010 0011 0010 0111
you get
2 7 2 3 F 5 1 B
0010 0111 0010 0011 1111 0101 0001 1011
Comparing that to the other two values, you see that they match:
0010 0111 0010 0011 1111 0101 0001 1011
0010 0111 0010 0011 1111 0101 0001 1011 1111 1111
0010 0011 1111 0101 0001 1011 1
Consequently, the value that you receive from the HID reader represents 2723F51B and the value that you receive from the Wiegand reader represents 23F51B. Hence, the byte 27 is truncated.

Virtual Memory Address in Binary form

Please help me out, I'm studying operating systems. Under virtual memory I found this:
A user process generates a virtual address 11123456, and it is said that the virtual address in binary form is 0001 0001 0001 0010 0011 0100 0101 0110. How was that converted? Because when I convert 11123456 to binary I get 0001 0101 0011 0111 0110 000 0000. It is said that the virtual memory is implemented by paging, and the page size is 4096 bytes.
You assume that 11123456 is a decimal number, while according to the result it is hexadecimal. In general, decimal numbers are rarely used in CS; representations in powers of 2 are much more common and convenient. Today the most used are base 16 (hexadecimal) and base 2 (binary).
Converting into binary helps you identify the page number and offset, so that you can calculate the physical address corresponding to the logical address. As a CS student, you should make sure you understand how to do this.
For this particular problem, i.e. paging, you can also convert from logical to physical address without going through binary, using the modulo (%) and divide (/) operators. However, doing it in binary is the original way.
In your question, the value 11123456 should be a hexadecimal number, and it would better be written as 0x11123456 to distinguish it from decimal numbers. From the binary form "0001 0001 0001 0010 0011 0100 0101 0110" we can infer that the offset of the logical address is "0100 0101 0110" (the 12 rightmost bits, i.e. 0x456 in hexadecimal, or 1110 in decimal) and the page number is "0001 0001 0001 0010 0011" (the remaining bits, i.e. 0x11123 in hexadecimal, or 69923 in decimal).