Choosing values for constants

One thing I've never really understood is why in many libraries, constants are defined like this:
public static final int DM_FILL_BACKGROUND = 0x2;
public static final int DM_FILL_PREVIOUS = 0x3;
public static final int TRANSPARENCY_MASK = 1 << 1;
public static final int TRANSPARENCY_PIXEL = 1 << 2;
What's up with the 0x and << stuff? Why aren't people just using ordinary integer values?

The bit shifting of 1 is usually for situations where you have non-exclusive values that you want to store.
For example, say you want to be able to draw lines on any side of a box. You define:
LEFT_SIDE = 1 << 0 # binary 0001 (1)
RIGHT_SIDE = 1 << 1 # binary 0010 (2)
TOP_SIDE = 1 << 2 # binary 0100 (4)
BOTTOM_SIDE = 1 << 3 # binary 1000 (8)
----
0111 (7) = LEFT_SIDE | RIGHT_SIDE | TOP_SIDE
Then you can combine them for multiple sides:
DrawBox (LEFT_SIDE | RIGHT_SIDE | TOP_SIDE) # Don't draw line on bottom.
The fact that they use totally different bits means that they're independent of each other. By ORing them you get 1 | 2 | 4, which is equal to 7, and you can still detect each individual bit afterwards with a bitwise AND (as sketched below).
If they were defined as 1, 2, 3 and 4, then you'd either have to make one call per side or pass four different parameters, one per side. Otherwise you couldn't tell the difference between LEFT | RIGHT (1 + 2 = 3) and TOP (3), since both combinations would produce the same value under simple addition.
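As a rough sketch of how that detection works in practice (DrawBox here is a made-up stand-in for whatever the real drawing routine would be):
#include <stdio.h>

#define LEFT_SIDE   (1 << 0)  /* binary 0001 */
#define RIGHT_SIDE  (1 << 1)  /* binary 0010 */
#define TOP_SIDE    (1 << 2)  /* binary 0100 */
#define BOTTOM_SIDE (1 << 3)  /* binary 1000 */

/* Hypothetical drawing routine: each flag is tested with a bitwise AND. */
void DrawBox(unsigned sides)
{
    if (sides & LEFT_SIDE)   printf("draw left edge\n");
    if (sides & RIGHT_SIDE)  printf("draw right edge\n");
    if (sides & TOP_SIDE)    printf("draw top edge\n");
    if (sides & BOTTOM_SIDE) printf("draw bottom edge\n");
}

int main(void)
{
    DrawBox(LEFT_SIDE | RIGHT_SIDE | TOP_SIDE);  /* 0111 = 7: everything except the bottom */
    return 0;
}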
The 0x stuff is just hexadecimal numbers, which are easier to read as binary bitmasks (each hexadecimal digit corresponds exactly to four binary digits). You'll tend to see patterns like 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 and so on, since they're the equivalent of a single 1 bit moving towards the most significant bit position - those values are equivalent to binary 00000001, 00000010, 00000100, 00001000, 00010000, 00100000 and so on.
Aside: Once you get used to hex, you rarely have to worry about the 1 << n stuff. You can instantly recognise 0x4000 as binary 0100 0000 0000 0000. That's less obvious if you see the value 16384 in the code although some of us even recognise that :-)

Regarding the << stuff: this is my preferred way.
When I need to define a constant with 1 in bit position 2 and 0 in all other bits, I can define it as 4, 0x4 or 1<<2. In my opinion, 1<<2 is the most readable and explains exactly the purpose of this constant.
BTW, all these ways give the same performance, since calculations are done at compile time.

Related

How to handle fields inside network headers whose length are not multiple of 8 bits

I just started learning socket programming, and I'm trying to implement TCP/UDP protocols using raw sockets.
IP Header
 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|Ver.|IHL|DSCP|ECN|  Total length   |
+--------+--------+--------+--------+
|  Identification |Flags|  Offset   |
+--------+--------+--------+--------+
|  TTL   |Protocol| Header Checksum |
+--------+--------+--------+--------+
|         Source IP address         |
+--------+--------+--------+--------+
|      Destination IP address       |
+--------+--------+--------+--------+
When writing the IP header, in the Flags and Offset part the length of Offset is not a multiple of 8 bits, so I take Flags and Offset together as a whole.
uint8_t flags = 0;
uint16_t offset = htons(6000); // more than 1 byte, so we need to use htons
// In C, we can left-shift offset by 3 bits (since it's in big-endian order),
// then convert flags to uint16_t and merge them together.
// In some other languages, for example Haskell,
// htons-like functions may return a bytestring which is not an instance of Bits,
// so we need to unpack it back into a list of uint8 values in order to use bitwise operations.
This method is not very clean. I'm wondering what the usual way is to construct a bytestring when its components are longer than 1 byte and their endianness also needs to be considered.
In C, the usual way would be to declare a uint16_t, uint32_t, or uint64_t temporary variable, use bitwise operators to assemble the bits within that variable, and then use htons() or htonl() to convert the bits into network (aka big-endian) order.
For example, the Flags and Offset fields, taken together, constitute a 16-bit word. So:
uint8_t flags = /* some 3-bit value */;
uint16_t offset = /* some 13-bit value */;
uint16_t flagsAndOffsetBigEndian = htons((flags << 13) | offset); /* Flags occupy the top 3 bits of the 16-bit word */
memcpy(&header[6], &flagsAndOffsetBigEndian, sizeof(uint16_t));   /* Flags/Offset sit at byte offset 6 of the IPv4 header */
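Going the other way is symmetric. As a sketch under the same assumption (header being the byte array holding the IPv4 header), copy the 16-bit word out, convert it back with ntohs(), and mask the two fields apart:
uint16_t flagsAndOffset;
memcpy(&flagsAndOffset, &header[6], sizeof(uint16_t)); /* copy the raw field out of the buffer */
flagsAndOffset = ntohs(flagsAndOffset);                /* convert back to host byte order */
uint8_t  flags  = flagsAndOffset >> 13;                /* top 3 bits */
uint16_t offset = flagsAndOffset & 0x1FFF;             /* low 13 bits */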

unknown non-binary data encoding - any hints?

I'm trying to decode data sent via RF by a weather station.
Unfortunately, the data isn't represented in the standard binary way (0000, 0001, 0010, 0011, ...). What I've found is the following scheme:
value representation
0 => 0xff = 0b11111111
1 => 0x00 = 0b00000000
2 => 0x01 = 0b00000001
3 => 0xfe = 0b11111110
4 => 0x03 = 0b00000011
5 => 0xfc = 0b11111100
6 => 0xfd = 0b11111101
7 => 0x02 = 0b00000010
...
Or broken down to the bits:
value: 0       8       16      24
       |       |       |       |
Bit 0: 1010101010101010101010101010 ...
Bit 1: 1001100110011001100110011001
Bit 2: 1001011010010110100101101001
Bit 3: 1001011001101001100101100110
Bit 4: 1001011001101001011010011001
Bit 5: 1001011001101001011010011001
Bit 6: 1001011001101001011010011001
Bit 7: 1001011001101001011010011001
Each bit seems to follow a certain pattern of mirroring and inversion of the preceding, e.g. bit 3 = 10 01 0110 01101001
What is that kind of encoding called, and how can I easily convert it to standard binary form?
It looks like the LSB pattern is periodic with period 2 (10 repeated), the next bit is periodic with period 4 (1001 repeated), and presumably the bit before that has period 8 (10010110 repeated).
This is somewhat similar to the normal representation, of course, except that usually the repeating patterns are 01, 0011, 00001111 etcetera.
It seems the pattern 1001 is created by copying 10 and inverting the second copy. Similarly, the pattern 10010110 is created by copying and inverting 1001. Hence, the next pattern, of period 16, would be 1001011001101001.
Now, how are these patterns related?
For the lowest bit, 10 repeated is 01 repeated XOR (11). Simple.
For the next bit, 1001 repeated is 0011 XOR (1010) repeated - and note that the LSB pattern was 10 repeated.
After that, we get 10010110 repeated which is 00001111 XOR (10011001) repeated. See the pattern?
So: You need to XOR each bit with the bit to its right, starting from the MSB.
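One reading of that rule, as a minimal C sketch: XOR each bit of the received byte with the original bit to its right, treating the bit to the right of the LSB as an implicit 1 (which inverts bit 0). It reproduces the eight sample values in the table above, though that is only a guess at the full scheme:
#include <stdint.h>
#include <stdio.h>

/* XOR each bit with the bit to its right; the "bit to the right" of the LSB is taken as 1. */
static uint8_t decode(uint8_t r)
{
    return (uint8_t)(r ^ (r << 1) ^ 1u);
}

int main(void)
{
    const uint8_t samples[] = { 0xff, 0x00, 0x01, 0xfe, 0x03, 0xfc, 0xfd, 0x02 };
    for (int i = 0; i < 8; i++)
        printf("0x%02x -> %u\n", samples[i], decode(samples[i]));  /* prints 0 through 7 */
    return 0;
}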

Store 2 4-bit numbers in 1 8-bit number

I am new to thinking about binary numbers. I'm wondering if there is a way to encode 2 4-bit numbers (i.e. hex digits) into 1 8-bit number. So if I had a and 5 as the hex numbers, that would be 10 and 5. Maybe there is a way to store that in 1 8-bit number, in such a way that you can get it back out of the 8-bit number into its component 4-bit parts.
[10, 5]! = 15
15! = [10, 5]
Wondering if there is such a way to encode the numbers to accomplish this.
It seems like it should be possible, because the first value could be stored in the first 16 values (0-15), and then the next value could be stored in the rest, using 16 as 1, 32 as 2, 48 as 3, etc.
Can't tell if the answer here is how to do it:
How can i store 2 numbers in a 1 byte char?
Not really giving what I'd want:
> a = 10
10
> b = 5
5
> c = a + b
15
> d = (c & 0xF0) >> 4
0
> e = c & 0x0F
15
Maybe I'm not using it right, not sure. This seems like it could be it too but I am not quite sure how to accomplish this in JavaScript.
How to combine 2 4-bit unsigned numbers into 1 8-bit number in C
Any help would be greatly appreciated. Thank you!
I think the first post has the key.
Having a and 5 as the two 4-bit hex numbers to store, you can store them in a variable like:
var store = 0xa5;
or dynamically
var store = parseInt('0x' + ('a' + '5'), 16);
Then to extract the parts:
var number1 = ((store & 0xF0) >> 4).toString(16)
var number2 = ((store & 0x0F)).toString(16)
I hope this helps.
Yes, this is supported in most programming languages; you have to do bitwise manipulation. The following is an example in Java.
To encode (validate input beforehand):
byte in1 = <valid input>, in2 = <valid input>;
byte out = (byte) ((in1 << 4) | in2);
To decode:
byte in = <valid input>;
byte out1 = (byte) ((in >> 4) & 0x0F);
byte out2 = (byte) (in & 0x0F);

How to do bitwise operation decently?

I'm doing analysis on binary data. Suppose I have two uint8 data values:
a = uint8(0xAB);
b = uint8(0xCD);
I want to take the lower two bits from a and the whole content of b to make a 10-bit value. In C style, it would be something like:
(a[2:1] << 8) | b
I tried bitget:
bitget(a,2:-1:1)
But this just gave me separate [1, 1] logical type values, which is not a scalar, and cannot be used in the bitshift operation later.
My current solution is:
Make a|b (a or b):
temp1 = bitor(bitshift(uint16(a), 8), uint16(b));
Left shift six bits to get rid of the higher six bits from a:
temp2 = bitshift(temp1, 6);
Right shift six bits to get rid of lower zeros from the previous result:
temp3 = bitshift(temp2, -6);
Putting all these on one line:
result = bitshift(bitshift(bitor(bitshift(uint16(a), 8), uint16(b)), 6), -6);
This doesn't seem efficient, right? I only want to get (a[2:1] << 8) | b, and it takes a long expression to get the value.
Please let me know if there's a well-known solution for this problem.
Since you are using Octave, you can make use of bitpack and bitunpack:
octave> a = bitunpack (uint8 (0xAB))
a =
1 1 0 1 0 1 0 1
octave> B = bitunpack (uint8 (0xCD))
B =
1 0 1 1 0 0 1 1
Once you have them in this form, it's dead easy to do what you want:
octave> [B a(1:2)]
ans =
1 0 1 1 0 0 1 1 1 1
Then simply pad with zeros accordingly and pack it back into an integer:
octave> postpad ([B a(1:2)], 16, false)
ans =
1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0
octave> bitpack (ans, "uint16")
ans = 973
That OR is equivalent to an addition here, since the two operands have no overlapping bits (the cast to uint16 keeps the uint8 value b from saturating the sum):
result = bitshift(bi2de(bitget(a,1:2)), 8) + uint16(b);
e.g.
a = 01010111
b = 10010010
result = 00000011 10010010
       = a[2]*2^9 + a[1]*2^8 + b
An alternative method could be
result = mod(a, 2^x)*2^y + b;
where x is the number of bits you want to extract from a and y is the number of bits of b; in your case:
result = uint16(mod(a, 4))*256 + uint16(b);
(The casts to uint16 keep the uint8 arithmetic from saturating at 255.)
An extra alternative solution, close to the C solution:
result = bitor(bitshift(bitand(uint16(a), 3), 8), uint16(b));
I think it is important to explain exactly what "(a[2:1] << 8) | b" is doing.
In assembly, referencing individual bits is a single operation. Assume all operations take exactly the same amount of time, and the "efficient" a[2:1] starts looking extremely inefficient.
The convenience statement actually does (a & 0x03).
If your compiler actually converts a uint8 to a uint16 based on how much it was shifted, this is not a 'free' operation, per se. Effectively, what your compiler will do is first clear the "memory" to the size of uint16 and then copy "a" into the location. This requires an extra step (clearing the "memory" (register)) that wouldn't normally be needed.
This means your statement actually is (uint16(a & 0x03) << 8) | uint16(b)
Now yes, because you're doing a power-of-two shift, you could just move a into AH, move b into AL, AND AH with 0x03 and move it all out, but that's a compiler optimization and not what your C code said to do.
The point is that directly translating that statement into matlab yields
bitor(bitshift(uint16(bitand(a,3)),8),uint16(b))
But, it should be noted that while it is not as TERSE as (a[2:1] << 8) | b, the number of "high level operations" is the same.
Note that all scripting languages are going to be very slow upon initiating each instruction, but will complete said instruction rapidly. The terse nature of Python isn't because "terse is better" but to create simple structures that the language can recognize so it can easily go into vectorized operations mode and start executing code very quickly.
The point here is that you have an "overhead" cost for calling bitand; but when operating on an array it will use SSE and that "overhead" is only paid once. The JIT (just in time) compiler, which optimizes script languages by reducing overhead calls and creating temporary machine code for currently executing sections of code MAY be able to recognize that the type checks for a chain of bitwise operations need only occur on the initial inputs, hence further reducing runtime.
Very high level languages are quite different (and frustrating) from high level languages such as C. You are giving up a large amount of control over code execution for ease of code production; whether matlab actually has implemented uint8 or if it is actually using a double and truncating it, you do not know. A bitwise operation on a native uint8 is extremely fast, but to convert from float to uint8, perform bitwise operation, and convert back is slow. (Historically, Matlab used doubles for everything and only rounded according to what 'type' you specified)
Even now, Octave 4.0.3 has a compiled bitshift function where bitshift(ones('uint32'), -32) results in it wrapping back to 1. BRILLIANT! VHLLs place you at the mercy of the language; it isn't about how terse or how verbose you write the code, it's how the blasted language decides to interpret it and execute machine-level code. So instead of shifting, uint32(floor(ones / (2^32))) is actually FASTER and more accurate.

How do those bitmasks actually work?

For example, this method from NSCalendar takes a bitmask:
- (NSDate *)dateByAddingComponents:(NSDateComponents *)comps toDate:(NSDate *)date options:(NSUInteger)opts
So options can be like:
NSUInteger options = kCFCalendarUnitYear;
or like:
NSUInteger options = kCFCalendarUnitYear | kCFCalendarUnitMonth | kCFCalendarUnitDay;
What I don't get is, how is this actually done? I mean: How can they pull out those values which are merged into options? If I wanted to program something like this, that can take a bitmask, how would that look?
Bitmasks are pretty basic really. You can think of it like this (C# until somebody can convert):
public enum CalendarUnits
{
kCFCalendarUnitDay = 1, // 001 in binary
kCFCalendarUnitMonth = 2, // 010 in binary
kCFCalendarUnitYear = 4, // 100 in binary
}
You can then use the bitwise operators to combine the values:
// The following code will do the following
// 001 or 100 = 101
// So the value of options should be 5
NSUInteger options = kCFCalendarUnitDay | kCFCalendarUnitYear;
This technique is also often used in security routines:
public enum Privileges
{
User = 1,
SuperUser = 2,
Admin = 4
}
// SuperUsers and Admins can Modify
// So this is set to 6 (110 in binary)
public int modifySecurityLevel = (int)(Privileges.SuperUser | Privileges.Admin);
Then to check the security level, you can use the bitwise and to see if you have sufficient permission:
public int userLevel = 1;
public int adminLevel = 4;
// 001 and 110 = 000 so this user doesn't have security
if((modifySecurityLevel & userLevel) == userLevel)
// but 100 and 110 = 100 so this user does
if((modifySecurityLevel & adminLevel) == adminLevel)
// Allow the action
To do this, you want to bitwise AND the value you're testing against the mask, then see if the result of the ANDing equals the mask itself:
if ((options & kCFCalendarUnitYear) == kCFCalendarUnitYear) {
// do whatever
}
Bitmasks work because in binary, each power of 2 (i.e., 2^0 = 1, 2^1 = 2, 2^2 = 4) occupies a single spot in the sequence of bits. For example:
decimal | binary
1 | 0001
2 | 0010
4 | 0100
8 | 1000
When you or (the operator | in C-like languages) two numbers a and b together into c, you're saying "take the bits that are in a, b, or both and put them in c." Since a power of two represents a single position in a binary string, there's no overlap, and you can determine which ones were set. For example, if we or 2 and 4
0010 | 0100 = 0110
Notice how it basically combined the two. On the other hand, if we or 5 and 3:
decimal | binary
5 | 0101
3 | 0011
0101 | 0011 = 0111
notice that we have no way of telling which bits came from where, because there was overlap between the binary representation of each.
This becomes more apparent with one more example. Let's take the numbers 1, 2, and 4 (all powers of two)
0001 | 0010 | 0100 = 0111
This is the same result as 5 | 3! But since the original numbers are powers of two, we can tell uniquely where each bit came from.
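A quick C sketch of that uniqueness property: walk over the power-of-two positions and AND each one against the combined value to recover exactly which flags went in (the values here are just the 1, 2 and 4 from the example above).
#include <stdio.h>

int main(void)
{
    unsigned combined = 1 | 2 | 4;   /* 0111 -- the same bit pattern as 5 | 3 */

    /* Each set bit corresponds to exactly one power of two that was OR'd in. */
    for (unsigned flag = 1; flag <= 8; flag <<= 1)
        if (combined & flag)
            printf("%u was one of the values combined\n", flag);

    return 0;
}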
The key is remembering that each one of those values you merge into "options" is really just a number. I'm not sure how familiar you are with binary, but you can think of it in decimal and just add numbers rather than ORing them.
Let's say A=10, B=100, and C=1000
If you wanted to set options = A+B, then options would equal 110. The method you called would then look at the "tens" place for A, the "hundreds" place for B, and the "thousands" place for C. In this example, there is a 1 in the hundreds place and the tens place, so the method would know that A and B were set in the options.
It's a little different since computers use binary not decimal, but I think the idea is very similar, and sometimes it's easier to think about it in a familiar numbering system.
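To push that decimal analogy slightly further (a toy sketch, not how the real API works), checking a "place" in base 10 looks just like checking a bit in base 2:
#include <stdio.h>

int main(void)
{
    const int A = 10, B = 100, C = 1000;  /* decimal "flags": one per place value */
    int options = A + B;                  /* 110: a 1 in the tens and hundreds places */

    /* Inspect each place the same way a bitmask check inspects each bit. */
    if ((options / 10)   % 10) printf("A was set\n");
    if ((options / 100)  % 10) printf("B was set\n");
    if ((options / 1000) % 10) printf("C was set\n");

    return 0;
}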
typedef NS_OPTIONS(NSUInteger, MyOption)
{
OptionNone = 0,
OptionOne = 1 << 0,
OptionTwo = 1 << 1,
OptionThree = 1 << 2
};
if (givenValue & OptionOne) {
// bit one is selected
}
if (givenValue & OptionTwo) {
// bit two is selected
}
http://en.wikipedia.org/wiki/Mask_(computing)
I've found Calculator.app to be helpful in visualizing bit masks. (Just choose View > Programmer, and then click the button to Show Binary). (You can click on any of the 0s or 1s in the binary table to switch those bits on or off; or, enter numbers in decimal or hex (use the 8 | 10 | 16 NSSegmentedControl to switch between different representations)).