Methods for Exp-Golomb CodeWord Construction and Parsing - encoding

I am working with OpenH264 Codec. OpenH264 is using Exp-Golomb Coding for header related information. I have studied several websites and gathered a little information about Exp-Golomb Coding. OpenH264 uses 4 types of Exp-Golomb coding methods. They are:
Ue [when values are non-negative quantities only]
Te [when values can only be 0 or 1]
Se [when values can be both negative and positive]
Me [when a standard code map is defined for the values]
I have learnt how to construct and parse with the Ue method.
Syntax Format for Exp-Golomb(Ue) = [M-Zeros][1][INFO].
Construction: Suppose We have a Code_Num = 226.
Now,
M = floor(log2(Code_Num)) = floor(log2(226)) = 7
INFO = Code_Num + 1 - pow(2,M) = 226 + 1 - 128 = 99 = (1100011) in Binary
So,
CodeWord = 0000000 1 1100011 [M-zeros, 1 ignoring bit, INFO]
Parsing:
Suppose We have a CodeWord = 000000011100011
Code_Num = pow(2,M) + INFO - 1 = 128 + 99 - 1 = 226
Now I can calculate Exp-Golomb(Ue), but I want to learn the theory behind Se, Te and Me as well. I am unable to find any resources for the other methods. Please help me.

OpenH264 is an implementation of the H.264/AVC video codec.
AVC uses Exp-Golomb coding in its various headers, so all compatible encoders have to as well.
Also, te(v) stands for Truncated Exponential-Golomb encoding.
Anyway, you can find information about reading signed Exponential-Golomb codes on the wiki page,
but a real quick tl;dr: 0 = 1, 1 = 010, -1 = 011, etc.
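To make that mapping concrete, here is a minimal C sketch of the se(v) value-to-code-number mapping (illustrative only, not OpenH264's actual code; the helper names are made up). The resulting code number is then written exactly like ue(v).

#include <stdio.h>

/* Hypothetical helpers illustrating the se(v) <-> code number mapping.
   A signed value k maps to an unsigned code_num, which is then written
   with the ordinary ue(v) scheme. */
static unsigned se_to_code_num(int k)
{
    return (k > 0) ? (unsigned)(2 * k - 1)   /* positive -> odd code_num   */
                   : (unsigned)(-2 * k);     /* zero/negative -> even      */
}

static int code_num_to_se(unsigned code_num)
{
    return (code_num & 1) ? (int)((code_num + 1) / 2)   /* odd  -> positive */
                          : -(int)(code_num / 2);        /* even -> negative */
}

int main(void)
{
    /* 0 -> code_num 0 ("1"), 1 -> 1 ("010"), -1 -> 2 ("011"), 2 -> 3, -2 -> 4 */
    for (int k = -2; k <= 2; ++k)
        printf("%2d -> code_num %u -> back to %d\n",
               k, se_to_code_num(k), code_num_to_se(se_to_code_num(k)));
    return 0;
}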
as for this mess:
M = floor(log2(Code_Num)) = floor(log2(226)) = 7
INFO = Code_Num + 1 - pow(2,M) = 226 + 1 - 128 = 99 = (1100011) in Binary
So,
CodeWord = 0000000 1 1100011 [M-zeros, 1 ignoring bit, INFO]
That's not quite accurate: you're supposed to add 1 during encoding and subtract 1 during decoding (for unsigned Exp-Golomb only). Signed Exp-Golomb uses a different mapping on top of that.
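To illustrate the add-1/subtract-1 rule for the unsigned case, here is a minimal C sketch of ue(v) encoding and decoding to and from a bit string (again illustrative only, not OpenH264's bit reader/writer; the function names are made up):

#include <stdio.h>

/* Write code_num as an Exp-Golomb ue(v) bit string into buf:
   M leading zeros, then the M+1 bits of (code_num + 1). */
static void ue_encode(unsigned code_num, char *buf)
{
    unsigned x = code_num + 1;          /* add 1 during encoding            */
    int m = 0;
    while ((x >> (m + 1)) != 0) ++m;    /* m = floor(log2(code_num + 1))    */

    int pos = 0;
    for (int i = 0; i < m; ++i) buf[pos++] = '0';          /* M zeros       */
    for (int i = m; i >= 0; --i)                            /* the 1 + INFO  */
        buf[pos++] = ((x >> i) & 1) ? '1' : '0';
    buf[pos] = '\0';
}

/* Parse a ue(v) bit string back into code_num. */
static unsigned ue_decode(const char *bits)
{
    int m = 0;
    while (bits[m] == '0') ++m;         /* count the leading zeros          */

    unsigned x = 0;
    for (int i = 0; i <= m; ++i)        /* read M+1 bits starting at the 1  */
        x = (x << 1) | (bits[m + i] == '1');
    return x - 1;                       /* subtract 1 during decoding       */
}

int main(void)
{
    char buf[64];
    ue_encode(226, buf);                          /* "000000011100011" */
    printf("%s -> %u\n", buf, ue_decode(buf));    /* prints 226        */
    return 0;
}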
Edit:
Mapped Exp-Golomb is exactly the same as Unsigned Exp-Golomb, plus a table lookup.
Truncated Exp-Golomb (te(v)) is the same as unsigned Exp-Golomb, except that when the syntax element can only be 0 or 1 it is coded as a single bit equal to the inverse of the value.
If you don't feel like writing your own decoders/encoders, take a look at my project BitIO on GitHub; I've already written them, in particular the ReadRICE/WriteRICE and ReadExpGolomb/WriteExpGolomb functions.

Detect contiguous numbers - MATLAB

I coded a program that creates a bunch of binary numbers like this:
out = [0,1,1,0,1,1,1,0,0,0,1,0];
I want to check for the existence of nine 1s in a row in the above out; for example, when we have this in our output:
out_2 = [0,0,0,0,1,1,1,1,1,1,1,1,1];
or
out_3 = [0,0,0,1,1,1,1,0,0,1,0,1,1,1,1,1,1,1,1,1,0,0,0,1,1,0];
a condition variable should be set to 1. We don't know the exact position where the run of ones starts in the out variable; it is random. I only want to detect the existence of such a run of ones in the above variable (one occurrence or more).
PS.
We are searching for a general answer that finds runs of other numbers too (not only 1, and not only binary data; this is just an example).
You can use convolution to solve such r-contiguous detection cases.
Case #1 : To find contiguous 1s in a binary array -
check = any(conv(double(input_arr),ones(r,1))>=r)
Sample run -
input_arr =
0 0 0 0 1 1 1 1 1 1 1 1 1
r =
9
check =
1
Case #2 : For detecting any number as contiguous, you could modify it a bit, like so -
check = any(conv(double(diff(input_arr)==0),ones(1,r-1))>=r-1)
Sample run -
input_arr =
3 5 2 4 4 4 5 5 2 2
r =
3
check =
1
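The same r-contiguous check is easy to express in any language as a running counter over adjacent equal elements. Here is a rough C sketch of the idea (just to show the logic the convolution implements; the names are illustrative):

#include <stdbool.h>
#include <stdio.h>

/* Return true if arr contains at least r equal values in a row. */
static bool has_r_contiguous(const int *arr, int n, int r)
{
    int run = 1;                          /* length of the current run */
    for (int i = 1; i < n; ++i) {
        run = (arr[i] == arr[i - 1]) ? run + 1 : 1;
        if (run >= r)
            return true;
    }
    return (n > 0 && r <= 1);             /* trivial case: r <= 1      */
}

int main(void)
{
    int out_2[] = {0,0,0,0,1,1,1,1,1,1,1,1,1};
    int out_3[] = {3,5,2,4,4,4,5,5,2,2};
    printf("%d\n", has_r_contiguous(out_2, 13, 9));  /* prints 1 */
    printf("%d\n", has_r_contiguous(out_3, 10, 3));  /* prints 1 */
    return 0;
}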
To save Stackoverflow from further duplicates, also feel free to look into related problems -
Fast r-contiguous matching (based on location similarities).
r-contiguous matching, MATLAB.

ASN1 parsing with swift

I think I get the basic idea behind ASN.1 parsing. Walk the bytes, interpret them and do something useful with them. Alas I am stuck on the implementation.
Apple has no sample code (that I could find), probably for security reasons. OpenSSL is not very well documented, so I can only guess at what the functions actually do. The only sample code in Swift I did find doesn't handle case 17 (the in-app purchases), which is the one thing I am interested in.
I tried figuring out where in the data stream the pointer is located, but I always get the same nonsensical result of 49.
let receiptContents = NSData(contentsOfURL: receiptLocation)!
let receiptBIO = BIO_new(BIO_s_mem())
BIO_write(receiptBIO, receiptContents.bytes, Int32(receiptContents.length))
contents = d2i_PKCS7_bio(receiptBIO, nil)
//parsing
var currentIndex = UnsafePointer<UInt8>()
var endIndex = UnsafePointer<UInt8>()
let octets = pkcs7_d_data(pkcs7_d_sign(self.contents).memory.contents)
var ptr = UnsafePointer<UInt8>(octets.memory.data)
let end = ptr.advancedBy(Int(octets.memory.length))
println(ptr.memory) //always 49 ???
println(end.memory) //always 0 ???
println(octets.memory.length)
I tried parsing the NSData myself, but then, what is the type of the binary data?
receiptContents = NSData(contentsOfURL: receiptLocation)!
//get bytes
let count = receiptContents.length / sizeof(UInt8)
var bytes = [UInt8](count: count, repeatedValue: 0)
receiptContents.getBytes(&bytes, length:count * sizeof(UInt8))
//parsing
for index in 0...5
{
let value = Int(bytes[index])
println(value)
}
I get this output:
48
130
21
57
6
9
but if I understand the ASN.1 format correctly, it's supposed to start with a value of 17 (set), then 3 bytes for the length (Int24?), then a value of 16 (first sequence), sequence length (1 byte), sequence type (1 byte), sequence payload, (repeat).
Other types like Int32, Int16 makes even less sense to me.
Not sure how to proceed here. Any suggestions?
From the outset, I suppose I should disclaim this by saying I'm not very familiar with many of the underlying technologies (Swift, OpenSSL, the Biometrics standards that I think you're working with).
That said, you're probably running afoul of BER's tagging rules. The Wiki article on X.690 has some introductory comments about how BER tags are constructed, but really you'll want to consult Annex A of X.690 for an example encoding and X.680 §8 for information about tagging.
A SET type can appear in several different forms; but in your case 49 = 0x31 = 0b00110001 = UNIVERSAL 17 (SET, constructed). Other forms may occur, but this is the only one that's explicitly identified as a SET from the tag itself: the specification may use a different tag for a SET type.
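If it helps to see those tag rules in code, here is a small C sketch (not OpenSSL; the function name is made up) that splits a single low-tag-number BER identifier octet into its class, constructed flag, and tag number, and confirms that 0x31 is a constructed UNIVERSAL 17 (SET):

#include <stdio.h>

/* Decode a single (low-tag-number) BER identifier octet.
   Bits 8-7: class, bit 6: constructed flag, bits 5-1: tag number. */
static void describe_tag(unsigned char id)
{
    static const char *classes[] = {
        "UNIVERSAL", "APPLICATION", "CONTEXT-SPECIFIC", "PRIVATE"
    };
    unsigned cls         = (id >> 6) & 0x03;
    unsigned constructed = (id >> 5) & 0x01;
    unsigned tag         = id & 0x1F;   /* 0x1F would signal a multi-byte tag */

    printf("0x%02X -> %s %u (%s)\n",
           id, classes[cls], tag, constructed ? "constructed" : "primitive");
}

int main(void)
{
    describe_tag(0x31);   /* UNIVERSAL 17 (constructed) = SET      */
    describe_tag(0x30);   /* UNIVERSAL 16 (constructed) = SEQUENCE */
    describe_tag(0x02);   /* UNIVERSAL 2  (primitive)   = INTEGER  */
    return 0;
}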

How to do bitwise operation decently?

I'm doing analysis on binary data. Suppose I have two uint8 data values:
a = uint8(0xAB);
b = uint8(0xCD);
I want to take the lower two bits from a, and whole content from b, to make a 10 bit value. In C-style, it should be like:
(a[2:1] << 8) | b
I tried bitget:
bitget(a,2:-1:1)
But this just gave me separate [1, 1] logical type values, which is not a scalar, and cannot be used in the bitshift operation later.
My current solution is:
Combine a (shifted into the high byte) and b with bitor:
temp1 = bitor(bitshift(uint16(a), 8), uint16(b));
Left shift six bits to get rid of the higher six bits from a:
temp2 = bitshift(temp1, 6);
Right shift six bits to get rid of lower zeros from the previous result:
temp3 = bitshift(temp2, -6);
Putting all these on one line:
result = bitshift(bitshift(bitor(bitshift(uint16(a), 8), uint16(b)), 6), -6);
This doesn't seem efficient, right? I only want to get (a[2:1] << 8) | b, and it takes a long expression to get the value.
Please let me know if there's a well-known solution for this problem.
Since you are using Octave, you can make use of bitpack and bitunpack:
octave> A = bitunpack (uint8 (0xAB))
A =
1 1 0 1 0 1 0 1
octave> B = bitunpack (uint8 (0xCD))
B =
1 0 1 1 0 0 1 1
Once you have them in this form, it's dead easy to do what you want:
octave> [B A(1:2)]
ans =
1 0 1 1 0 0 1 1 1 1
Then simply pad with zeros accordingly and pack it back into an integer:
octave> postpad ([B A(1:2)], 16, false)
ans =
1 0 1 1 0 0 1 1 1 1 0 0 0 0 0 0
octave> bitpack (ans, "uint16")
ans = 973
That OR is equivalent to an addition here, since the shifted bits from a and the bits of b don't overlap:
result = bitshift(bi2de(bitget(a,1:2)),8) + double(b);
(b is cast to double so the sum isn't saturated to the uint8 range.)
e.g.
a      =          01010111
b      =          10010010
result = 00000011 10010010
       = a[2]*2^9 + a[1]*2^8 + b
An alternative method could be
result = mod(uint16(a),2^x)*2^y + uint16(b);
where x is the number of bits you want to extract from a and y is the number of bits of b (the uint16 casts keep the result from saturating at the uint8 maximum); in your case:
result = mod(uint16(a),4)*256 + uint16(b);
An extra alternative, close to the C solution (again with uint16 casts so the shift doesn't overflow the uint8 range):
result = bitor(bitshift(bitand(uint16(a),3), 8), uint16(b));
I think it is important to explain exactly what "(a[2:1] << 8) | b" is doing.
In assembly, referencing individual bits is not a single operation. Assume all operations take the exact same time, and the "efficient" a[2:1] starts looking extremely inefficient.
The convenience statement actually does (a & 0x03).
If your compiler actually converts a uint8 to a uint16 based on how much it was shifted, this is not a 'free' operation, per se. Effectively, what your compiler will do is first clear the "memory" to the size of uint16 and then copy "a" into the location. This requires an extra step (clearing the "memory" (register)) that wouldn't normally be needed.
This means your statement actually is (uint16(a & 0x03) << 8) | uint16(b)
Now yes, because you're doing a power-of-two shift, you could just move a into AH, move b into AL, mask AH with 0x03 and move it all out, but that's a compiler optimization and not what your C code said to do.
The point is that directly translating that statement into matlab yields
bitor(bitshift(uint16(bitand(a,3)),8),uint16(b))
But, it should be noted that while it is not as TERSE as (a[2:1] << 8) | b, the number of "high level operations" is the same.
Note that all scripting languages are going to be very slow upon initiating each instruction, but will complete said instruction rapidly. The terse nature of Python isn't because "terse is better" but to create simple structures that the language can recognize so it can easily go into vectorized operations mode and start executing code very quickly.
The point here is that you have an "overhead" cost for calling bitand; but when operating on an array it will use SSE and that "overhead" is only paid once. The JIT (just in time) compiler, which optimizes script languages by reducing overhead calls and creating temporary machine code for currently executing sections of code MAY be able to recognize that the type checks for a chain of bitwise operations need only occur on the initial inputs, hence further reducing runtime.
Very high level languages are quite different (and frustrating) from high level languages such as C. You are giving up a large amount of control over code execution for ease of code production; whether matlab actually has implemented uint8 or if it is actually using a double and truncating it, you do not know. A bitwise operation on a native uint8 is extremely fast, but to convert from float to uint8, perform bitwise operation, and convert back is slow. (Historically, Matlab used doubles for everything and only rounded according to what 'type' you specified)
Even now, Octave 4.0.3 has a compiled bitshift function where bitshift(ones('uint32'), -32) wraps back to 1. BRILLIANT! VHLLs place you at the mercy of the language; it isn't about how terse or how verbose you write the code, it's how the blasted language decides to interpret it and execute machine-level code. So instead of shifting, uint32(floor(ones / (2^32))) is actually FASTER and more accurate.
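For reference, the C-style expression under discussion really is just a mask, a shift, and an OR; a minimal C sketch, assuming uint8_t inputs and a uint16_t result:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 0xAB;   /* 1010 1011 -> the lower two bits are 11 (3) */
    uint8_t b = 0xCD;

    /* (a[2:1] << 8) | b : keep the low two bits of a, shift them above b. */
    uint16_t result = (uint16_t)((a & 0x03u) << 8) | b;

    printf("0x%03X (%u)\n", result, result);   /* prints 0x3CD (973) */
    return 0;
}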

Transforming ciphertext from digital format to alphabetic format

Consider a message "STOP" which we are to encrypt using the RSA algorithm. The values given are p = 43, q = 59, n = pq, e = 13. First I transformed "STOP" into blocks of four digits, which are 1819 (S = 18, T = 19) and 1415 (O = 14, P = 15) respectively (letters are numbered from 00 to 25).
After the calculation I got 20812182 as the encrypted message (combining 2081 and 2182). Is there any way to transform this numeric ciphertext into alphabetic form?
If we take two digits at a time, then 20 = U, 81 = ?, 21 = V, 82 = ?; what letters correspond to 81 and 82? In other words, what will the ciphertext for the plaintext "STOP" be in the above case?
RSA works with numbers, not binary data or letters. You can of course convert one to the other; that is what you did when you wrote 20812182. A number with that value has an endless number of other possible representations.
Creating an alphabetic representation of minimum size is pretty tricky: basically you would divide by powers of 26 (a base-26 conversion), which is not that easy to implement. Instead you can take a subset of your alphabet and use it to represent your number.
To do this use your original number representation and replace 0 with A, 1 with B ... and 9 with J. This would result in CAIBCBIC for your ciphertext.
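A quick sketch of that digit-to-letter substitution in C (just the representation step, not the RSA arithmetic; the variable names are made up):

#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *ciphertext_digits = "20812182";   /* 2081 and 2182 combined */
    char letters[32];

    /* Replace each decimal digit d with the letter 'A' + d (0->A ... 9->J). */
    size_t n = strlen(ciphertext_digits);
    for (size_t i = 0; i < n; ++i)
        letters[i] = (char)('A' + (ciphertext_digits[i] - '0'));
    letters[n] = '\0';

    printf("%s\n", letters);   /* prints CAIBCBIC */
    return 0;
}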
Note that plaintext and ciphertext are used as names for the input and output of cryptographic ciphers. Both names seem to indicate some kind of human readable text - and maybe they once did - but in cryptography they can be thought of as any kind of data.

Choosing values for constants

One thing I've never really understood is why in many libraries, constants are defined like this:
public static final int DM_FILL_BACKGROUND = 0x2;
public static final int DM_FILL_PREVIOUS = 0x3;
public static final int TRANSPARENCY_MASK = 1 << 1;
public static final int TRANSPARENCY_PIXEL = 1 << 2;
What's up with the 0x and << stuff? Why aren't people just using ordinary integer values?
The bit shifting of 1 is usually for situations where you have non-exclusive values that you want to store.
For example, say you want to be able to draw lines on any side of a box. You define:
LEFT_SIDE   = 1 << 0  # binary 0001 (1)
RIGHT_SIDE  = 1 << 1  # binary 0010 (2)
TOP_SIDE    = 1 << 2  # binary 0100 (4)
BOTTOM_SIDE = 1 << 3  # binary 1000 (8)
Then you can combine them for multiple sides:
DrawBox (LEFT_SIDE | RIGHT_SIDE | TOP_SIDE)  # 0001 | 0010 | 0100 = 0111 (7); don't draw a line on the bottom.
The fact that they're using totally different bits means that they're independent of each other. By ORing them you get 1 | 2 | 4 which is equal to 7 and you can detect each individual bit with other boolean operations (see here and here for an explanation of these).
If they were defined as 1, 2, 3 and 4 then you'd probably either have to make one call for each side, or you'd have to pass four different parameters, one per side. Otherwise you couldn't tell the difference between LEFT plus RIGHT (1 + 2 = 3) and TOP on its own (3), since both would add up to the same value.
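As a small illustration (in C rather than Java, with made-up names), this is how such flags are typically combined with OR and then tested again with bitwise AND:

#include <stdio.h>

/* Illustrative flag values: one bit per side. */
enum {
    LEFT_SIDE   = 1 << 0,   /* 0001 */
    RIGHT_SIDE  = 1 << 1,   /* 0010 */
    TOP_SIDE    = 1 << 2,   /* 0100 */
    BOTTOM_SIDE = 1 << 3    /* 1000 */
};

static void draw_box(unsigned sides)
{
    /* Each side can be tested independently because the bits don't overlap. */
    if (sides & LEFT_SIDE)   printf("draw left\n");
    if (sides & RIGHT_SIDE)  printf("draw right\n");
    if (sides & TOP_SIDE)    printf("draw top\n");
    if (sides & BOTTOM_SIDE) printf("draw bottom\n");
}

int main(void)
{
    draw_box(LEFT_SIDE | RIGHT_SIDE | TOP_SIDE);   /* no bottom line */
    return 0;
}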
The 0x stuff is just hexadecimal numbers, which are easier to read as binary bitmasks (each hexadecimal digit corresponds exactly to four binary digits). You'll tend to see patterns like 0x01, 0x02, 0x04, 0x08, 0x10, 0x20 and so on, since they're the equivalent of a single 1 bit moving towards the most significant bit position - those values are equivalent to binary 00000001, 00000010, 00000100, 00001000, 00010000, 00100000 and so on.
Aside: Once you get used to hex, you rarely have to worry about the 1 << n stuff. You can instantly recognise 0x4000 as binary 0100 0000 0000 0000. That's less obvious if you see the value 16384 in the code although some of us even recognise that :-)
Regarding the << stuff: this is my preferred way.
When I need to define a constant with a 1 in bit position 2 and 0 in all other bits, I can define it as 4, 0x4 or 1<<2. 1<<2 is the most readable, in my opinion, and explains exactly the purpose of the constant.
BTW, all these ways give the same performance, since calculations are done at compile time.